antonisdim / haystac

Code repository for the HAYSTAC pipeline
MIT License

Building Haystac Database #8

Closed Turbojewelz closed 3 years ago

Turbojewelz commented 3 years ago

Hey there, I managed to install Haystac with pip in a Python virtual environment and also in a conda environment. In both environments I tried to create databases as described in the manual, but it doesn't seem to work. So maybe you can help me and provide a solution. When I use conda I get the following error:

Updating job index_all_accessions.
Updating job entrez_db_list.
Updating job randomise_db_order.
[Thu Jun 17 12:31:09 2021]
Error in rule entrez_pick_sequences:
    jobid: 1
    output: /data110/misterx/software2/klebsiella/entrez/entrez-selected-seqs.tsv

Traceback (most recent call last):
  File "/data110/misterx/software2/test/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 575, in _callback
    callback(job)
  File "/data110/misterx/software2/test/lib/python3.6/site-packages/snakemake/scheduler.py", line 544, in _proceed
    job, update_dynamic=update_dynamic
  File "/data110/misterx/software2/test/lib/python3.6/site-packages/snakemake/dag.py", line 1348, in finish
    updated_dag = self.update_checkpoint_dependencies(jobs)
  File "/data110/misterx/software2/test/lib/python3.6/site-packages/snakemake/dag.py", line 1312, in update_checkpoint_dependencies
    self.postprocess()
  File "/data110/misterx/software2/test/lib/python3.6/site-packages/snakemake/dag.py", line 1193, in postprocess
    self.cleanup()
  File "/data110/misterx/software2/test/lib/python3.6/site-packages/snakemake/dag.py", line 258, in cleanup
    del self.depending[dep][job]
KeyError: entrez_taxa_query
Removing output files of failed job entrez_pick_sequences since they might be corrupted:
/data110/misterx/software2/klebsiella/entrez/entrez-selected-seqs.tsv
Trying to restart job 1.

[Thu Jun 17 12:31:09 2021]
Job 1: Selecting the longest sequence per taxon in the entrez query.
Downstream jobs will be updated after completion.

[Thu Jun 17 12:31:11 2021]
Finished job 1.
3 of 5 steps (60%) done
Complete log: /data110/misterx/software2/.snakemake/log/2021-06-17T123038.153429.snakemake.log

I ran this with the following command, but I also tried the Y. pestis one from the manual and it throws the same error.

haystac database --mode build --query '"Klebsiella pneumoniae"[Organism] AND "complete genome"[All Fields] AND refseq[filter]' --output /data110/jsusat_side_projects/software2/klebsiella

And if I try to build a database in the Python 3 virtual environment where I installed Haystac with pip, the error message looks like this:

`/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
haystac: error: Unable to download assembly ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/798/225/GCF_003798225.1_ASM379822v1/GCF_003798225.1_ASM379822v1_genomic.fna.gz                                                                                                                 
None                                                                                                                                                                                                                                                                                      
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
haystac: error: Unable to download assembly ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/005/937/895/GCF_005937895.2_ASM593789v2/GCF_005937895.2_ASM593789v2_genomic.fna.gz                                                                                                                 
None                                                                                                                                                                                                                                                                                      
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
/bin/bash: bgzip: command not found                                                                                                                                                                                                                                                       
haystac: error: Unable to download assembly ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/834/455/GCF_000834455.1_ASM83445v1/GCF_000834455.1_ASM83445v1_genomic.fna.gz                                                                                                                   
None                                                                                                                                                                                                                                                                                      
[Thu Jun 17 13:11:45 2021]                                                                                                                                                                                                                                                                
Error in rule entrez_download_sequence:                                                                                                                                                                                                                                                   
    jobid: 22                                                                                                                                                                                                                                                                             
    output: /data110/misterx/software2/virtual_env_haystack_cache/ncbi/Yersinia_rohdei/NZ_CP009787.1.fasta.gz

RuleException:
CalledProcessError in line 66 of /data110/misterx/software2/python3_haystac_virtualenv/lib/python3.7/site-packages/haystac/workflow/rules/entrez.smk:
Command 'set -euo pipefail;  /data110/misterx/software2/python3_haystac_virtualenv/bin/python3 /data110/misterx/software2/.snakemake/scripts/tmpif9vq4lr.entrez_download_sequence.py' returned non-zero exit status 1.
  File "/data110/misterx/software2/python3_haystac_virtualenv/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
  File "/data110/misterx/software2/python3_haystac_virtualenv/lib/python3.7/site-packages/haystac/workflow/rules/entrez.smk", line 66, in __rule_entrez_download_sequence
  File "/data110/misterx/software2/python3_haystac_virtualenv/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 569, in _callback
  File "/opt/sw/python/3.7.4/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/data110/misterx/software2/python3_haystac_virtualenv/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
  File "/data110/misterx/software2/python3_haystac_virtualenv/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2364, in run_wrapper
Removing output files of failed job entrez_download_sequence since they might be corrupted:
/data110/misterx/software2/virtual_env_haystack_cache/ncbi/Yersinia_rohdei/NZ_CP009787.1.fasta.gz
Trying to restart job 22.

[Thu Jun 17 13:11:45 2021]
Job 22: Downloading accession NZ_CP009787.1 for taxon Yersinia_rohdei.

[Thu Jun 17 13:11:45 2021]
Error in rule entrez_download_sequence:
    jobid: 20
    output: /data110/misterx/software2/virtual_env_haystack_cache/ncbi/Yersinia_pestis/NZ_CP033699.1.fasta.gz
`

I see the bgzip error, which is pretty odd as I installed bgzip in the virtualenv. But is the other problem, where it states that it is unable to download the assembly, linked to that? Or is it there for another reason? This error was generated with the following command:

haystac database --mode build --query '"Yersinia"[Organism] AND "complete genome"[All Fields]' --output yersinia_example
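The `/bin/bash: bgzip: command not found` lines above mean that the shell running the job could not find a `bgzip` executable on its `PATH`. A minimal sketch for checking this from the same environment's Python interpreter, using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of haystac:

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the current PATH."""
    # shutil.which returns the full path of an executable, or None if absent.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # "bgzip" is the only tool named in the error output above; other
    # external tools haystac may need are not listed here.
    missing = find_missing_tools(["bgzip"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH")
```

If this reports `bgzip` as missing while the virtualenv is active, the executable is either not installed or installed somewhere the shell does not search.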

Maybe you can help me and provide some guidance :) Thanks in advance, Julian

antonisdim commented 3 years ago

Hello Julian,

I hope you are doing great !

Your first error was caused by the fact that we were not pinning the correct version of snakemake in our conda recipe. Not sure about the following ones though.
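Since the first error comes down to which snakemake version ended up in the environment, it can help to report the installed versions when following up. A minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package names are taken from this thread; the versions printed
    # depend entirely on the environment this is run in.
    for name, version in installed_versions(["snakemake", "haystac"]).items():
        print(name, version or "not installed")
```

On the Python 3.6 and 3.7 interpreters shown in the tracebacks, `pip show snakemake` gives the same information.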

Would it be possible for you to follow the updated instructions from our documentation and install the latest version of haystac from GitHub (after cloning, please install the package with pip), and let me know if you still face the same issues? Of course, I'll keep looking into the error messages you got in the meantime.

Apologies for this inconvenience and thank you for your patience !

Best, Antony

Turbojewelz commented 3 years ago

Hey there and thanks for the answer. Unfortunately I still can't get haystac to run, neither with the updated instructions that I found here (https://github.com/antonisdim/haystac/blob/master/docs/installation.rst) nor with pip and a python3 virtual environment again. But maybe I am doing something generally wrong. Do you also have an updated document on how to install with pip?

Cheers and all the best, Julian

antonisdim commented 3 years ago

Hello Julian,

I hope you are doing great and I am really sorry for the delayed response !

I have uploaded an executable on my personal conda channel. Could you please install haystac in a fresh conda environment with the following command:

mamba install -c antonisdim haystac

Please make sure you are also using the latest version of conda.

Please let me know how it goes and thank you for your patience !

Best, Antony