EBI-Metagenomics / emg-viral-pipeline

VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Apache License 2.0

filter_contigs_len.py command not found #64

Closed NailouZhang closed 2 years ago

NailouZhang commented 2 years ago

I ran the following commands:

cd ~/20T/DataBase/SoftwaresEnsembel/MAG
git clone --recursive https://github.com/EBI-Metagenomics/emg-viral-pipeline.git

export PATH=$PATH:~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/bin

cd /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27

~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run virify.nf --help

~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run -resume \
~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/virify.nf \
--fasta "/home/stone/20T/SraDownload/Genome/TBEV/NC_001672.1_sequence.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/DATABASES \
--cachedir ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/SINGULARITY \
-profile local,singularity

I got:

Error executing process > 'preprocess:length_filtering (1)'

Caused by: Process preprocess:length_filtering (1) terminated with an error exit status (127)

Command executed:

filter_contigs_len.py -f NC_001672_renamed.fasta -l 1.5 -o ./
CONTIGS=$(grep ">" NC_001672filt.fasta | wc -l)

Command exit status: 127

Command output: (empty)

Command error: .command.sh: line 2: filter_contigs_len.py: command not found

Work dir: /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work/f6/9db03ebd837a74d632361cd0f07d79

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

How can I resolve this?

hoelzer commented 2 years ago

Hi @NailouZhang ! Thanks for your interest in the pipeline.

It seems that the Python script filter_contigs_len.py, which is located in the bin folder of the cloned repository, cannot be found when you execute the pipeline this way (https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/bin/filter_contigs_len.py). The bin folder should be located here:

~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline

Can you please try the following: just install the pipeline code directly via Nextflow:

# pull pipeline code
nextflow pull EBI-Metagenomics/emg-viral-pipeline

# test run w/ latest release
nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --help

# execute your data using the latest release version
nextflow run -resume \
EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 \
--fasta "/home/stone/20T/SraDownload/Genome/TBEV/NC_001672.1_sequence.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/DATABASES \
--cachedir ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline/SINGULARITY \
-profile local,singularity
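
Exit status 127 with "command not found" means the shell running the task could not resolve filter_contigs_len.py through PATH. A minimal Python sketch of that lookup, using only the standard library, might look like the following. The helper name `find_script` and the throwaway directory are assumptions made for illustration and are not part of the pipeline.

```python
import os
import shutil
import stat
import tempfile


def find_script(name, extra_dirs=()):
    """Return the full path of an executable `name`, or None if it is not found.

    `extra_dirs` are searched before the directories already on PATH.
    """
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Build a throwaway "bin" directory holding an executable stand-in script.
    with tempfile.TemporaryDirectory() as bin_dir:
        script = os.path.join(bin_dir, "filter_contigs_len.py")
        with open(script, "w") as handle:
            handle.write("#!/usr/bin/env python3\n")
        # shutil.which only reports files that carry the executable bit.
        os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)

        print(find_script("filter_contigs_len.py", [bin_dir]))
        print(find_script("no_such_script_xyz.py", [bin_dir]))
```

The check only reflects the environment in which it is run. A task executed inside a container may see a different PATH and different mounted directories than the login shell.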
NailouZhang commented 2 years ago

Thanks! But it failed as follows:

-----------------------------------------------------------------------------------------------------------------------------------------

~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run -resume \
/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/virify.nf \
--fasta "/home/stone/20T/SraDownload/Genome/TBEV/NC_001672.1_sequence.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/DATABASES \
--cachedir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/SINGULARITY \
-profile local,docker

Error executing process > 'preprocess:rename (1)'

Caused by: Process preprocess:rename (1) terminated with an error exit status (125)

Command executed:

if [[ NC_001672.1_sequence.fasta =~ .gz$ ]]; then
  zcat NC_001672.1_sequence.fasta > tmp.fasta
else
  cp NC_001672.1_sequence.fasta tmp.fasta
fi
rename_fasta.py -i tmp.fasta -m NC_001672_map.tsv -o NC_001672_renamed.fasta rename

Command exit status: 125

Command output: (empty)

Command error:
Unable to find image 'microbiomeinformatics/emg-viral-pipeline-python3:v1' locally
docker: Error response from daemon: Head https://registry-1.docker.io/v2/microbiomeinformatics/emg-viral-pipeline-python3/manifests/v1: dial tcp: lookup registry-1.docker.io on 127.0.1.1:53: read udp 127.0.0.1:46714->127.0.1.1:53: i/o timeout.
See 'docker run --help'.

Work dir: /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work/b1/e3cb5d7670f844cb08409030003257

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-----------------------------------------------------------------------------------------------------------------------------------------

However, https://github.com/hoelzer/virify works well for me. I installed and ran it as follows:

-----------------------------------------------------------------------------------------------------------------------------------------

Install virify

git clone --recursive https://github.com/hoelzer/virify.git
cd virify

docker build -t mhoelzer/prodigal_viral:0.1 -f docker/prodigal/Dockerfile .
docker build -t mhoelzer/hmmscan:0.1 -f docker/hmmscan/Dockerfile .

cp bin/ docker/annotation/
cd docker/annotation/
docker build -t mhoelzer/annotation_viral_contigs:0.1 -f Dockerfile .
cd ..
cp bin/ docker/assign/
cd docker/assign/
docker build -t mhoelzer/assign_taxonomy:0.1 -f Dockerfile .
cd ..

Copy emg-viral-pipeline/docker/krona to virify/docker/

cp -R ../emg-viral-pipeline/docker/krona docker/
cd docker/krona
docker build -t nanozoo/krona:2.7.1--658845d -f Dockerfile .
cd ..

cp -R ../emg-viral-pipeline/docker/bioruby docker/
cd docker/bioruby
docker build -t nanozoo/bioruby:2.0.1--1f8a188 -f Dockerfile .
cd ..

cd ~/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27
~/Softwares/Miniconda3/nextflow-20.04.1/nextflow run -resume \
~/20T/DataBase/SoftwaresEnsembel/MAG/virify \
--fasta "/home/stone/20T/SraDownload/Genome/TBEV/NC_001672.1_sequence.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases ~/20T/DataBase/SoftwaresEnsembel/MAG/virify/DATABASES \
--cachedir ~/20T/DataBase/SoftwaresEnsembel/MAG/virify/SINGULARITY \
-profile standard

[skipped ] process > download_pprmeta:pprmetaGet [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_model_meta:metaGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_virsorter_db:virsorterGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_viphog_db:viphogGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_rvdb_db:rvdbGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_pvogs_db:pvogsGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_vogdb_db:vogdbGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_vpf_db:vpfGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_ncbi_db:ncbiGetDB [100%] 1 of 1, stored: 1 ✔
[75/59a1d0] process > download_imgvr_db:imgvrGetDB [100%] 1 of 1, failed: 1 ✘
[eb/369861] process > detect:rename (1) [100%] 1 of 1, cached: 1 ✔
[68/35b2d4] process > detect:length_filtering (1) [100%] 1 of 1, cached: 1 ✔
[e3/917400] process > detect:virsorter (1) [100%] 1 of 1, cached: 1 ✔
[e7/acbd25] process > detect:virfinder (1) [100%] 1 of 1, cached: 1 ✔
[81/471160] process > detect:pprmeta (1) [100%] 1 of 1, cached: 1 ✔
[8f/a392d7] process > detect:parse (1) [100%] 1 of 1, cached: 1 ✔
[47/da27cb] process > detect:restore (1) [100%] 1 of 1, cached: 1 ✔
[9c/37108a] process > annotate:prodigal (1) [100%] 1 of 1, cached: 1 ✔
[48/5bb7ed] process > annotate:hmmscan_viphogs (1) [100%] 1 of 1, cached: 1 ✔
[d5/0c94fe] process > annotate:hmm_postprocessing (1) [100%] 1 of 1, cached: 1 ✔
[40/dca0b6] process > annotate:ratio_evalue (1) [100%] 1 of 1, cached: 1 ✔
[89/db331a] process > annotate:annotation (1) [100%] 1 of 1, cached: 1 ✔
[08/54d4d4] process > annotate:plot_contig_map (1) [100%] 1 of 1 ✔
[de/cbe75b] process > annotate:assign (1) [100%] 1 of 1, cached: 1 ✔
[d0/6e23ba] process > plot:generate_krona_table (2) [100%] 2 of 2, cached: 2 ✔
[1e/876ef6] process > plot:krona (2) [100%] 2 of 2, cached: 2 ✔
[39/617798] process > plot:generate_sankey_table (1) [100%] 2 of 2 ✔
[6d/bc9086] process > plot:sankey (1) [100%] 2 of 2 ✔

Error executing process > 'download_imgvr_db:imgvrGetDB'

Caused by: Process download_imgvr_db:imgvrGetDB terminated with an error exit status (4)

Command executed:

wget -nH ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/viral-pipeline/IMG_VR_2018-07-01_4.tar.gz && tar zxvf IMG_VR_2018-07-01_4.tar.gz

Command exit status: 4

Command output:

-----------------------------------------------------------------------------------------------------------------------------------------

hoelzer commented 2 years ago

Hi @NailouZhang !

Okay, so for your first command that failed it seems that you again used the manually cloned repository instead of the installation via Nextflow. Can you please try once more:

# pull pipeline code
~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow pull EBI-Metagenomics/emg-viral-pipeline

# test run w/ latest release
~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --help

# execute your data using the latest release version
~/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run -resume \
EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 \
--fasta "/home/stone/20T/SraDownload/Genome/TBEV/NC_001672.1_sequence.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/DATABASES \
--cachedir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/SINGULARITY \
-profile local,docker 

Does this work?

If not, the error you got

Command error:
Unable to find image 'microbiomeinformatics/emg-viral-pipeline-python3:v1' locally
docker: Error response from daemon: Head https://registry-1.docker.io/v2/microbiomeinformatics/emg-viral-pipeline-python3/manifests/v1: dial tcp: lookup registry-1.docker.io on 127.0.1.1:53: read udp 127.0.0.1:46714->127.0.1.1:53: i/o timeout.
See 'docker run --help'.

sounds like some issue with Docker. Does this work:

docker pull microbiomeinformatics/emg-viral-pipeline-python3:v1

Second, you then used a discontinued code repository from the early days of the Nextflow version of the pipeline (https://github.com/hoelzer/virify). I don't recommend using this code because it is no longer maintained. However, it seems that you built all the necessary Docker images manually there and then executed the pipeline, which works (and should also work for the EBI-Metagenomics/emg-viral-pipeline code). But this is actually not necessary, because Nextflow should take care of pulling the Docker images automatically when you use, for example, -profile docker,local.

So the best approach is to get the code from this repository running: use nextflow pull with a release version given via -r, in combination with -profile docker,local, which should then download the necessary dependencies automatically, provided Docker is configured correctly.
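
The Docker error quoted above ("lookup registry-1.docker.io on 127.0.1.1:53 ... i/o timeout") points at a failed DNS lookup. A rough way to test name resolution from the same machine is sketched below, using only Python's standard library. The function name `can_resolve` and its injectable `resolver` argument are assumptions made for illustration. The check runs in the user's own process, so it only approximates what the Docker daemon sees.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address, False on a lookup error."""
    try:
        return len(resolver(host, port)) > 0
    except OSError:
        # socket.gaierror (DNS failure) and timeouts are both OSError subclasses.
        return False


if __name__ == "__main__":
    # registry-1.docker.io is the host named in the error message.
    host = "registry-1.docker.io"
    if can_resolve(host):
        print(f"{host} resolves")
    else:
        print(f"{host} does NOT resolve (check DNS or proxy settings)")
```

If the name does not resolve here either, the problem most likely lies in the machine's DNS or proxy setup rather than in the pipeline.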

NailouZhang commented 2 years ago

Hi @hoelzer ,

Thanks for your suggestion. The pipeline now runs fine on emg-viral-pipeline-0.4.0/tests/parse_viral_fixtures/base_fixtures/input.fasta with the following command:

/home/stone/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run -resume \
~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/virify.nf \
--fasta "tests/parse_viral_fixtures/base_fixtures/input.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases ~/20T/DataBase/SoftwaresEnsembel/MAG/virify/DATABASES \
--cachedir ~/20T/DataBase/SoftwaresEnsembel/MAG/virify/SINGULARITY \
--virome \
--hmmextend \
--blastextend \
--length 1 \
-profile local,docker

However, when I add --chromomap --balloon, I get errors.

I found that the microbiomeinformatics/r_chromomap:v0.1 image installed by docker pull does not run properly and fails with the error "no chromomap packages", so I made some modifications to docker/r_chromomap/Dockerfile:

FROM rocker/r-ver:3.5.0

LABEL base_image="rocker/verse:3.5.0"
LABEL version="1"
LABEL about.summary="r visualization packages"
LABEL about.license="SPDX:Apache-2.0"
LABEL about.tags="r, visualization"
LABEL about.home="https://cran.r-project.org/web/packages/chromoMap/, https://cran.r-project.org/web/packages/ggplot2/, https://cran.r-project.org/web/packages/plotly/"
LABEL software="r packages chromoMap, ggplot2, plotly"
LABEL software.version="3.15"

LABEL maintainer="MGnify team https://www.ebi.ac.uk/support/metagenomics"

#I added
RUN apt update && apt install libcurl4-openssl-dev libssl-dev -y

RUN Rscript -e "install.packages('httr', repos = 'http://cran.us.r-project.org')" && \
    Rscript -e "install.packages('curl', repos = 'http://cran.us.r-project.org')" && \
    Rscript -e "install.packages('chromoMap')" && \
    Rscript -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')" && \
    Rscript -e "install.packages('plotly', repos = 'http://cran.us.r-project.org')" && \
    rm -rf /tmp/downloaded_packages/ /tmp/*.rds

and then run:

docker build -t microbiomeinformatics/r_chromomap:v0.1 -f docker/r_chromomap/Dockerfile .

The pipeline then runs fine with 103 flavivirus sequences.

However, the taxonomy in the Sankey and Krona plots does not classify these viral sequences.

I ran it as follows:

cd ~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/
/home/stone/Softwares/Miniconda3/nextflow-21.03.0-edge/nextflow run -resume \
~/20T/DataBase/SoftwaresEnsembel/MAG/emg-viral-pipeline-0.4.0/virify.nf \
--fasta "/home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/Flavivirus.fasta" \
--cores 4 \
--output /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27 \
--workdir /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work \
--databases ~/20T/DataBase/SoftwaresEnsembel/MAG/virify/DATABASES \
--cachedir ~/20T/DataBase/SoftwaresEnsembel/MAG/virify/SINGULARITY \
--virome \
--hmmextend \
--blastextend \
--length 1 \
-profile local,docker

[skipped ] process > download_pprmeta:pprmetaGet [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_virsorter_db:virsorterGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_virfinder_db:virfinderGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_model_meta:metaGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_viphog_db:viphogGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_rvdb_db:rvdbGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_pvogs_db:pvogsGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_vogdb_db:vogdbGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_vpf_db:vpfGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_ncbi_db:ncbiGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_imgvr_db:imgvrGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_checkv_db:checkvGetDB [100%] 1 of 1, stored: 1 ✔
[d3/0ade86] process > preprocess:rename (1) [100%] 1 of 1 ✔
[ac/b56dc6] process > preprocess:length_filtering (1) [100%] 1 of 1 ✔
[a8/c75806] process > detect:virsorter (1) [100%] 1 of 1 ✔
[92/9ea9f6] process > detect:virfinder (1) [100%] 1 of 1 ✔
[60/1ef0a9] process > detect:pprmeta (1) [100%] 1 of 1 ✔
[f0/82185a] process > detect:parse (1) [100%] 1 of 1 ✔
[f5/8abf7d] process > postprocess:restore (1) [100%] 1 of 1 ✔
[92/98ab09] process > annotate:prodigal (1) [100%] 1 of 1 ✔
[e2/713785] process > annotate:hmmscan_viphogs (1) [100%] 1 of 1 ✔
[6d/1e84b5] process > annotate:hmm_postprocessing (1) [100%] 1 of 1 ✔
[88/b20eca] process > annotate:ratio_evalue (1) [100%] 1 of 1 ✔
[6e/1a15cd] process > annotate:annotation (1) [100%] 1 of 1 ✔
[40/82ec13] process > annotate:plot_contig_map (1) [100%] 1 of 1 ✔
[81/e500aa] process > annotate:assign (1) [100%] 1 of 1 ✔
[48/d831c4] process > annotate:blast (1) [100%] 1 of 1 ✔
[4f/070cf9] process > annotate:blast_filter (1) [100%] 1 of 1 ✔
[74/24cf11] process > annotate:hmmscan_rvdb (1) [100%] 1 of 1 ✔
[d5/3c537e] process > annotate:hmmscan_pvogs (1) [100%] 1 of 1 ✔
[16/373266] process > annotate:hmmscan_vogdb (1) [100%] 1 of 1 ✔
[be/8eb74f] process > annotate:hmmscan_vpf (1) [100%] 1 of 1 ✔
[64/aad264] process > annotate:checkV (1) [100%] 1 of 1 ✔
[68/7f16bd] process > plot:generate_krona_table (1) [100%] 2 of 2 ✔
[70/3ccd1f] process > plot:krona (2) [100%] 2 of 2 ✔
[9f/f9cb59] process > plot:generate_sankey_table (2) [100%] 2 of 2 ✔
[37/f2526d] process > plot:sankey (2) [100%] 2 of 2 ✔
Completed at: 31-Dec-2021 19:31:33
Duration : 13m 37s
CPU hours : 1.6
Succeeded : 29

The run results can be checked here:

Link: https://pan.baidu.com/s/1xiJQ8c4nsRNtg3TVw2TSUg Extraction code: vabi

hoelzer commented 2 years ago

Hi @NailouZhang

ok, glad it worked finally. But now you have different issues.

Chromomap

So basically you added

#I added
RUN apt update && apt install libcurl4-openssl-dev libssl-dev -y

to the Dockerfile, then you rebuilt the image, and then it worked fine? If so, thanks for checking and reporting, so that we can update the Docker image for Chromomap accordingly @mberacochea .

Taxonomy not assigned for sequence set of flaviviruses

So basically you are certain that the input sequences are flaviviruses but you don't get any taxonomy assignments, right? This requires checking your input FASTAs in more detail. Are the contigs/sequences relatively short? If so, it's hard for VIRify to assign a taxonomy because the method relies on detectable ORFs. If the sequence is short and only 1-2 informative ORFs can be found, it's difficult to assign a taxonomy with any certainty. In addition, we have already seen that some RNA viruses are more difficult to assign than DNA viruses (shorter genomes, sometimes less well represented in our HMM model set, ...). Nevertheless, VIRify should find clear cases, which we would need to investigate more carefully.
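
One quick way to check whether the input sequences are short is to list the length of every record in the FASTA file. The following is a minimal sketch in plain Python. The function name `fasta_lengths` and the toy records are assumptions made for illustration and are not part of VIRify.

```python
def fasta_lengths(lines):
    """Map each FASTA record ID to its sequence length, given an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    # Toy records; a real run would pass an open file handle instead.
    example = [
        ">contig_1 toy record",
        "ATGCATGCAT",
        "GCATG",
        ">contig_2",
        "ATGC",
    ]
    for name, length in fasta_lengths(example).items():
        print(name, length)
```

For a real input, the open file can be passed directly, for example `with open("Flavivirus.fasta") as handle: fasta_lengths(handle)`.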

NailouZhang commented 2 years ago

Hi @hoelzer ,

Thank you for your reply. Happy New Year.

There were some errors when running Chromomap. After adding the parameter --chromomap, the following information is displayed:

[skipped ] process > download_pprmeta:pprmetaGet [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_virsorter_db:virsorterGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_virfinder_db:virfinderGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_model_meta:metaGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_viphog_db:viphogGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_rvdb_db:rvdbGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_pvogs_db:pvogsGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_vogdb_db:vogdbGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_vpf_db:vpfGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_ncbi_db:ncbiGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_imgvr_db:imgvrGetDB [100%] 1 of 1, stored: 1 ✔
[skipped ] process > download_checkv_db:checkvGetDB [100%] 1 of 1, stored: 1 ✔
[d3/0ade86] process > preprocess:rename (1) [100%] 1 of 1, cached: 1 ✔
[ac/b56dc6] process > preprocess:length_filtering (1) [100%] 1 of 1, cached: 1 ✔
[a8/c75806] process > detect:virsorter (1) [100%] 1 of 1, cached: 1 ✔
[92/9ea9f6] process > detect:virfinder (1) [100%] 1 of 1, cached: 1 ✔
[60/1ef0a9] process > detect:pprmeta (1) [100%] 1 of 1, cached: 1 ✔
[f0/82185a] process > detect:parse (1) [100%] 1 of 1, cached: 1 ✔
[f5/8abf7d] process > postprocess:restore (1) [100%] 1 of 1, cached: 1 ✔
[92/98ab09] process > annotate:prodigal (1) [100%] 1 of 1, cached: 1 ✔
[e2/713785] process > annotate:hmmscan_viphogs (1) [100%] 1 of 1, cached: 1 ✔
[6d/1e84b5] process > annotate:hmm_postprocessing (1) [100%] 1 of 1, cached: 1 ✔
[88/b20eca] process > annotate:ratio_evalue (1) [100%] 1 of 1, cached: 1 ✔
[6e/1a15cd] process > annotate:annotation (1) [100%] 1 of 1, cached: 1 ✔
[40/82ec13] process > annotate:plot_contig_map (1) [100%] 1 of 1, cached: 1 ✔
[81/e500aa] process > annotate:assign (1) [100%] 1 of 1, cached: 1 ✔
[48/d831c4] process > annotate:blast (1) [100%] 1 of 1, cached: 1 ✔
[4f/070cf9] process > annotate:blast_filter (1) [100%] 1 of 1, cached: 1 ✔
[74/24cf11] process > annotate:hmmscan_rvdb (1) [100%] 1 of 1, cached: 1 ✔
[d5/3c537e] process > annotate:hmmscan_pvogs (1) [100%] 1 of 1, cached: 1 ✔
[16/373266] process > annotate:hmmscan_vogdb (1) [100%] 1 of 1, cached: 1 ✔
[be/8eb74f] process > annotate:hmmscan_vpf (1) [100%] 1 of 1, cached: 1 ✔
[64/aad264] process > annotate:checkV (1) [100%] 1 of 1, cached: 1 ✔
[3c/b676ea] process > plot:generate_krona_table (2) [100%] 2 of 2, cached: 2 ✔
[70/3ccd1f] process > plot:krona (1) [100%] 2 of 2, cached: 2 ✔
[9f/f9cb59] process > plot:generate_sankey_table (1) [100%] 2 of 2, cached: 2 ✔
[37/f2526d] process > plot:sankey (2) [100%] 2 of 2, cached: 2 ✔
[49/af036d] process > plot:generate_chromomap_table (1) [100%] 2 of 2 ✔
[5c/20110c] process > plot:chromomap (1) [100%] 1 of 1, failed: 1
[skipping] Stored process > download_ncbi_db:ncbiGetDB
[skipping] Stored process > download_virfinder_db:virfinderGetDB
[skipping] Stored process > download_vogdb_db:vogdbGetDB
[skipping] Stored process > download_model_meta:metaGetDB
[skipping] Stored process > download_pvogs_db:pvogsGetDB
[skipping] Stored process > download_vpf_db:vpfGetDB
[skipping] Stored process > download_virsorter_db:virsorterGetDB
[skipping] Stored process > download_viphog_db:viphogGetDB
[skipping] Stored process > download_imgvr_db:imgvrGetDB
[skipping] Stored process > download_rvdb_db:rvdbGetDB
[skipping] Stored process > download_checkv_db:checkvGetDB
[skipping] Stored process > download_pprmeta:pprmetaGet

Error executing process > 'plot:chromomap (2)'

Caused by: Process plot:chromomap (2) terminated with an error exit status (1)

Command executed:

#!/usr/bin/env Rscript

library(chromoMap)
library(ggplot2)
library(plotly)

contigs <- list()
annos <- list()
contigs <- dir(pattern = ".contigs.txt")
annos <- dir(pattern = ".anno.txt")

for (k in 1:length(contigs)){
c = contigs[k]
a = annos[k]

# check if a file is empty
if (file.info(c)$size == 0 || file.info(a)$size == 0) {
  next
}

# check how many categories we have
categories <- c("limegreen", "orange","grey")
df <- read.table(a, sep = "\t")
set <- unique(df$V5)
if ( length(set) == 2 ) {
  if ( set[1] == 'High confidence' && set[2] == 'Low confidence') {
    categories <- c("limegreen", "orange")
  }
  if ( set[1] == 'High confidence' && set[2] == 'No hit') {
    categories <- c("limegreen", "grey")
  }
  if ( set[1] == 'Low confidence' && set[2] == 'No hit') {
    categories <- c("orange", "grey")
  }
}
if ( length(set) == 1 ) {
  if ( set[1] == 'High confidence') {
    categories <- c("limegreen")
  }
  if ( set[1] == 'Low confidence') {
    categories <- c("orange")
  }
  if ( set[1] == 'No hit') {
    categories <- c("grey")
  }
}

p <-  chromoMap(c, a,
  data_based_color_map = T,
  data_type = "categorical",
  data_colors = list(categories),
  legend = T, lg_y = 400, lg_x = 100, segment_annotation = T,
  left_margin = 100, canvas_width = 1000, chr_length = 8, ch_gap = 6)
htmlwidgets::saveWidget(as_widget(p), paste("Flavivirus.chromomap-", k, ".html", sep=''))

}

Command exit status: 1

Command output: (empty)

Command error:

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

  last_plot

The following object is masked from ‘package:stats’:

  filter

The following object is masked from ‘package:graphics’:

  layout

Error in chromoMap(c, a, data_based_color_map = T, data_type = "categorical", : unused arguments (data_based_color_map = T, data_type = "categorical", data_colors = list(categories), legend = T, lg_y = 400, lg_x = 100, segment_annotation = T, left_margin = 100, canvas_width = 1000, chr_length = 8, ch_gap = 6)
Execution halted

Work dir: /home/stone/20T/DataBase/SoftwaresEnsembel/MAG/test_2021_12_27/work/70/44b6c3ddeddc8aaa8f5dc533752571

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Second, I confirmed that all the sequences belong to flaviviruses.

I downloaded these sequences via the following link: https://www.ncbi.nlm.nih.gov/nuccore?term=%28%22Flavivirus%22%5BOrganism%5D%20OR%20Flavivirus%5BAll%20Fields%5D%29%20AND%20%28viruses%5Bfilter%5D%20AND%20refseq%5Bfilter%5D%29&cmd=DetailsSearch

Search query: ("Flavivirus"[Organism] OR ("Flavivirus"[Organism] OR Flavivirus[All Fields])) AND (viruses[filter] AND refseq[filter])

Flaviviruses have only one ORF (encoding a polyprotein), so they may be hard to categorize.

Third, I tested 67 coronaviruses, which have many ORFs. The pipeline works well for 66 of the sequences.

("Alphacoronavirus"[Organism] OR "Betacoronavirus"[Organism] OR "Gammacoronavirus"[Organism] OR coronavirus[All Fields]) AND (viruses[filter] AND biomol_genomic[PROP] AND refseq[filter])

hoelzer commented 2 years ago

Hi @NailouZhang , also a happy and healthy '22 to you!

Thanks for the detailed reporting.

1) So chromomap still fails? Or were you able to solve this by modifying the Docker container?

2) Flavivirus detection

Flaviviruses have only one ORF (encoding a polyprotein), so they may be hard to categorize.

Yes, this is unfortunately a current limitation of VIRify. It might be possible to tune some parameters to get these recognized, but this would also increase the false-positive detection rate. I think this might be doable via this parameter:

https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/bin/contig_taxonomic_assign.py#L103

help="Minimum number of proteins with ViPhOG annotations at each taxonomic level, required for taxonomic assignment (default: 2)",

For your use-case, it might be reasonable to allow the user to easily adjust this parameter. I will open a separate issue about this topic because currently the pipeline does not provide a parameter to easily change this.
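
To illustrate why a single-ORF genome can fall below that default of 2, here is a deliberately simplified sketch of a minimum-protein threshold. It is not the logic of contig_taxonomic_assign.py. The function name `assign_taxon`, the `min_prot` argument, and the toy annotations are all assumptions made for illustration.

```python
from collections import Counter


def assign_taxon(protein_taxa, min_prot=2):
    """Return the most common taxon if enough annotated proteins support it, else None.

    `protein_taxa` holds one taxon label per annotated protein of a contig.
    """
    if not protein_taxa:
        return None
    taxon, support = Counter(protein_taxa).most_common(1)[0]
    return taxon if support >= min_prot else None


if __name__ == "__main__":
    # Toy data: several annotated proteins versus a single polyprotein.
    multi_orf = ["Coronaviridae", "Coronaviridae", "Coronaviridae"]
    single_orf = ["Flaviviridae"]

    print(assign_taxon(multi_orf))               # Coronaviridae
    print(assign_taxon(single_orf))              # None
    print(assign_taxon(single_orf, min_prot=1))  # Flaviviridae
```

Lowering the threshold to 1 lets the single-protein contig through, but it also means that one spurious hit is enough for an assignment, which is the false-positive trade-off mentioned above.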

3) Coronavirus test

Great, nice to hear. We also had good experiences w/ contigs derived from Coronavirus reads.

NailouZhang commented 2 years ago

Hi @hoelzer ,

I don't know what happened, but chromomap is still not working ):

mberacochea commented 2 years ago

Hi @NailouZhang,

Thank you for the extensive error report. I'll try to look at this next week, I'm busy at the moment.

@hoelzer I'll update the docker container for Chromomap.

Cheers

mberacochea commented 2 years ago

It took long... but I finally added those lines to the container and pushed a new version of it.

TBH, I wasn't able to test it. I will