Closed Kincekara closed 2 weeks ago
It looks like the tests worked:
#11 [test 1/2] RUN checkv download_database /db
#11 0.285
#11 0.285 CheckV v1.0.3: download_database
#11 0.285 [1/4] Checking latest version of CheckV's database...
#11 1.864 [2/4] Downloading 'checkv-db-v1.5'...
#11 37.35 [3/4] Extracting 'checkv-db-v1.5'...
#11 68.77 [4/4] Building DIAMOND database...
#11 111.8 Run time: 111.56 seconds
#11 111.8 Peak mem: 1.27 GB
#11 111.8 Download completed successfully.
#11 DONE 113.9s
#12 [test 2/2] RUN wget -q https://bitbucket.org/berkeleylab/checkv/raw/51a5293f75da04c5d9a938c9af9e2b879fa47bd8/test/test_sequences.fna && checkv end_to_end -d /db/checkv-db-v1.5 test_sequences.fna test_out -t 4
#12 0.778
#12 0.778 CheckV v1.0.3: contamination
#12 0.778 [1/8] Reading database info...
#12 0.832 [2/8] Reading genome info...
#12 0.836 [3/8] Calling genes with prodigal-gv...
#12 3.859 [4/8] Reading gene info...
#12 3.880 [5/8] Running hmmsearch...
#12 38.94 [6/8] Annotating genes...
#12 38.95 [7/8] Identifying host regions...
#12 38.97 [8/8] Writing results...
#12 38.97 Run time: 38.2 seconds
#12 38.97 Peak mem: 0.16 GB
#12 38.98
#12 38.98 CheckV v1.0.3: completeness
#12 38.98 [1/8] Skipping gene calling...
#12 38.98 [2/8] Initializing queries and database...
#12 39.33 [3/8] Running DIAMOND blastp search...
#12 48.45 [4/8] Computing AAI...
#12 48.71 [5/8] Running AAI based completeness estimation...
#12 48.79 [6/8] Running HMM based completeness estimation...
#12 48.85 [7/8] Determining genome copy number...
#12 48.97 [8/8] Writing results...
#12 48.98 Run time: 10.0 seconds
#12 48.98 Peak mem: 1.61 GB
#12 49.01
#12 49.01 CheckV v1.0.3: complete_genomes
#12 49.01 [1/7] Reading input sequences...
#12 49.02 [2/7] Finding complete proviruses...
#12 49.02 [3/7] Finding direct/inverted terminal repeats...
#12 49.03 [4/7] Filtering terminal repeats...
#12 49.03 [5/7] Checking genome for completeness...
#12 49.03 [6/7] Checking genome for large duplications...
#12 49.03 [7/7] Writing results...
#12 49.03 Run time: 0.02 seconds
#12 49.03 Peak mem: 1.61 GB
#12 49.03
#12 49.03 CheckV v1.0.3: quality_summary
#12 49.03 [1/6] Reading input sequences...
#12 49.03 [2/6] Reading results from contamination module...
#12 49.03 [3/6] Reading results from completeness module...
#12 49.03 [4/6] Reading results from complete genomes module...
#12 49.04 [5/6] Classifying contigs into quality tiers...
#12 49.04 [6/6] Writing results...
#12 49.04 Run time: 0.01 seconds
#12 49.04 Peak mem: 1.61 GB
#12 DONE 49.1s
How big is the database that it uses? Would it be worthwhile to include in the image?
The compressed database is 1.6 GB. The tool accepts the database path through the "-d" flag, so the database can be downloaded separately and kept outside the image.
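For anyone who wants to try the external-database route, here is a rough sketch of how it could look with Docker. The `checkv download_database` and `checkv end_to_end -d` calls are the same ones used in the test stage above. The image tag `staphb/checkv:1.0.3`, the host directory `checkv_db`, and the mount points `/db` and `/data` are assumptions for illustration, not something confirmed in this PR. The script only prints the commands unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: keep the CheckV database on the host and bind-mount it into the container.
set -euo pipefail

IMAGE="${IMAGE:-staphb/checkv:1.0.3}"   # assumed image tag, not confirmed in this thread
DB_DIR="${DB_DIR:-$PWD/checkv_db}"      # hypothetical host directory for the database
DB_NAME="${DB_NAME:-checkv-db-v1.5}"    # database version shown in the build log

# Print the command; execute it only when RUN=1, so the default is a dry run.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# 1. Create the host directory that will hold the database.
run mkdir -p "${DB_DIR}"

# 2. Download the database once into the mounted host directory.
run docker run --rm -v "${DB_DIR}:/db" "${IMAGE}" \
  checkv download_database /db

# 3. Run the full pipeline against the external database.
#    test_sequences.fna is expected in the current directory (mounted at /data).
run docker run --rm -v "${DB_DIR}:/db" -v "${PWD}:/data" -w /data "${IMAGE}" \
  checkv end_to_end -d "/db/${DB_NAME}" test_sequences.fna test_out -t 4
```

Saved under a hypothetical name such as `run_checkv.sh`, running `bash run_checkv.sh` previews the commands and `RUN=1 bash run_checkv.sh` executes them. The download step only needs to run once as long as the host directory is kept.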
Sounds good. I'll merge and deploy this.
If we want to add a database to an image, we can do what bakta does and have two images (one with a database and one without)
The GitHub Action for the deployment can be followed here: https://github.com/StaPH-B/docker-builds/actions/runs/11184337714
The image should be up on Docker Hub and Quay soon.
> If we want to add a database to an image, we can do what bakta does and have two images (one with a database and one without)
It may be a good idea. I can make another PR when I have time.
CheckV is a tool for assessing the quality of metagenome-assembled viral genomes. It is actively maintained. It may be useful for metagenomics projects.
Paper: https://www.nature.com/articles/s41587-020-00774-7
Code: https://bitbucket.org/berkeleylab/checkv
Pull Request (PR) checklist:
docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15
spades/3.12.0/Dockerfile
shigatyper/2.0.1/test.sh
spades/3.12.0/README.md