biologger / speciesprimer

The SpeciesPrimer pipeline is intended to help researchers find specific primer pairs for the detection and quantification of bacterial species in complex ecosystems.
GNU General Public License v3.0

No GFF file #31

Open vupadhyay-code opened 2 months ago

vupadhyay-code commented 2 months ago

Hello Biologger - your prior suggestion for Issue 15 was great and solved the problem I was running into. However, now when I run the version of the pipeline you suggested, I get the following error:

Run: quality_control(rRNA)
Starting QC with rRNA
found 0 gff files
Error: No .gff files found for QualityControl rRNA
Error report: for target Anaerostipes_hadrus
Error 1: Error: No .gff files found for QualityControl rRNA

The gff_files and ffn_files folders are both empty, so it looks to me like prokka is not running. These are the lines printed just before the error (the genomes were downloaded):

GCF_000210695v1 annotation required
Run prokka --kingdom Bacteria --outdir GCF_000210695v1_20240903 --genus Anaerostipes --locustag GCF_000210695v1 --prefix GCF_000210695v1_20240903 --cpus 0 genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna

Thanks in advance!
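A quick way to confirm the symptom described above (empty gff_files and ffn_files folders) is to count the annotation files directly. The following is a minimal sketch, not part of SpeciesPrimer: the helper name `count_annotation_files` is hypothetical, and it assumes both folders sit directly inside the target directory.

```python
from pathlib import Path


def count_annotation_files(target_dir):
    """Count .gff and .ffn files in the gff_files and ffn_files folders.

    Assumption: both folders live directly under target_dir.
    A missing folder is reported as 0 files.
    """
    target = Path(target_dir)
    counts = {}
    for folder, suffix in (("gff_files", ".gff"), ("ffn_files", ".ffn")):
        directory = target / folder
        if directory.is_dir():
            counts[folder] = len(list(directory.glob(f"*{suffix}")))
        else:
            counts[folder] = 0
    return counts


if __name__ == "__main__":
    # Run from the directory that contains the target folder.
    print(count_annotation_files("Anaerostipes_hadrus"))
```

If both counts are 0 after the annotation step, the prokka run itself is the place to look rather than the QC step.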

biologger commented 2 months ago

Could you post the command you used to run the pipeline so I can try to reproduce the issue?

vupadhyay-code commented 2 months ago

I ran: speciesprimer.py

Then just went through prompts in the shell script.

I heard from a lab mate, who is struggling with a new install of prokka, that there is some issue with prokka support.

biologger commented 2 months ago

Can you share the content of the ./Anaerostipes_hadrus/config/config.json file? "." is probably the directory where you started speciesprimer.py.

If you are using the Docker container, the prokka installation should not be an issue in this case.

vupadhyay-code commented 2 months ago

Here you go:

{"exception": ["Eubacterium hadrum", "Anaerostipes sp. 5/1/63FAA", "Clostridiales bacterium SSC/2", "butyrate-producing bacterium SS2/1", "butyrate-producing bacterium SSC/2"], "mfold": -3.0, "intermediate": false, "path": "/turnbaugh/qb3share/shared_resources/apptainer_containers/new_speciesprimer", "ignore_qc": false, "assemblylevel": ["all"], "skip_tree": false, "nolist": false, "offline": false, "target": "Anaerostipes_hadrus", "maxsize": 200, "mfethreshold": 90, "blastseqs": 1000, "customdb": null, "skip_download": false, "probe": false, "minsize": 70, "mpprimer": -3.5, "blastdbv5": false, "qc_gene": ["rRNA"]}

Thanks for being so responsive.
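For readers comparing their own setup, the settings that matter for this thread can be pulled out of config.json with a few lines of Python. This is only a sketch: `summarize_config` is a hypothetical helper, and the choice of which keys to show is an assumption, although every key name is taken from the config posted above.

```python
import json


def summarize_config(config_text):
    """Return a few settings from a SpeciesPrimer config.json string.

    Keys that are absent from the file are reported as None.
    """
    config = json.loads(config_text)
    keys = (
        "target",
        "qc_gene",
        "assemblylevel",
        "ignore_qc",
        "skip_download",
        "offline",
        "path",
    )
    return {key: config.get(key) for key in keys}
```

To use it, read ./Anaerostipes_hadrus/config/config.json into a string and pass that string to `summarize_config`.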

biologger commented 2 months ago

Thanks for sharing! I will look into this issue and answer as soon as I have news...

biologger commented 1 month ago

Hi,

I was not able to reproduce this error. With the latest Docker container I could annotate all the Anaerostipes_hadrus genomes, and QC worked.

Does restarting the run always lead to the same result? Did you check if you have enough disk space left for the annotated files?

vupadhyay-code commented 1 month ago

Yeah I got the same error again. I've tried it twice. It downloads the genomes just fine and fails at this gff level. I have a lot of disk space. You want me to try to clear some out and run it again? Are there any other requirements that I might need to change at the system level?

biologger commented 1 month ago

Hm, strange... Could you try to run the following command inside the /primerdesign/Anaerostipes_hadrus directory and post the output from the terminal?

prokka --kingdom Bacteria --outdir GCF_000210695v1_20240903 --genus Anaerostipes --locustag GCF_000210695v1 --prefix GCF_000210695v1_20240903 --cpus 0 genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna

vupadhyay-code commented 1 month ago

[09:04:53] This is prokka 1.14.5
[09:04:53] Written by Torsten Seemann torsten.seemann@gmail.com
[09:04:53] Homepage is https://github.com/tseemann/prokka
[09:04:53] Local time is Mon Sep 9 09:04:53 2024
[09:04:53] You are vupadhyay
[09:04:53] Operating system is linux
[09:04:53] You have BioPerl 1.006924
[09:04:53] System has 48 cores.
[09:04:53] Will use maximum of 48 cores.
[09:04:53] Annotating as >>> Bacteria <<<
[09:04:53] Creating new output folder: GCF_000210695v1_20240903
[09:04:53] Running: mkdir -p GCF_000210695v1_20240903
[09:04:53] Using filename prefix: GCF_000210695v1_20240903.XXX
[09:04:53] Setting HMMER_NCPU=1
[09:04:53] Writing log to: GCF_000210695v1_20240903/GCF_000210695v1_20240903.log
[09:04:53] Command: /programs/prokka/bin/prokka --kingdom Bacteria --outdir GCF_000210695v1_20240903 --genus Anaerostipes --locustag GCF_000210695v1 --prefix GCF_000210695v1_20240903 --cpus 0 genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna
[09:04:53] Appending to PATH: /programs/prokka/bin/../binaries/linux
[09:04:53] Appending to PATH: /programs/prokka/bin/../binaries/linux/../common
[09:04:53] Appending to PATH: /programs/prokka/bin
[09:04:53] Looking for 'aragorn' - found /usr/bin/aragorn
[09:04:53] Determined aragorn version is 001002 from 'ARAGORN v1.2.36 Dean Laslett'
[09:04:53] Looking for 'barrnap' - found /usr/bin/barrnap
[09:04:53] Determined barrnap version is 000007 from 'barrnap 0.7'
[09:04:53] Looking for 'blastp' - found /programs/ncbi-blast/bin/blastp
[09:04:54] Determined blastp version is 002009 from 'blastp: 2.9.0+'
[09:04:54] Looking for 'cmpress' - found /usr/bin/cmpress
[09:04:54] Determined cmpress version is 001001 from '# INFERNAL 1.1.1 (July 2014)'
[09:04:54] Looking for 'cmscan' - found /usr/bin/cmscan
[09:04:54] Determined cmscan version is 001001 from '# INFERNAL 1.1.1 (July 2014)'
[09:04:54] Looking for 'egrep' - found /bin/egrep
[09:04:54] Looking for 'find' - found /usr/bin/find
[09:04:54] Looking for 'grep' - found /bin/grep
[09:04:54] Looking for 'hmmpress' - found /usr/bin/hmmpress
[09:04:54] Determined hmmpress version is 003001 from '# HMMER 3.1b2 (February 2015); http://hmmer.org/'
[09:04:54] Looking for 'hmmscan' - found /usr/bin/hmmscan
[09:04:54] Determined hmmscan version is 003001 from '# HMMER 3.1b2 (February 2015); http://hmmer.org/'
[09:04:54] Looking for 'java' - found /usr/bin/java
[09:04:54] Looking for 'makeblastdb' - found /programs/ncbi-blast/bin/makeblastdb
[09:04:54] Determined makeblastdb version is 002009 from 'makeblastdb: 2.9.0+'
[09:04:54] Looking for 'minced' - found /programs/prokka/bin/../binaries/linux/../common/minced
[09:04:54] Determined minced version is 002000 from 'minced 0.2.0'
[09:04:54] Looking for 'parallel' - found /usr/bin/parallel
[09:04:55] Determined parallel version is 20161222 from 'GNU parallel 20161222'
[09:04:55] Looking for 'prodigal' - found /usr/bin/prodigal
[09:04:55] Determined prodigal version is 002006 from 'Prodigal V2.6.2: February, 2015'
[09:04:55] Looking for 'prokka-genbank_to_fasta_db' - found /programs/prokka/bin/prokka-genbank_to_fasta_db
[09:04:55] Looking for 'sed' - found /bin/sed
[09:04:55] Looking for 'tbl2asn' - found /programs/prokka/bin/../binaries/linux/tbl2asn
[tbl2asn] This copy of tbl2asn is more than a year old. Please download the current version.
[09:04:55] Determined tbl2asn version is 025007 from 'tbl2asn 25.7 arguments:'
[09:04:55] Using genetic code table 11.
[09:04:55] Loading and checking input file: genomic_fna/GCF_000210695.1_ASM21069v1_genomic.fna
[09:04:55] Wrote 1 contigs totalling 3114788 bp.
[09:04:55] Predicting tRNAs and tmRNAs
[09:04:55] Running: aragorn -l -gc11 -w GCF_000210695v1_20240903\/GCF_000210695v1_20240903.fna
[09:04:56] 1 tRNA-Ser c[61020,61108] 35 (gct)
[09:04:56] 2 tRNA-Glu [118882,118953] 34 (ttc)
[09:04:56] 3 tRNA-Cys [118986,119056] 33 (gca)
[09:04:56] 4 tRNA-Met [119099,119173] 35 (cat)
[09:04:56] 5 tRNA-Glu [218867,218938] 34 (ttc)
[09:04:56] 6 tRNA-Thr [218973,219045] 34 (tgt)
[09:04:56] 7 tRNA-Met [219052,219125] 35 (cat)
[09:04:56] 8 tRNA-Asp [219150,219224] 35 (gtc)
[09:04:56] 9 tRNA-Val [219250,219323] 34 (tac)
[09:04:56] 10 tRNA-Leu [219353,219440] 36 (taa)
[09:04:56] 11 tRNA-Arg [219487,219561] 35 (acg)
[09:04:56] 12 tRNA-Tyr [559549,559631] 35 (gta)
[09:04:56] 13 tRNA-Leu [559635,559720] 35 (taa)
[09:04:56] 14 tRNA-Ser [685506,685593] 37 (tga)
[09:04:56] 15 tRNA-Ser [685621,685711] 36 (gga)
[09:04:56] 16 tRNA-Arg c[1373568,1373642] 35 (ccg)
[09:04:56] 17 tRNA-Met [1474024,1474097] 35 (cat)
[09:04:56] 18 tRNA-Gln [1696504,1696574] 33 (ctg)
[09:04:56] 19 tRNA-Lys [1696580,1696653] 34 (ttt)
[09:04:56] 20 tRNA-Gln [1696743,1696813] 33 (ctg)
[09:04:56] 21 tRNA-Lys [1696818,1696890] 34 (ttt)
[09:04:56] 22 tRNA-Thr c[1703992,1704064] 34 (ggt)
[09:04:56] 23 tRNA-Glu [1713841,1713914] 35 (ctc)
[09:04:56] 24 tRNA-Arg [1736603,1736678] 36 (tct)
[09:04:56] 25 tRNA-His [1736720,1736794] 35 (gtg)
[09:04:56] 26 tRNA-Gln [1736818,1736889] 33 (ttg)
[09:04:56] 27 tRNA-Lys [1736908,1736980] 34 (ttt)
[09:04:56] 28 tRNA-Leu [1737319,1737401] 36 (tag)
[09:04:56] 29 tRNA-Arg [1737956,1738031] 36 (tct)
[09:04:56] 30 tRNA-His [1738079,1738153] 35 (gtg)
[09:04:56] 31 tRNA-Lys [1738193,1738266] 34 (ttt)
[09:04:56] 32 tRNA-Asp [1813557,1813631] 35 (gtc)
[09:04:56] 33 tRNA-Val [1813650,1813723] 34 (tac)
[09:04:56] 34 tRNA-Thr [1813743,1813815] 34 (tgt)
[09:04:56] 35 tRNA-Tyr [1813865,1813947] 35 (gta)
[09:04:56] 36 tRNA-Met [1813958,1814033] 36 (cat)
[09:04:56] 37 tRNA-Phe [1814055,1814130] 35 (gaa)
[09:04:56] 38 tRNA-Gly [1838421,1838491] 33 (tcc)
[09:04:56] 39 tRNA-Gly [1958568,1958640] 34 (gcc)
[09:04:56] 40 tRNA-Thr [2067260,2067332] 34 (cgt)
[09:04:56] 41 tRNA-Lys c[2247340,2247413] 34 (ctt)
[09:04:56] 42 tRNA-Phe [2307923,2307997] 35 (gaa)
[09:04:56] 43 tRNA-Gly [2308003,2308074] 33 (tcc)
[09:04:56] 44 tRNA-Met [2599944,2600019] 36 (cat)
[09:04:56] 45 tRNA-Val [2674822,2674895] 34 (tac)
[09:04:56] 46 tRNA-Met [2674932,2675006] 35 (cat)
[09:04:56] 47 tRNA-Ser c[2787732,2787815] 35 (cag)
[09:04:56] 48 tRNA-Leu c[2787820,2787905] 35 (aag)
[09:04:56] 49 tRNA-Arg c[2863143,2863217] 35 (cct)
[09:04:56] 50 tRNA-Trp [2930556,2930627] 34 (cca)
[09:04:56] 51 tmRNA [2957248,2957587] 86,115 ADNKLAYAA*
[09:04:56] 52 tRNA-Ile [2979859,2979934] 34 (tat)
[09:04:56] 53 tRNA-Leu c[3075729,3075812] 35 (caa)
[09:04:56] Found 53 tRNAs
[09:04:56] Predicting Ribosomal RNAs
[09:04:56] Running Barrnap with 48 threads
[09:04:56] 1 NC_021016.1 44193 16S ribosomal RNA
[09:04:56] Found 1 rRNAs
[09:04:56] Skipping ncRNA search, enable with --rfam if desired.
[09:04:56] Total of 53 tRNA + rRNA features
[09:04:56] Searching for CRISPR repeats
[09:04:57] Found 0 CRISPRs
[09:04:57] Predicting coding sequences
[09:04:57] Contigs total 3114788 bp, so using single mode
[09:04:57] Running: prodigal -i GCF_000210695v1_20240903\/GCF_000210695v1_20240903.fna -c -m -g 11 -p single -f sco -q
[09:05:03] Found 3014 CDS
[09:05:03] Connecting features back to sequences
[09:05:03] Not using genus-specific database. Try --usegenus to enable it.
[09:05:03] Annotating CDS, please be patient.
[09:05:03] Will use 48 CPUs for similarity searching.
[09:05:05] There are still 3014 unannotated CDS left (started with 3014)
[09:05:05] Will use blast to search against /programs/prokka/db/kingdom/Bacteria/IS with 48 CPUs
[09:05:05] Running: cat GCF_000210695v1_20240903\/GCF_000210695v1_20240903.IS.tmp.3673189.faa | parallel --gnu --plain -j 48 --block 9615 --recstart '>' --pipe blastp -query - -db /programs/prokka/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > GCF_000210695v1_20240903\/GCF_000210695v1_20240903.IS.tmp.3673189.blast 2> /dev/null
[09:05:06] Could not run command: cat GCF_000210695v1_20240903\/GCF_000210695v1_20240903.IS.tmp.3673189.faa | parallel --gnu --plain -j 48 --block 9615 --recstart '>' --pipe blastp -query - -db /programs/prokka/db/kingdom/Bacteria/IS -evalue 1e-30 -qcov_hsp_perc 90 -num_threads 1 -num_descriptions 1 -num_alignments 1 -seg no > GCF_000210695v1_20240903\/GCF_000210695v1_20240903.IS.tmp.3673189.blast 2> /dev/null
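
The log stops at the blastp step, and because the logged command ends in `2> /dev/null`, the actual error message from blastp or parallel is discarded. One way to investigate is to take the failing command from the log, drop that redirection, and rerun it by hand so the error becomes visible. The sketch below only does the text extraction. `failed_command` is a hypothetical helper, it assumes the log has one entry per line (as in prokka's .log file), and it makes no claim about what the underlying cause is.

```python
import re

# Matches the prokka log entry that reports a failed shell command.
FAILED = re.compile(r"Could not run command: (?P<cmd>.+)$")
# Matches a trailing "2> /dev/null" that hides the tool's error output.
STDERR_SINK = re.compile(r"\s*2>\s*/dev/null\s*$")


def failed_command(log_text):
    """Return the first failed command in a prokka log, without the
    trailing stderr redirection, or None if no command failed."""
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            return STDERR_SINK.sub("", match.group("cmd"))
    return None
```

The returned string can be pasted into a shell from the same working directory; the `\/` sequences in the logged paths are read by the shell as plain `/`.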