HRGV / phyloFlash

phyloFlash - A pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic dataset.
GNU General Public License v3.0

No SSU rRNA sequences found in trusted contigs by Barrnap #192

Open AbigailJTH opened 1 week ago

AbigailJTH commented 1 week ago

Hi, I was trying to assemble some endolithic green algae chloroplast 16S rRNA from a coral metatranscriptome. I have the database for multiple strains, and I prepared it following the instructions at https://hrgv.github.io/phyloFlash/install.html, section 4.3, "Set up a custom database with your own sequences".

However, the following errors happened:

    [09:20:27] Extracting SSU rRNA from trusted contigs /data3/Meta_Os_rRNA/SILVA_OsDB.fasta...
    [09:20:27] running subcommand: /root/miniconda3/envs/phyloflash/lib/phyloFlash/barrnap-HGV/bin/barrnap_HGV --evalue 1e-100 --reject 0.6 --kingdom bac --gene ssu --threads 20 /data3/Meta_Os_rRNA/SILVA_OsDB.fasta G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.bac.gff 2>G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.barrnap.log
    [09:20:27] running subcommand: /root/miniconda3/envs/phyloflash/lib/phyloFlash/barrnap-HGV/bin/barrnap_HGV --evalue 1e-100 --reject 0.6 --kingdom arch --gene ssu --threads 20 /data3/Meta_Os_rRNA/SILVA_OsDB.fasta G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.arch.gff 2>G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.barrnap.log
    [09:20:28] running subcommand: /root/miniconda3/envs/phyloflash/lib/phyloFlash/barrnap-HGV/bin/barrnap_HGV --evalue 1e-100 --reject 0.6 --kingdom euk --gene ssu --threads 20 /data3/Meta_Os_rRNA/SILVA_OsDB.fasta G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.euk.gff 2>G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.barrnap.log
    [09:20:29] no SSU rRNA sequences found in trusted contigs by Barrnap
    [09:20:29] mapping extracted SSU reads back on trusted SSU sequences
    [09:20:29] running subcommand: /root/miniconda3/envs/phyloflash/bin/bbmap.sh fast=t minidentity=0.98 -Xmx20g threads=20 po=f outputunmapped=t ref=G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.all.fasta nodisk in=G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.G5-176-C2-T0-OsA-LFK11691_L2_paired.rrna.1.fq.SSU.1.fq out=G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.bbmap.sam noheader=t overwrite=t in2=G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.G5-176-C2-T0-OsA-LFK11691_L2_paired.rrna.1.fq.SSU.2.fq pairlen=1200 outu=G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.bbmap.outu.fwd.fastq outu2=G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.bbmap.outu.rev.fastq 2>G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.bbmap.out
    [09:20:29] FATAL: Tool execution failed!. Error was '' and return code '256'
    Check error log file G5-176-C2-T0-OsA-LFK11691_L2_paired_almost_everything.trusted.bbmap.out
    Aborting.
    [09:20:29] Saving log to file phyloFlash_log_on_error
    Processing complete for folder: G5-176-C2-T0-OsA-LFK11691_L2_paired

My database sequences are quite short, about 250 bp each. Is that the reason for the failure? Could you please give me some instructions? Thanks a lot!

kbseah commented 1 week ago

Hello, thanks for your report. It looks like you tried to use the -trusted option, but that doesn't work with a custom database because the trusted contigs are screened with the default SSU rRNA models.

Could you please supply the full command line you used?

AbigailJTH commented 1 week ago

Hi, thanks so much for your reply!

I was using a .sh script cuz I have a lot of samples.

#!/bin/bash

# Output directory (defined here but not used below)
output_dir="/data3/Meta_Os_rRNA/output_phyloflash"

for file in *rrna.1.fq.gz; do
    echo "Processing sample: ${file}"

    # Get the sample name by stripping the read-1 suffix
    sample_name=$(basename "${file}" .rrna.1.fq.gz)
    echo "Sample is: ${sample_name}"
    # Set input file names
    input_r1="${sample_name}.rrna.1.fq.gz"
    input_r2="${sample_name}.rrna.2.fq.gz"
    log="${sample_name}.log"
    # Run phyloFlash on the paired reads for the current sample
    phyloFlash.pl -lib "${sample_name}_almost_everything" \
        -read1 "${input_r1}" -read2 "${input_r2}" \
        -almosteverything -CPUs 20 -readlength 145 \
        -dbhome /data/software/SILVA_db/138.1/ \
        -log -zip -taxlevel 10 -readlimit 1000000 \
        -trusted /data3/Meta_Os_rRNA/SILVA_OsDB_v2.fasta

    echo "Processing complete for folder: ${sample_name}"
done

I am not sure if this matters but my 16S rRNA sequences are quite short.

kbseah commented 1 week ago

Thanks for the details. Could you try running phyloFlash without the -trusted option?
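For the loop posted above, that would roughly mean keeping every flag as it is and dropping only the final -trusted argument. A minimal sketch, assuming the same file naming and database path as in that script; `run_sample` and the `DRY_RUN` variable are hypothetical names used only for illustration:

```shell
# Hypothetical wrapper around the per-sample phyloFlash call from the script
# above, with the -trusted argument left out. All flags are copied from that
# script. Set DRY_RUN=1 to print the command instead of running it.
run_sample() {
    sample_name="$1"
    # Build the argument list in the positional parameters.
    set -- phyloFlash.pl \
        -lib "${sample_name}_almost_everything" \
        -read1 "${sample_name}.rrna.1.fq.gz" \
        -read2 "${sample_name}.rrna.2.fq.gz" \
        -almosteverything -CPUs 20 -readlength 145 \
        -dbhome /data/software/SILVA_db/138.1/ \
        -log -zip -taxlevel 10 -readlimit 1000000
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (prints the command only):
#   DRY_RUN=1; run_sample G5-176-C2-T0-OsA-LFK11691_L2_paired
```

Printing the command first makes it easy to confirm that -trusted is really gone before launching a long batch.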

The idea behind -trusted was to allow users to supply the full-length SSU rRNA sequences of organisms known to be in the libraries, so that reads from these can be mapped out before the remainder are assembled. This improves the assembly of lower-abundance SSU rRNA sequences in some cases. If you don't have full-length sequences, then it would not be useful here.
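A quick way to see whether a FASTA file holds full-length sequences is to summarise its sequence lengths before passing it to -trusted. The snippet below is only a sketch: `fasta_length_summary` is a hypothetical helper name, and it relies on plain awk rather than anything in phyloFlash. For orientation, a full-length bacterial or plastid 16S rRNA is roughly 1,500 bp.

```shell
# Hypothetical helper: print the number of sequences and their minimum,
# maximum, and mean length for a (possibly multi-line) FASTA file.
fasta_length_summary() {
    awk '
        # Header line: close the previous record (if any) and start a new one.
        /^>/ { if (seen) { record() } seen = 1; len = 0; next }
        # Sequence line: strip whitespace and add to the current length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            if (seen) { record() }
            if (n == 0) { print "no sequences found"; exit 1 }
            printf "n=%d min=%d max=%d mean=%.1f\n", n, min, max, total / n
        }
        function record() {
            n++; total += len
            if (n == 1 || len < min) min = len
            if (len > max) max = len
        }
    ' "$1"
}

# Example:
#   fasta_length_summary /data3/Meta_Os_rRNA/SILVA_OsDB_v2.fasta
```

If the maximum is far below full length, as with sequences of about 250 bp, the file is probably not a good fit for -trusted.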

Hope that this helps

AbigailJTH commented 1 week ago

Thanks for your suggestions. I tried it without the -trusted option and it worked.

My aim was to assemble some SSU rRNA sequences which are not in the SILVA database but unfortunately we don't have the full-length ones.

Thanks a lot!

kbseah commented 1 week ago

You could consider trying the PR2 database, which does include plastid rRNA sequences, and adapt it for phyloFlash: https://pr2-database.org/

Good luck with your project!