adamd3 / BactSeq

A nextflow pipeline for performing bacterial RNA-Seq data analysis.
MIT License

Can it be used for identifying bacteria in human paired fastq files? #1

Open sirrgang opened 9 months ago

sirrgang commented 9 months ago

I am wondering if I can use the pipeline to detect bacterial infections in human Illumina paired-end FASTQ files. I assume I would use files with host reads already removed and run them against NCBI bacterial genomes?

Thanks

adamd3 commented 9 months ago

Yes, it could be used for this purpose.

If you are not sure which bacterial species might be present, then I would suggest pseudo-aligning your FASTQ files to a database of ribosomal RNA genes for bacteria, since these are likely to be the most abundant transcripts. Put a representative set of rRNA sequences from different bacterial species in a single FASTA-format file for pseudo-alignment (use --aligner kallisto).

If you already have an idea of the species, you could just pseudo-align reads to the rRNA genes from that specific species.
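As a rough illustration of how the single FASTA file mentioned above could be put together, here is a minimal Python sketch. It assumes you already have one rRNA FASTA file per species. The function name `merge_rrna_fastas`, the species labels, and the header-prefixing scheme are hypothetical choices for this example, not part of BactSeq.

```python
from pathlib import Path


def merge_rrna_fastas(sources, out_path):
    """Concatenate per-species FASTA files into one file with unique headers.

    sources: mapping of a species label to the path of its rRNA FASTA file.
    Returns the number of sequences written.
    """
    n_written = 0
    seen = set()
    with open(out_path, "w") as out:
        for label, path in sources.items():
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                if line.startswith(">"):
                    parts = line[1:].split()
                    if not parts:
                        raise ValueError(f"empty FASTA header in {path}")
                    # Prefix the first word of the header with the species
                    # label so that IDs stay unique across species.
                    seq_id = f"{label}_{parts[0]}"
                    if seq_id in seen:
                        raise ValueError(f"duplicate sequence ID: {seq_id}")
                    seen.add(seq_id)
                    out.write(f">{seq_id}\n")
                    n_written += 1
                else:
                    out.write(line + "\n")
    return n_written
```

Calling `merge_rrna_fastas({"E_coli": "ecoli_rrna.fasta", "P_aeruginosa": "pa_rrna.fasta"}, "bacterial_rrna.fasta")` (file names are placeholders) would produce a file that can then be passed to --ref_genome.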

You can trim adaptors from the reads and then remove any human-aligning reads first, as you suggested. Then run the pipeline with the --skip_trimming flag, to avoid re-running the adaptor trimming step.

An example command might look like the below (where fasta_file contains the bacterial rRNA sequences):

nextflow run BactSeq --data_dir [directory_containing_fastq_files] --sample_file [sample_file] --ref_genome [fasta_file] --aligner kallisto --strandedness unstranded --skip_trimming --paired -profile [docker/conda]

See the example sample_file in the README for the format of this file. You can use -profile docker or -profile conda.

Let me know if you have any other questions.

sirrgang commented 8 months ago

Thanks. Do you have a recommended source for bacterial ribosomal RNA genes?

sirrgang commented 8 months ago

I guess this could work: https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/SILVA_138.1_SSURef_tax_silva.fasta.gz
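If the SILVA export is used, it would probably need some preparation first, since the SSU reference set also covers archaea and eukaryotes and stores sequences as RNA (U instead of T). Below is a minimal Python sketch of one way to do that. It assumes the headers follow the pattern ">ACCESSION.start.stop Domain;Phylum;..." with the taxonomy string after the first space; the function name `extract_bacterial_rrna` is hypothetical, and converting U to T is a precaution rather than a documented BactSeq requirement.

```python
import gzip


def extract_bacterial_rrna(silva_gz, out_fasta):
    """Write only the Bacteria entries of a gzipped SILVA FASTA file, as DNA.

    Returns the number of sequences kept.
    """
    kept = 0
    keep = False
    with gzip.open(silva_gz, "rt") as src, open(out_fasta, "w") as out:
        for line in src:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # Assumed header layout: ">ID Domain;Phylum;...;Species"
                seq_id, _, taxonomy = line[1:].partition(" ")
                keep = taxonomy.startswith("Bacteria;")
                if keep:
                    # Keep only the ID so headers stay short and unambiguous
                    out.write(f">{seq_id}\n")
                    kept += 1
            elif keep and line:
                # SILVA sequences are RNA; convert to the DNA alphabet
                out.write(line.upper().replace("U", "T") + "\n")
    return kept
```

The full SSURef file is large, so a smaller representative subset may be more practical for pseudo-alignment.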