Open sirrgang opened 9 months ago
Yes, it could be used for this purpose.
If you are not sure which bacterial species might be present, then I would suggest pseudo-aligning your FASTQ files to a database of ribosomal RNA genes for bacteria, since these are likely to be the most abundant transcripts. Put a representative set of rRNA sequences from different bacterial species in a single FASTA-format file for pseudo-alignment (use --aligner kallisto).
If you already have an idea of the species, you could just pseudo-align reads to the rRNA genes from that specific species.
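If your rRNA sequences come from several per-species FASTA files, a small script can combine them into the single file the pipeline expects. The following is only a sketch: `read_fasta` and `merge_fastas` are hypothetical helper names, not part of BactSeq, and it assumes that the first whitespace-delimited token of each header is the sequence ID. Repeated IDs get a numeric suffix, since duplicate target names are likely to cause trouble when the index is built.

```python
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_fastas(paths, out_path):
    """Write every record from `paths` to `out_path` with unique IDs.

    Only the ID (first token of the header) is kept. A repeated ID is
    written as ID_2, ID_3, ... in order of appearance.
    Returns the number of records written.
    """
    seen = Counter()
    written = 0
    with open(out_path, "w") as out:
        for path in paths:
            for header, seq in read_fasta(path):
                seq_id = header.split()[0]
                seen[seq_id] += 1
                if seen[seq_id] > 1:
                    seq_id = f"{seq_id}_{seen[seq_id]}"
                out.write(f">{seq_id}\n{seq}\n")
                written += 1
    return written
```

The merged file can then be passed to --ref_genome. Note that the suffixing does not guard against an input that already contains an ID such as `rrs_2`; check the output if your IDs follow that pattern.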
You can trim adaptors from the reads and then remove any human-aligning reads first, as you suggested.
Then run the pipeline with the --skip_trimming flag, to avoid re-running the adaptor trimming step.
An example command might look like the below (where fasta_file contains the bacterial rRNA sequences):
nextflow run BactSeq --data_dir [directory_containing_fastq_files] --sample_file [sample_file] --ref_genome [fasta_file] --aligner kallisto --strandedness unstranded --skip_trimming --paired -profile [docker/conda]
See the example sample_file in the README for the format of this file. You can use -profile docker or -profile conda.
Let me know if you have any other questions.
Thanks. Do you have a recommended source for bacterial ribosomal RNA genes?
I guess this could work: https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/SILVA_138.1_SSURef_tax_silva.fasta.gz
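One caveat if you go with a SILVA export: as far as I know, these files cover all domains (not only bacteria) and store sequences in the RNA alphabet (U rather than T), so some preprocessing is probably needed before building a DNA index. Below is a rough sketch of that step. The function name `silva_bacteria_to_dna` is made up for illustration, and the header layout it relies on (`>ACCESSION.start.stop Domain;Phylum;...`) is an assumption you should check against the actual file.

```python
import gzip


def silva_bacteria_to_dna(in_path, out_path):
    """Copy bacterial records from a SILVA-style FASTA, converting U to T.

    Assumes each header looks like '>ACCESSION.start.stop Bacteria;...'.
    Only the accession is written to the output header.
    Returns the number of records kept.
    """
    # Read gzip-compressed input transparently, plain text otherwise.
    opener = gzip.open if str(in_path).endswith(".gz") else open
    kept = 0
    keep = False
    with opener(in_path, "rt") as src, open(out_path, "w") as out:
        for line in src:
            if line.startswith(">"):
                parts = line[1:].rstrip("\n").split(" ", 1)
                # Keep the record only if its taxonomy string starts with Bacteria.
                keep = len(parts) == 2 and parts[1].startswith("Bacteria;")
                if keep:
                    out.write(f">{parts[0]}\n")
                    kept += 1
            elif keep and line.strip():
                # Sequence line of a kept record: uppercase and replace U with T.
                out.write(line.strip().upper().replace("U", "T") + "\n")
    return kept
```

Because the taxonomy string is dropped from the output headers, you would need to keep the original file (or a separate accession-to-taxonomy table) to interpret which species the hits belong to. The full SSURef export is also very large and redundant, so a smaller representative subset may be more practical.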
I am wondering if I can use the pipeline to detect bacterial infections in human Illumina paired-end FASTQ files. I assume I would use files with host reads already removed and align these against NCBI bacterial genomes?
Thanks