bioinformatics-centre / kaiju

Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
http://kaiju.binf.ku.dk
GNU General Public License v3.0

Getting Repeated Species in cell free DNA profiles #287

Open arpit20328 opened 1 week ago

arpit20328 commented 1 week ago

Hi authors,

I have paired-end FASTQ files of cell-free DNA from six patients with sepsis.

For these six patients, the species distributions are as follows:

Patient 1:

Leuconostoc sp. DORA_2, Escherichia coli, Chlamydia trachomatis, Streptococcus pneumoniae, Vibrio vulnificus, Staphylococcus aureus, Enterococcus faecalis, Mycobacterium leprae, cyanobacterium G8-9, Bacillus paranthracis, Acinetobacter baumannii, Klebsiella pneumoniae, Streptococcus oralis, Lactobacillus crispatus, Mycobacterium tuberculosis, Levilactobacillus brevis, Staphylococcus hominis, Staphylococcus epidermidis, Cutibacterium acnes, Plasmodium ovale, Listeria monocytogenes

Patient 2:

Leuconostoc sp. DORA_2, Chlamydia trachomatis, Escherichia coli, Streptococcus pneumoniae, Vibrio vulnificus, Cutibacterium acnes, Staphylococcus aureus, Enterococcus faecalis, Mycobacterium leprae, Saccharomycodes ludwigii, Acinetobacter baumannii, Lactobacillus crispatus, Klebsiella pneumoniae, Streptococcus oralis, cyanobacterium G8-9, Pseudomonas aeruginosa, Bacillus paranthracis, Staphylococcus hominis, Plasmodium ovale

Patient 3:

Leuconostoc sp. DORA_2, Mycoplasmopsis arginini, Paracoccus acridae, Escherichia coli, Chlamydia abortus, Wenyingzhuangia marina, Klebsiella pneumoniae, Streptomyces malachitofuscus, Streptococcus pneumoniae, Nocardioides sp. OK12, Enterococcus faecium, Chlamydia trachomatis, Staphylococcus aureus, Staphylococcus epidermidis, Enterococcus faecalis, Cutibacterium acnes, Lactobacillus crispatus, Streptomyces gancidicus, Levilactobacillus brevis, Rhodococcus fascians, Mycobacterium leprae, Acinetobacter baumannii, Enterobacter hormaechei, Streptococcus oralis, Bacillus yapensis, Staphylococcus hominis, Streptosporangium violaceochromogenes

Similarly, for patients 4, 5 and 6 we get Leuconostoc sp. DORA_2 at the top and Escherichia coli in second place.

My question is: why does the species spectrum not change from patient to patient? It does change when Bowtie2- or Kraken2-based classification is used.
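To put a number on how similar the profiles are, the sketch below compares the first ten species of each of the three lists above using the Jaccard index (species shared by two patients divided by all distinct species in the pair) and reports the species present in all three. Restricting to ten species and ignoring rank order and read counts are simplifying assumptions, not part of any kaiju output.

```python
from itertools import combinations

# First ten species of each patient list above (rank and read counts ignored).
profiles = {
    "Patient 1": [
        "Leuconostoc sp. DORA_2", "Escherichia coli", "Chlamydia trachomatis",
        "Streptococcus pneumoniae", "Vibrio vulnificus", "Staphylococcus aureus",
        "Enterococcus faecalis", "Mycobacterium leprae", "cyanobacterium G8-9",
        "Bacillus paranthracis",
    ],
    "Patient 2": [
        "Leuconostoc sp. DORA_2", "Chlamydia trachomatis", "Escherichia coli",
        "Streptococcus pneumoniae", "Vibrio vulnificus", "Cutibacterium acnes",
        "Staphylococcus aureus", "Enterococcus faecalis", "Mycobacterium leprae",
        "Saccharomycodes ludwigii",
    ],
    "Patient 3": [
        "Leuconostoc sp. DORA_2", "Mycoplasmopsis arginini", "Paracoccus acridae",
        "Escherichia coli", "Chlamydia abortus", "Wenyingzhuangia marina",
        "Klebsiella pneumoniae", "Streptomyces malachitofuscus",
        "Streptococcus pneumoniae", "Nocardioides sp. OK12",
    ],
}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two species lists (0.0 if both empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pairwise_similarity(profiles):
    """Jaccard index for every pair of patients."""
    return {
        (p, q): jaccard(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


def shared_core(profiles):
    """Species that appear in every patient's list."""
    sets = [set(species) for species in profiles.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    for (p, q), j in pairwise_similarity(profiles).items():
        print(f"{p} vs {q}: Jaccard = {j:.2f}")
    print("Shared by all:", sorted(shared_core(profiles)))
```

Run as a script, it prints a Jaccard index of 0.67 for patients 1 and 2 and 0.18 for each pairing with patient 3, with Escherichia coli, Leuconostoc sp. DORA_2 and Streptococcus pneumoniae shared by all three. Species shared by every patient regardless of clinical picture are natural first candidates to check against the negative controls.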

pmenzel commented 1 week ago

These might all be false-positive hits, arising either from contamination in the DNA extraction or library prep kits or from kaiju's database search itself. Cell-free DNA sequencing from blood requires strict negative-control measurements to filter out background noise!
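The background-noise point can be turned into a simple per-sample filter. The sketch below is an illustrative heuristic, not part of kaiju: it flags a species in a sample when it has very few reads, when it is not clearly enriched over the negative control, or when it appears in nearly every sample, a common signature of reagent contamination. The function name, the thresholds and the read counts in the example are all invented, and raw counts are compared without correcting for differences in sequencing depth between samples and control.

```python
from typing import Dict, Set


def flag_background(
    sample_counts: Dict[str, Dict[str, int]],
    control_counts: Dict[str, int],
    min_reads: int = 10,
    control_ratio: float = 10.0,
    max_prevalence: float = 0.8,
) -> Dict[str, Set[str]]:
    """Flag likely background species in each sample (hypothetical thresholds)."""
    n_samples = len(sample_counts)

    # In how many samples does each species have at least one read?
    prevalence: Dict[str, int] = {}
    for counts in sample_counts.values():
        for species, reads in counts.items():
            if reads > 0:
                prevalence[species] = prevalence.get(species, 0) + 1

    flagged: Dict[str, Set[str]] = {}
    for sample, counts in sample_counts.items():
        suspects: Set[str] = set()
        for species, reads in counts.items():
            if reads <= 0:
                continue
            too_few = reads < min_reads
            # Not at least `control_ratio` times the negative-control count.
            near_control = reads < control_ratio * control_counts.get(species, 0)
            # Present in (nearly) every sample.
            ubiquitous = prevalence[species] / n_samples > max_prevalence
            if too_few or near_control or ubiquitous:
                suspects.add(species)
        flagged[sample] = suspects
    return flagged


if __name__ == "__main__":
    # Invented read counts, for illustration only.
    samples = {
        "P1": {"species_A": 500, "species_B": 300, "species_C": 120},
        "P2": {"species_A": 450, "species_B": 150, "species_D": 5},
        "P3": {"species_A": 520, "species_E": 200},
    }
    control = {"species_B": 40}
    for sample, suspects in flag_background(samples, control).items():
        print(sample, sorted(suspects))
```

Here species_A is flagged everywhere because it occurs in every sample, species_B because it never reaches ten times its control count, and species_D because it has only five reads. The prevalence rule can also discard a genuine pathogen shared by many patients, so it is best treated as a prompt for manual review rather than an automatic removal.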

arpit20328 commented 1 week ago

@pmenzel Yes, these results are after removing the wet-lab negative template control (NTC) data.

Now the question is how to identify false hits computationally, which is tough to work out.
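One computational framing is to classify a "decoy" read set that contains no real organisms with the same database and settings, then keep only species that clearly exceed what the decoy run predicts. The sketch below assumes per-species read counts are already available for a patient sample and for such a decoy run; it rescales the decoy counts to the sample's library size and keeps species above a chosen multiple of that expectation. The function name, the 5-fold cutoff, the pseudocount and the example counts are hypothetical.

```python
from typing import Dict, List


def enriched_over_decoy(
    observed: Dict[str, int],
    observed_total: int,
    decoy: Dict[str, int],
    decoy_total: int,
    fold: float = 5.0,
    pseudocount: float = 1.0,
) -> List[str]:
    """Species whose read count exceeds `fold` times the decoy-expected count.

    Decoy counts are rescaled to the sample's library size; the pseudocount
    stops species absent from the decoy run from getting an expectation of 0.
    """
    if observed_total <= 0 or decoy_total <= 0:
        raise ValueError("library sizes must be positive")
    scale = observed_total / decoy_total
    kept = []
    for species, reads in observed.items():
        expected = (decoy.get(species, 0) + pseudocount) * scale
        if reads > fold * expected:
            kept.append(species)
    return sorted(kept)


if __name__ == "__main__":
    # Invented counts: species_A is almost as frequent in the decoy run as in the sample.
    observed = {"species_A": 500, "species_B": 300, "species_C": 120}
    decoy = {"species_A": 450, "species_B": 10}
    print(enriched_over_decoy(observed, 1_000_000, decoy, 1_000_000))
```

With these made-up numbers, species_A is dropped because the decoy run alone predicts about 451 reads for it.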

Fake FASTQ files with fabricated reads might be the best way, but I still have to figure out how to do it.
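Fabricated reads take only a few lines of code. The sketch below writes a pair of FASTQ files of uniformly random reads and can also shuffle the bases of an existing FASTQ, which keeps each read's length, base composition and quality string but destroys any real sequence. Since neither set should contain real organisms, species reported for them indicate how often the database search yields hits by chance. The file names, 150-nt read length, constant quality (Phred 40, encoded as `I` in Phred+33) and seed are arbitrary assumptions; input and output are plain, uncompressed four-line FASTQ.

```python
import random
from pathlib import Path

BASES = "ACGT"


def random_read(length, rng):
    """Uniformly random DNA sequence (50% GC on average)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def shuffled_read(seq, rng):
    """Permute a read's bases: same composition and length, no real sequence."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def write_decoy_pair(prefix, n_pairs, read_len=150, seed=42):
    """Write <prefix>_R1.fastq and <prefix>_R2.fastq with random read pairs."""
    rng = random.Random(seed)
    r1_path = Path(f"{prefix}_R1.fastq")
    r2_path = Path(f"{prefix}_R2.fastq")
    qual = "I" * read_len  # constant Phred 40 in Phred+33 encoding
    with r1_path.open("w") as r1, r2_path.open("w") as r2:
        for i in range(n_pairs):
            name = f"decoy_{i}"  # identical names keep mates paired by order
            r1.write(f"@{name}\n{random_read(read_len, rng)}\n+\n{qual}\n")
            r2.write(f"@{name}\n{random_read(read_len, rng)}\n+\n{qual}\n")
    return r1_path, r2_path


def shuffle_fastq(in_path, out_path, seed=42):
    """Copy a four-line FASTQ, shuffling the bases within every read."""
    rng = random.Random(seed)
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline()
            qual = fin.readline()
            fout.write(header + shuffled_read(seq, rng) + "\n" + plus + qual)
```

For paired-end data, `shuffle_fastq` can be applied to the R1 and R2 files separately, since mates stay paired by their position in the files. The resulting decoys would then be classified with exactly the same kaiju database and settings as the patient samples. Shuffled real reads are usually the stricter control, because cell-free DNA fragments are short and their base composition differs from the 50% GC of uniform random reads.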