NorwegianVeterinaryInstitute / irida-plugin-readsQC


Test "Confindr" - intra species contamination #3

Closed ajkarloss closed 1 year ago

ajkarloss commented 2 years ago

ConFindr is used to check for intra-species contamination: https://olc-bioinformatics.github.io/ConFindr/

Logs from running:

2022-10-06 12:46:46 Welcome to ConFindr 0.7.4! Beginning analysis of your samples...
2022-10-06 12:46:46 Did not find rMLST databases, if you want to use ConFindr on genera other than Listeria, Salmonella, and Escherichia, you'll need to download them. Instructions are available at https://olc-bioinformatics.github.io/ConFindr/install/#downloading-confindr-databases
2022-10-06 12:46:46 Beginning analysis of sample 220407_M06578.2022-01-1084-1_S47...
2022-10-06 12:46:46 Checking for cross-species contamination...
2022-10-06 12:47:03 Extracting conserved core genes...
2022-10-06 12:47:10 Quality trimming...
2022-10-06 12:47:11 Detecting contamination...
2022-10-06 12:47:12 Since this is the first time you are using this database, it needs to be indexed by KMA. This might take a while
2022-10-06 12:47:30 Done! Number of contaminating SNVs found: 1
2022-10-06 12:47:30 Contamination detection complete!

Output from ConFindr:

Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
220407_M06578.2022-01-1084-1_S47,Salmonella,1,False,0,0,61956,ND
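For discussion, here is a minimal sketch of how the report above could be read programmatically in a QC step. It assumes the report is plain CSV with the column names shown in the output above; the function name `summarise_report` and the embedded `REPORT` string are only illustrative and not part of ConFindr.

```python
import csv
import io

# The report text pasted from the run above (assumed to be plain CSV).
REPORT = """Sample,Genus,NumContamSNVs,ContamStatus,PercentContam,PercentContamStandardDeviation,BasesExamined,DatabaseDownloadDate
220407_M06578.2022-01-1084-1_S47,Salmonella,1,False,0,0,61956,ND
"""


def summarise_report(report_text):
    """Return one dict per sample with the fields a QC step would look at.

    Hypothetical helper: the column names are taken from the report above.
    """
    summary = []
    for row in csv.DictReader(io.StringIO(report_text)):
        summary.append(
            {
                "sample": row["Sample"],
                "genus": row["Genus"],
                "contam_snvs": int(row["NumContamSNVs"]),
                # ContamStatus is written as the text "True" or "False".
                "contaminated": row["ContamStatus"].strip() == "True",
            }
        )
    return summary


results = summarise_report(REPORT)
for entry in results:
    print(entry["sample"], entry["genus"], entry["contam_snvs"], entry["contaminated"])
```

Note that in this run one contaminating SNV was counted but ContamStatus is still False, so a QC rule should probably key on ContamStatus rather than on the SNV count alone.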

I need everyone's comments on this so we can decide whether to include it in the QC pipeline.

karinlag commented 2 years ago

If you check your email, in the Listeria doc I think Eve had some info about its behavior here. Can you share that here, @ajkarloss?

ajkarloss commented 2 years ago

Eve's test run in SAGA last year

Results of the ConFindr test on SAGA are in: /cluster/projects/nn9305k/active/evezeyl/pipeline/confindr_test

It seems that ConFindr integrates both ribosomal and MLST genes that are supposed to be unique. The scheme is downloaded automatically for three genera: Escherichia, Salmonella, and Listeria.

Each pair of reads needs to be in its own folder (usage: confindr -i $inputdir -o $outputdir)
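A minimal sketch of how that layout could be prepared, assuming paired files are named with `_R1` / `_R2` tags and end in `.fastq.gz` (both are assumptions about the file naming, not something ConFindr requires). The helper names `stage_pairs` and `confindr_command` are hypothetical; the only ConFindr options used are the `-i` and `-o` from the usage line above.

```python
import shutil
from pathlib import Path


def stage_pairs(reads_dir, staging_dir, r1_tag="_R1", r2_tag="_R2"):
    """Copy each R1/R2 pair into its own sub-folder and return those folders.

    Assumes file names like <sample>_R1*.fastq.gz and <sample>_R2*.fastq.gz.
    Files without a mate are skipped.
    """
    reads_dir, staging_dir = Path(reads_dir), Path(staging_dir)
    folders = []
    for r1 in sorted(reads_dir.glob(f"*{r1_tag}*.fastq.gz")):
        r2 = r1.with_name(r1.name.replace(r1_tag, r2_tag, 1))
        if not r2.exists():
            continue
        sample = r1.name.split(r1_tag)[0]
        sample_dir = staging_dir / sample
        sample_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(r1, sample_dir / r1.name)
        shutil.copy2(r2, sample_dir / r2.name)
        folders.append(sample_dir)
    return folders


def confindr_command(input_dir, output_dir):
    """Build the command from the usage line above as an argument list."""
    return ["confindr", "-i", str(input_dir), "-o", str(output_dir)]
```

Each returned folder can then be passed to `confindr_command` and run with `subprocess.run(...)` on a machine where ConFindr is installed.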

It automatically detects the species with mash and then uses the scheme for the detected species

Tested with 6 isolates plus 3 mixed samples (concatenated reads from pairs of those 6 isolates). ST is the sequence type, which is used to determine clonal complexes (CCs) for Listeria monocytogenes:

16SEL1400LM (ST14) and 16SEL1401LM (ST7) -> merge1

16SEL863LM (ST9) and 16SEL1404LM (ST14) -> merge2

16SEL835LM and 16SEL836LM (both ST7) -> merge3
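The notes do not say how the reads were concatenated, so the following is only a sketch of one way to build such a mixed sample. It relies on the fact that gzip files can be joined byte for byte and still decompress as one stream; the function names are hypothetical. For paired data, the R1 files and the R2 files would each be concatenated in the same isolate order.

```python
import gzip
import shutil
from pathlib import Path


def concatenate_reads(sources, destination):
    """Join gzipped FASTQ files byte for byte into one multi-member gzip file."""
    with open(destination, "wb") as out:
        for src in sources:
            with open(src, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return Path(destination)


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

Checking that the record count of the merged file equals the sum of the two inputs is a cheap sanity check before running the contamination test.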

Raw reads for the test are on SAGA in: /cluster/projects/nn9305k/active/evezeyl/pipeline/test_data

Test results: ConFindr detected contamination and reports the SNP variants for each locus tested.

Contamination was successfully detected when the mixture was from isolates with 2 different STs: merge1 and merge2

No contamination was detected for merge3, which was a mixture of isolates with the same ST. This indicates that they had identical alleles for all the loci tested (no SNPs).

The results consist of several files. Of interest are the list of loci with the variant positions in case of a mixture, and a report file that states whether contamination was detected.

ajkarloss commented 2 years ago

ConFindr is not available for Galaxy and it's going to take at least 3 days to fix that. Instead, we can skip this step at the "Read_QC" level and move it to "Assembly_QC" using the CheckM tool. Thomas and I looked at it today and it looks good.

Camilsek commented 2 years ago

@karinlag, do you have any specific arguments for including ConFindr? Or could we just go for CheckM? As far as I remember, we would not be able to look at intra-species contamination with CheckM, but my impression is that most labs don't do that either, as it is complicated. Could we leave it for now and raise the discussion again at a later stage if we need to (next year?)?

evezeyl commented 1 year ago

ConFindr is a simple MLST tool. What we found when working with the MIRKOS / EFSA readings is that it uses a ribosomal MLST scheme to determine whether the copies are unique. MIRKOS had bypassed it because the rMLST scheme is now under licence, but it is not a problem to obtain one as long as it is not used for commercial purposes. An alternative would be to run the MLST tool, i.e. with the rMLST scheme, but that would not provide as much detail as ConFindr, which actually looks for SNPs in the alleles determined (rather than only determining the type).

georgemarselis-nvi commented 1 year ago

closing. this has not been touched in six months