bailey-lab / SeekDeep

Bioinformatic Tools for analyzing targeted amplicon sequencing developed by Nicholas Hathaway of Bailey Lab
http://seekdeep.brown.edu/
GNU Lesser General Public License v3.0

No output with setupTarAmpAnalysis with customized parameters #14

Open dcm9123 opened 2 years ago

dcm9123 commented 2 years ago

Hi Nick!

I hope you are doing great! I've been working on malaria amplicon deep sequencing of cpmp, but I've run into a few issues whenever I customize parameters in setupTarAmpAnalysis. Briefly, my workflow is:

Sample collection -> DNA extraction -> PCR on cpmp (430 bp) -> Illumina MiSeq 2x250 PE mode (~40,000 raw reads output) -> Amplicon deep sequencing.

In this case I am looking at a returning traveler who took Malarone as chemoprophylaxis and was diagnosed with malaria on her return. She was treated with Malarone in the hospital for 3 days and then discharged, after which she came back with fever. We performed NGS on cytb and found a de novo mutation after her discharge that wasn't there before. To determine whether this was a minor clone or a genotype that mutated in cytb, we did msp2 (3D7/FC27) agarose gel electrophoresis, msp2 (3D7/FC27) capillary electrophoresis, and now amplicon deep sequencing on cpmp. The results from both electrophoretic methods suggest this was a mutation and not a minor clone; now I'm trying to find out whether amplicon deep sequencing with your pipeline shows the same.

Some info about the files: overlapStatus is R1EndsinR2 (there's a bit of overlap between the forward and reverse reads). Primers were taken from the HaplotypR paper by Lerch (with Illumina adapters included in the nPCR). 13 samples were included in this analysis (with positive and negative controls).
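As a rough check on the overlap status, the expected overlap can be worked out from the figures in the workflow above. This is a minimal sketch, not part of SeekDeep: the function name `expected_overlap` is hypothetical, and it assumes the 430 bp figure is the full sequenced insert and that neither read is trimmed.

```python
def expected_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases shared by R1 and R2 when the reads start at opposite amplicon ends.

    A positive value means the reads overlap; zero or negative means a gap.
    Assumes untrimmed reads of equal length (an assumption, not from the post).
    """
    return 2 * read_len - amplicon_len


# Figures from the post: 430 bp amplicon, 2x250 paired-end reads.
overlap = expected_overlap(amplicon_len=430, read_len=250)
print(f"expected overlap: {overlap} bp")
```

Under those assumptions the reads share 70 bp, which is consistent with "a bit of overlap" between forward and reverse reads.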

I noticed that when I ran setupTarAmpAnalysis with the default parameters, the pipeline works beautifully, but it detects only a single haplotype in my clinical samples and a single, different haplotype in my positive control:

[Screenshot: Screen Shot 2022-05-03 at 3 03 38 PM]

Therefore I tried relaxing my parameters to see if that would pick up more diversity. While experimenting, I noticed that the pipeline reported empty data in nohup.out whenever I tried to view the results on a localhost port. I tried different parameters and they all produce empty results (i.e. --otu 1; --otu 0.97; --qualThres 20,15; --snpFreqCutOff 0.001). I also noticed that all the nucleotides showed a base quality of 37, so I was trying to relax this threshold to a Phred score of ~30. One of the commands that produces empty results is shown below:

seekdeep setupTarAmpAnalysis --samples sampleNames.tab.txt --outDir analysis --inputDir fastq/ --idFile idFile.tab.txt --overlapStatusFnp overlapStatuses.txt --lenCutOffs lenCutOffs.tab.txt --numThreads 8 --refSeqsDir genomes/ --extraQlusterCmds="otu 0.97"
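On the observation that every base shows a quality of 37: this can be checked directly on the raw FASTQ files, independently of SeekDeep. The sketch below tallies Phred scores across all bases. It assumes well-formed four-line FASTQ records and the Phred+33 encoding; the function names and the two demo reads are made up for illustration.

```python
import gzip
from collections import Counter
from typing import Iterable


def phred_histogram(fastq_lines: Iterable[str], offset: int = 33) -> Counter:
    """Count Phred scores over every base in a 4-line-per-record FASTQ stream."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the fourth line of each record is the quality string
            counts.update(ord(ch) - offset for ch in line.strip())
    return counts


def phred_histogram_from_file(path: str) -> Counter:
    """Tally qualities from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return phred_histogram(handle)


# Synthetic two-read example (made-up data): 'F' encodes Q37, '?' encodes Q30.
demo = ["@r1", "ACGT", "+", "FFFF", "@r2", "ACGT", "+", "FF??"]
hist = phred_histogram(demo)
print(dict(sorted(hist.items())))
```

If a real file yields a histogram with a single key, 37, then any quality threshold at or below 37 would treat every base the same way, so lowering it would not be expected to change which reads pass.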

But whenever I run it with the default parameters it comes out fine:

seekdeep setupTarAmpAnalysis --samples sampleNames.tab.txt --outDir analysis --inputDir fastq/ --idFile idFile.tab.txt --overlapStatusFnp overlapStatuses.txt --lenCutOffs lenCutOffs.tab.txt --numThreads 8 --refSeqsDir genomes/

./runanalysis 8

./startServerCmd.sh

To make sure that my data is not 'corrupted' or simply not variable enough, I did a preliminary (quick and dirty) screening with freebayes, looking for SNPs with a minimum frequency of 1% and a Phred score of 30. I found 6 SNPs in one of my samples, at different depths, that for some reason SeekDeep cannot assign to a haplotype.
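A screening of this kind can be reproduced from freebayes VCF output with a short filter. The sketch below is an illustration, not the exact procedure used here: it reads allele frequency as AO/DP from the INFO column and interprets "Phred score of 30" as the VCF QUAL field, which is an assumption. The function name and the three records are synthetic.

```python
from typing import Iterable, List, Tuple

Snp = Tuple[str, int, str, str, float]  # chrom, pos, ref, alt, frequency


def low_frequency_snps(vcf_lines: Iterable[str],
                       min_freq: float = 0.01,
                       min_qual: float = 30.0) -> List[Snp]:
    """Return single-base substitutions passing QUAL and AO/DP thresholds."""
    hits: List[Snp] = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, qual = fields[0], int(fields[1]), fields[3], fields[5]
        if qual == "." or float(qual) < min_qual:
            continue
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        depth = int(info["DP"])
        # AO holds one alternate-observation count per ALT allele.
        for alt, ao in zip(fields[4].split(","), info["AO"].split(",")):
            if len(ref) != 1 or len(alt) != 1:
                continue  # keep SNPs only, drop indels and MNPs
            freq = int(ao) / depth if depth else 0.0
            if freq >= min_freq:
                hits.append((chrom, pos, ref, alt, freq))
    return hits


# Synthetic records (made-up data): only the first passes both thresholds.
demo_vcf = [
    "##fileformat=VCFv4.2",
    "chrA\t100\t.\tA\tG\t500\t.\tDP=1000;AO=50;RO=950",
    "chrA\t200\t.\tC\tT\t45\t.\tDP=1000;AO=5;RO=995",
    "chrA\t300\t.\tG\tA\t10\t.\tDP=1000;AO=200;RO=800",
]
hits = low_frequency_snps(demo_vcf)
print(hits)
```

Listing position, depth, and frequency for each passing SNP makes it easier to compare the freebayes calls against the haplotypes SeekDeep reports for the same sample.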

[Screenshot: Screen Shot 2022-05-03 at 3 45 46 PM]

I am attaching two .fastq files along with the other files I am using for SeekDeep. Have you ever encountered something similar?

Thanks again for your valuable help and advice, Nick.

Best,

Daniel

sampleNames.tab.txt overlapStatuses.txt lenCutOffs.tab.txt idFile.tab.txt

dcm9123 commented 2 years ago

I couldn't upload the .fastq.gz files, but please let me know if you need them and I can provide them. Thanks again for taking the time to look at this.