I couldn't upload the .fastq.gz files, but please let me know if you need them and I can provide them. Thanks again for taking the time to look at this.
Hi Nick!
I hope you are doing great! I've been working on malaria amplicon deep sequencing of cpmp, but I've run into a few issues whenever I customize parameters in setupTarAmpAnalysis. Briefly, my workflow is as follows:
Sample collection -> DNA extraction -> PCR on cpmp (430 bp) -> Illumina MiSeq 2x250 PE mode (~40,000 raw reads output) -> Amplicon deep sequencing.
In this case I am looking at a returning traveler who took Malarone as chemoprophylaxis and was diagnosed with malaria on return. She was hospitalized and treated with Malarone for 3 days, then discharged, and later came back with fever. We performed NGS on cytb and found a de novo mutation after discharge that had not been present before. To determine whether this reflected a minor clone or a mutation arising in cytb of the same genotype, we ran msp2 (3D7/FC27) agarose gel electrophoresis, msp2 (3D7/FC27) capillary electrophoresis, and now amplicon deep sequencing of cpmp. The results from both electrophoretic methods suggest this was a mutation rather than a minor clone; now I'm trying to determine whether amplicon deep sequencing with your pipeline shows the same.
Some info about the files:
overlapStatus -> R1EndsinR2 (the forward and reverse reads overlap slightly).
Primers were taken from the HaplotypR paper by Lerch et al. (with Illumina adapters included in the nPCR).
13 samples were included in this analysis (including a positive and a negative control).
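As a rough check on the overlap status, the expected R1/R2 overlap for a 2x250 run can be worked out from the amplicon length. The sketch below assumes that the ~430 bp cpmp product is the full insert between the sequencing adapters; if that figure includes adapter sequence, the true insert is shorter and the overlap correspondingly larger. The function names are hypothetical and not part of SeekDeep.

```python
def expected_overlap(read_len: int, insert_len: int) -> int:
    """Bases sequenced by both R1 and R2 of one paired-end fragment.

    Positive: the reads overlap by that many bases.
    Zero or negative: the reads do not meet (gap of -overlap bases).
    """
    return 2 * read_len - insert_len


def describe_pair(read_len: int, insert_len: int) -> str:
    """Human-readable summary of how the two reads of a pair relate."""
    overlap = expected_overlap(read_len, insert_len)
    if insert_len < read_len:
        # each read runs past the end of the insert into adapter sequence
        return "reads extend past the insert (read-through into adapter)"
    if overlap > 0:
        return f"reads overlap by {overlap} bp"
    return f"reads do not overlap (gap of {-overlap} bp)"
```

With the numbers above, `describe_pair(250, 430)` returns "reads overlap by 70 bp", i.e. roughly 70 bp of overlap under the stated assumption, which fits the small overlap described.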
When I run setupTarAmpAnalysis with the default parameters, the pipeline works beautifully, but it detects only a single haplotype per sample: one haplotype in my clinical samples and a different one in my positive control.
I therefore tried relaxing the parameters to see whether that would pick up more diversity, but with every relaxed setting I tried (e.g. --otu 1, --otu 0.97, --qualThres 20,15, --snpFreqCutOff 0.001) the results came out empty: nohup.out reported empty data whenever I tried to visualize the results through the local host port. I also noticed that all bases had a quality score of 37, so I tried relaxing the threshold to a Phred score of ~30. One of the commands that produces empty results is:
seekdeep setupTarAmpAnalysis --samples sampleNames.tab.txt --outDir analysis --inputDir fastq/ --idFile idFile.tab.txt --overlapStatusFnp overlapStatuses.txt --lenCutOffs lenCutOffs.tab.txt --numThreads 8 --refSeqsDir genomes/ --extraQlusterCmds="otu 0.97"
But when I run it with the default parameters, it works fine:
seekdeep setupTarAmpAnalysis --samples sampleNames.tab.txt --outDir analysis --inputDir fastq/ --idFile idFile.tab.txt --overlapStatusFnp overlapStatuses.txt --lenCutOffs lenCutOffs.tab.txt --numThreads 8 --refSeqsDir genomes/
./runanalysis 8
./startServerCmd.sh
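To confirm which quality scores are actually present in the reads (e.g. whether nearly every base really is Q37), the scores can be tallied straight from the FASTQ files. This is a minimal sketch, independent of SeekDeep, assuming standard 4-line FASTQ records and Phred+33 encoding (the usual encoding for current Illumina output); the function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable


def phred_to_error(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def quality_counts(lines: Iterable[str], offset: int = 33) -> Counter:
    """Tally Phred scores from FASTQ lines (4-line records, Phred+offset)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the 4th line of each record is the quality string
            counts.update(ord(ch) - offset for ch in line.rstrip("\r\n"))
    return counts


def fastq_quality_counts(path: str) -> Counter:
    """Same as quality_counts, reading a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return quality_counts(handle)
```

For example, `sorted(fastq_quality_counts("fastq/sample1_R1.fastq.gz").items())` (file name hypothetical) lists each score with its base count. For reference, Q37 corresponds to an error probability of about 2e-4 and Q30 to 1e-3.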
To make sure my data is not 'corrupted' or simply not variable enough, I did a preliminary (quick and dirty) screen with freebayes for SNPs with a minimum allele frequency of 1% and a Phred quality of 30, and found 6 SNPs at different depths in one of my samples that, for some reason, SeekDeep does not assign to any haplotype.
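One rough way to judge whether a ~1% variant call could be explained by base-calling error alone is a binomial tail probability. This sketch uses a deliberately simple model of my own choosing (not how SeekDeep or freebayes model errors): independent errors, a uniform base quality, and a miscalled base equally likely to be any of the 3 other nucleotides. The function name is hypothetical.

```python
import math


def error_tail_probability(depth: int, alt_reads: int, phred: int = 30) -> float:
    """P(at least `alt_reads` of `depth` reads show one specific alternate
    base through sequencing error alone), under a simple binomial model.

    Assumptions: independent errors, every base has quality `phred`, and a
    miscalled base is equally likely to be any of the 3 other nucleotides.
    """
    p = 10 ** (-phred / 10) / 3  # chance one read shows this particular wrong base
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(depth + 1)
    total = 0.0
    for i in range(alt_reads, depth + 1):
        # binomial pmf in log space to avoid overflow at high depth
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(depth - i + 1)
                   + i * log_p + (depth - i) * log_q)
        total += math.exp(log_pmf)  # underflows harmlessly to 0.0 for tiny terms
    return min(1.0, total)
```

For example, `error_tail_probability(3000, 30)` asks how likely 30 of 3,000 reads (1%) are to carry the same wrong base at Q30; the depth of 3,000 is a hypothetical round figure (about 40,000 raw reads split across 13 samples, before filtering), and the result is vanishingly small. A small value only argues against base-calling error: PCR errors and cross-sample contamination are not modelled and are common sources of low-frequency artefacts in amplicon data.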
I am attaching two .fastq files along with the other files I am using for SeekDeep. Have you ever encountered something similar?
Thanks again for your valuable help and advice, Nick.
Best,
Daniel
sampleNames.tab.txt overlapStatuses.txt lenCutOffs.tab.txt idFile.tab.txt