Closed LisaHagenau closed 1 year ago
Hi Lisa,
I am sorry it didn’t work for you. Don’t rely on traditional duplicate metrics: with enzymatic digestion, most fragments share the same mapping coordinates, so you have to rely on the barcodes to group reads into duplicate families (read bundles).
How many fmol did you take into the amplification? It seems you sequenced your library to 30x; for that we recommended 0.6 fmol (at present we are recommending 0.4 fmol). There are probably slight differences between quantification methods, but those differences shouldn’t take duplicate rates from 75-80% down to 2.8%. If you have high concentrations of DNA it would be advisable to make dilutions before taking fmols.
Your yields are much lower, I am not sure why. Perhaps they are not the true yields and that’s what made things go wrong?
I hope we can find out what went wrong.
Best wishes, Fede
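To illustrate the point about coordinates versus barcodes, here is a minimal sketch with made-up reads. It is not NanoSeq code; the tuple layout and the `count_families` helper are assumptions for illustration only. It shows how reads that share mapping coordinates collapse into one family when only coordinates are used, but stay separate when the two barcodes are part of the key.

```python
from collections import defaultdict

# Toy reads: (chromosome, start, end, barcode1, barcode2). All values are made up.
reads = [
    ("chr1", 11038, 11177, "TTT", "CTT"),
    ("chr1", 11038, 11177, "TTT", "CTT"),  # PCR copy of the read above
    ("chr1", 11038, 11177, "ACG", "GGA"),  # same coordinates, different molecule
    ("chr1", 11038, 11177, "CAT", "TTG"),  # same coordinates, different molecule
    ("chr1", 20500, 20660, "TTT", "CTT"),
]


def count_families(reads, use_barcodes):
    """Count duplicate families, keyed by coordinates with or without barcodes."""
    families = defaultdict(int)
    for chrom, start, end, bc1, bc2 in reads:
        key = (chrom, start, end, bc1, bc2) if use_barcodes else (chrom, start, end)
        families[key] += 1
    return len(families)


families_by_coords = count_families(reads, use_barcodes=False)
families_by_barcodes = count_families(reads, use_barcodes=True)

# Reads beyond the first in each family are what a duplicate metric would count.
dups_by_coords = len(reads) - families_by_coords
dups_by_barcodes = len(reads) - families_by_barcodes
print(f"coordinates only: {families_by_coords} families, {dups_by_coords} duplicates")
print(f"with barcodes:    {families_by_barcodes} families, {dups_by_barcodes} duplicates")
```

In this toy case a coordinate-only count flags three of the five reads as duplicates, while the barcode-aware count flags only one.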
On 4 May 2022, at 15:34, LisaHagenau wrote:
Hello,
I am having some trouble with our first NanoSeq results, but I am not sure whether this is a wetlab or bioinformatics problem (though I suspect the former). I processed the data as far as necessary to run the efficiency_nanoseq.pl script, which returned an extremely low duplicate rate (0.028). Below are the commands I ran to create these results:
python ~/src/nanoseq/bin/extract-tags.py -a data/raw/fastq/merged/S352_S1_R1.merged.fq -b data/raw/fastq/merged/S352_S1_R2.merged.fq -c data/processed/S352_extrR1.fastq -d data/processed/S352_extrR2.fastq -m 3 -s 4 -l 151
bwa mem -t 12 -C /mnt/genomes/hg/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna data/processed/S352_extrR1.fastq data/processed/S352_extrR2.fastq > data/processed/S352_mapped.sam
bamsormadup inputformat=sam rcsupport=1 threads=12 < data/processed/S352_mapped.sam > data/processed/S352_mapped_od.bam
bammarkduplicatesopt optminpixeldif=2500 threads=12 < data/processed/S352_mapped_od.bam > data/processed/S352_mapped_mdo.bam
bamaddreadbundles -I data/processed/S352_mapped_mdo.bam -O data/processed/S352_filtered.bam
randomreadinbundle -I data/processed/S352_filtered.bam -O data/processed/S352_neat.bam
samtools index data/processed/S352_neat.bam
samtools index data/processed/S352_filtered.bam
efficiency_nanoseq.pl -d data/processed/S352_neat.bam -x data/processed/S352_filtered.bam -r /mnt/genomes/hg/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -o S352
NUM_UNIQUE_READS 302800159
NUM_SEQUENCED_READS 311577682
DUPLICATE_RATE 0.0281712186304794
TOTAL_RBS 11799643
TOTAL_READS_IN_RBS 24611888
OK_RBS(2+2) 16452
READS_PER_RB 1.042908
F-EFF 0.008890473
EFFICIENCY 0.00111409575730232
GC_BOTH 0.3772532
GC_SINGLE 0.3769166
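As a quick sanity check on the numbers above, the following sketch recomputes two of the reported values from the raw counts. The formulas are inferred from the numbers themselves, not taken from the efficiency_nanoseq.pl source, so treat them as assumptions: the duplicate rate matches 1 - unique/sequenced, and READS_PER_RB matches total reads in bundles divided by twice the number of bundles (presumably a per-strand average).

```python
# Counts copied from the efficiency_nanoseq.pl output above.
num_unique_reads = 302_800_159
num_sequenced_reads = 311_577_682
total_rbs = 11_799_643
total_reads_in_rbs = 24_611_888

# Assumed formula: fraction of sequenced reads that are not unique.
duplicate_rate = 1 - num_unique_reads / num_sequenced_reads

# Assumed formula: reads per bundle, halved (a bundle has two strands).
reads_per_rb = total_reads_in_rbs / (2 * total_rbs)

print(f"DUPLICATE_RATE ~ {duplicate_rate:.7f}")
print(f"READS_PER_RB   ~ {reads_per_rb:.6f}")
```

Both values agree with the script output to the printed precision, which suggests the low duplicate rate follows from the read counts themselves and is not a reporting glitch.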
What doesn't quite make sense to me is that in the duplex (filtered) bam file, 244 million reads are marked as duplicates (out of 311 million, using samtools flagstat), while the deduplicated (neat) bam file contains 302 million reads and no marked duplicates. Wouldn't that mean that most duplicate reads are in different read bundles?
If the bioinformatic analysis is correct, then I suspect that something went wrong with the library quantification which resulted in a massive underestimation of amplifiable fragments. We used a different kit for the qPCR than described in the methods (NEBNext Library Quant), but checked that the primers that come with the kit are the same as in the KAPA kit and added the NanoqPCR primers to a final concentration of approx 330 nM. We did observe ~10x lower library yields than described in the paper even with high DNA input from freshly prepared HMW DNA from HaCaT cells (see plot). For the sequenced sample (fibroblasts) we tried 3 dilutions (1:50, 1:500 and 1:5000), but only the 1:5000 sample showed a normal amplification curve which we used for calculating the fmol input.
[lib-yield] [user-images.githubusercontent.com]https://urldefense.proofpoint.com/v2/url?u=https-3A__user-2Dimages.githubusercontent.com_13033231_166703955-2D7ff7808a-2D708d-2D49da-2Db5d3-2D2846d8860e07.png&d=DwMCaQ&c=D7ByGjS34AllFgecYw0iC6Zq7qlm8uclZFI0SqQnqBo&r=v9-R7fUmjpv-9Zaqyk1nlnlOC3qPkTEJz5tyYxg2uec&m=id_OBj0mifdlKNHPlzSmM3dJGs7n4bajPQJhmPoVFNV9JwzlQEp8rJJ2ks-GKWvo&s=ip2fRzqUFxe1cnDcmF551thamLwnP6GPu-t61L9lw0A&e=
Any help would be appreciated.
Thanks, Lisa
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Hi Fede,
thank you for the quick answer. We were aiming for 0.3 fmol input, but ended up using approx. 0.2 fmol and 15 PCR cycles so that we could sequence on a NextSeq MidOutput flow cell (which usually generates 130 million clusters). As I understood the protocol, 15x equals 150 million read pairs, which to me means 300 million reads in total. But even if we oversequenced, wouldn't that mean that we get fewer read bundles with higher coverage, and so a higher duplicate rate?
> If you have high concentrations of DNA it would be advisable to make dilutions before taking fmols.
We quantified the library at 3 different dilutions (prepared serially), but unfortunately, only the 1:5000 dilution amplified properly (within standard curve range), from which we calculated a concentration of 0.017 nM.
> Your yields are much lower, I am not sure why. Perhaps they are not the true yields and that’s what made things go wrong?
Yes, I think so too. From the results, particularly the RB metrics, it seems likely to me that there is actually a lot more library than we quantified and so we used too much input and too many PCR cycles leading to too many read bundles with not enough reads.
I am thinking of running the qPCR again with both primer pairs on the final library (after the PCR). Theoretically, all fragments should be amplified equally by both primer pairs, correct? If the results are too disparate, then at least we know where the problem is.
Since the issue is probably wetlab-based, should I move the discussion to the protocol exchange site or is it okay to continue here?
Best, Lisa
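The 15x versus 30x question in the message above comes down to simple arithmetic, sketched here. The genome size (3.1 Gb) is an assumption, the read length is taken from the `-l 151` option in the original commands, and the bases removed as barcodes are ignored, so the result is only approximate.

```python
NUM_SEQUENCED_READS = 311_577_682   # from the efficiency_nanoseq.pl output
READ_LENGTH_BP = 151                # from the -l 151 option; barcode trimming ignored
GENOME_SIZE_BP = 3.1e9              # assumption: approximate human genome size


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth if n_reads of read_length_bp were spread evenly over the genome."""
    return n_reads * read_length_bp / genome_size_bp


# Case 1: the count refers to individual reads (R1 and R2 counted separately).
coverage_if_single_reads = mean_coverage(NUM_SEQUENCED_READS, READ_LENGTH_BP, GENOME_SIZE_BP)

# Case 2: the count refers to read pairs, so there are twice as many reads.
coverage_if_pairs = mean_coverage(2 * NUM_SEQUENCED_READS, READ_LENGTH_BP, GENOME_SIZE_BP)

print(f"counted as single reads: ~{coverage_if_single_reads:.1f}x")
print(f"counted as read pairs:   ~{coverage_if_pairs:.1f}x")
```

Under these assumptions the run is roughly 15x if the 311 million figure counts individual reads, and roughly 30x only if it counts pairs.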
Hi Lisa,
I don’t know much about the wet-lab side but I can put you in contact with Stef, the expert here.
Before sequencing so much next time, you could pick far fewer fmol and do shallow sequencing (MiSeq?). The relationship between sequenced reads and fmol is linear. That would help you calibrate things on your side.
The number of PCR cycles shouldn’t matter that much; I think it’s just that you picked way more than 0.2 fmol. Looking at those duplicate rates, probably even much more than 2 fmol.
Best, Fede
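The linear relationship mentioned above can be sketched as a simple proportion. The reference points (0.6 fmol for 30x, later revised to 0.4 fmol) come from earlier in this thread; the function names are hypothetical, and the molecule count is just the Avogadro conversion, not a NanoSeq-specific figure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fmol_for_coverage(target_coverage, ref_fmol=0.6, ref_coverage=30.0):
    """Scale the input linearly from a reference (fmol, coverage) pair."""
    return target_coverage * ref_fmol / ref_coverage


def molecules_in(fmol):
    """Number of molecules in a given amount of femtomoles."""
    return fmol * 1e-15 * AVOGADRO


fmol_15x_old = fmol_for_coverage(15)                 # older 0.6 fmol recommendation
fmol_15x_new = fmol_for_coverage(15, ref_fmol=0.4)   # current 0.4 fmol recommendation

print(f"15x with the 0.6 fmol reference: {fmol_15x_old:.2f} fmol")
print(f"15x with the 0.4 fmol reference: {fmol_15x_new:.2f} fmol")
print(f"0.2 fmol is about {molecules_in(0.2):.2e} molecules")
```

The same proportion can be run in reverse after a shallow pilot run: if the observed duplicate rate is far below what the input should give, the true input was likely larger than the quantified one.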
Hi Fede,
thank you. I will run some PCRs to try to get to the bottom of this issue. I would appreciate any input from the wetlab expert.
Best, Lisa
No problem, can you send me your email address and I will put you in contact with Stef? My email is @.**@.>
Please, also check the RB tag is properly formatted: chr:position:barcode1:barcode2, just in case there was any bioinformatic problem upstream
> Please, also check the RB tag is properly formatted: chr:position:barcode1:barcode2, just in case there was any bioinformatic problem upstream
Thank you, I just checked and the RB tag is formatted like this in the bam file:
RB:Z:chr1,11038,11177,TTT,CTT
So could this be an additional problem? I have not closed the issue yet as we are still working on the quantification issue. We are getting much higher library yields with our new qPCR setup using a synthesized standard specific to the NanoSeq adapters. I hope we can confirm this by sequencing within the next two weeks.
Thanks, Lisa
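For comparing the tag seen in the BAM file with the layout described earlier, a small sketch like the following may help. It only splits the tag value and counts fields; it does not decide which layout is the correct one for a given NanoSeq version. The `split_rb_tag` helper is hypothetical, and the colon-separated example is made up to match the described layout.

```python
def split_rb_tag(tag):
    """Return (delimiter, fields) for an RB tag, with or without the 'RB:Z:' prefix."""
    prefix = "RB:Z:"
    value = tag[len(prefix):] if tag.startswith(prefix) else tag
    # Guess the delimiter: commas if any are present, otherwise colons.
    delimiter = "," if "," in value else ":"
    return delimiter, value.split(delimiter)


observed = "RB:Z:chr1,11038,11177,TTT,CTT"   # as reported from the BAM file
described = "chr1:11038:TTT:CTT"             # made-up value in the described layout

for label, tag in (("observed", observed), ("described", described)):
    delimiter, fields = split_rb_tag(tag)
    print(f"{label}: delimiter={delimiter!r}, {len(fields)} fields -> {fields}")
```

The observed tag has five comma-separated fields (apparently a contig, two positions and two barcodes), whereas the described layout has four colon-separated fields, so the two differ in both delimiter and field count.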
Hello,
I think we have our quantification method sorted and we want to start a new sequencing run soon. But we noticed some high molecular weight smear when we ran a Bioanalyzer assay after the second PCR. I think this indicates an overamplification. The library with the low duplicate rate also shows massive overamplification (which makes sense). Is this a common observation and are these libraries okay to sequence in your opinion? I'd really appreciate your feedback.
Thanks, Lisa
Hi Lisa,
Not sure about that. Sometimes higher insert sizes indicate ligation between fragments, later resulting in lower proportions of properly paired reads. But I think I haven’t seen such large fragments, and I doubt they are PCR products. It could also be there are free adapters and you are seeing PCR recombination.
In a case like this we would just sequence and see. If your budget is tight, I am not sure what to advise.
About your quantification method, how did you validate it? By sequencing and estimating duplicate rates/complexity of the library? If you obtained sequencing data for this, it could be valuable for understanding whether high molecular weight DNA is a problem or not.
Best, Fede
[221118_bioanalyzer_B1-G1] [user-images.githubusercontent.com]https://urldefense.proofpoint.com/v2/url?u=https-3A__user-2Dimages.githubusercontent.com_13033231_202723929-2Df63da975-2Db6d9-2D4714-2D9806-2D0f5342bd0105.png&d=DwMCaQ&c=D7ByGjS34AllFgecYw0iC6Zq7qlm8uclZFI0SqQnqBo&r=v9-R7fUmjpv-9Zaqyk1nlnlOC3qPkTEJz5tyYxg2uec&m=gE6iqz9uiXfS5OB4SqYYmLOAesm5t9wOoIG-cHM1gp4cylJvbeROq73IcKKgNSq8&s=1AzEtZ6TxFjOJaNNtsTKKXvfMzSorHS2l5f-DFPi3YY&e=
So it turns out that it was overamplification. We skipped a dilution step during library preparation and again used way too much input DNA for the 2nd PCR, resulting in a low duplicate rate. We ordered a new flowcell and can hopefully run it next week. Third time's the charm...
Good news, the quantification and dilution worked and we finally have some promising results (though not quite optimal yet). We applied four different correction factors before PCR amplification:
Metric | 0.75x | 1x | 1.5x | 2x |
---|---|---|---|---|
NUM_UNIQUE_READS | 17780921 | 19063518 | 26977953 | 28716908 |
NUM_SEQUENCED_READS | 77286712 | 61072936 | 69592529 | 62001104 |
DUPLICATE_RATE | 0.7699356 | 0.68785653 | 0.61234412 | 0.53683231 |
TOTAL_RBS | 747958 | 794587 | 1119105 | 1186732 |
TOTAL_READS_IN_RBS | 6239178 | 4959980 | 5672574 | 5057478 |
OK_RBS(2+2) | 150345 | 116937 | 115603 | 74749 |
READS_PER_RB | 4.170808 | 3.121106 | 2.534424 | 2.130843 |
F-EFF | 0.3855591 | 0.3263629 | 0.2582081 | 0.2441442 |
EFFICIENCY | 0.04016154 | 0.03929351 | 0.03396547 | 0.02463316 |
GC_BOTH | 0.4029726 | 0.3995528 | 0.3952528 | 0.3991534 |
GC_SINGLE | 0.4096244 | 0.4024825 | 0.4000627 | 0.4054076 |
Based on these results, I would apply a correction factor of 0.6x or so for the next library prep. The strand drop-out fraction is a bit high, but the DNA we used has been in storage for a while, so this was kind of expected.
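Because the four libraries were sequenced to different depths, one rough way to compare them is to normalise the usable bundles by sequencing effort. The sketch below computes OK_RBS per million sequenced reads from the table above. This is an ad hoc normalisation chosen for illustration, not the EFFICIENCY value reported by the script.

```python
# Values copied from the results table: correction factor -> (OK_RBS, NUM_SEQUENCED_READS)
results = {
    "0.75x": (150_345, 77_286_712),
    "1x": (116_937, 61_072_936),
    "1.5x": (115_603, 69_592_529),
    "2x": (74_749, 62_001_104),
}

# Usable (2+2) read bundles obtained per million sequenced reads.
ok_rbs_per_million = {
    factor: ok_rbs / (sequenced / 1e6)
    for factor, (ok_rbs, sequenced) in results.items()
}

for factor, value in ok_rbs_per_million.items():
    print(f"{factor:>5}: {value:7.0f} OK_RBS per million reads")

best = max(ok_rbs_per_million, key=ok_rbs_per_million.get)
print(f"highest yield per read: {best}")
```

On this measure the yield per read rises steadily as the correction factor drops from 2x to 0.75x, which is in line with trying a somewhat lower factor next; whether the trend continues below 0.75x cannot be told from these four points.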
For future reference, the quantification method that we use is based on a synthetic standard that contains the NanoSeq adapter sequences. We essentially chose a sequence from the ERCC (ERCC-00171), removed the poly-A tail and added the NanoSeq adapter sequences to the ends (link). We ordered it as a gBlock from IDT. We then used it as standard for the qPCR quantification of the NanoSeq libraries. The library yields were close to the ones reported in the paper.
Thank you for your help!
Lisa