epi2me-labs / pychopper

cDNA read preprocessing

Pychopper New PCR-cDNA Kit PCB114.24 #55

Open MustafaElshani opened 6 months ago

MustafaElshani commented 6 months ago

Is pychopper compatible with the new SQK-PCB114.24 kit? Will '-k PCS111' work?

nrhorner commented 6 months ago

Hi @MustafaElshani

Yes, you can use the -k PCS111 option as the primer sequences are the same.

damioresegun commented 3 months ago

Hi, I just want to confirm: is PCS111 appropriate?

The PI for the project I'm working on provided these sequences.

The v14 kit we used:
Top strand: 5' - ATCGCCTACCGTGA - barcode - TTGCCTGTCGCTCTATCTTC - 3'
Bottom strand: 5' - ATCGCCTACCGTGA - barcode - TCTGTTGGTGCTGATATTGC - 3'

Previous PCS111 kit:
Top strand: 5' - ATCGCCTACCGTGA - barcode - ACTTGCCTGTCGCTCTATCTTC - 3'
Bottom strand: 5' - ATCGCCTACCGTGA - barcode - TCTGTTGGTGCTGATATTGC - 3'
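A quick way to pin down how the two top strands differ is to check whether one is simply the other with extra bases at its 5' end. The sketch below uses only the post-barcode segments quoted above (the shared ATCGCCTACCGTGA part is identical in all four and left out); the helper name extra_5prime_prefix is made up for illustration.

```python
# Post-barcode segments of the top and bottom strands, as quoted above.
V14_TOP = "TTGCCTGTCGCTCTATCTTC"
V14_BOTTOM = "TCTGTTGGTGCTGATATTGC"
PCS111_TOP = "ACTTGCCTGTCGCTCTATCTTC"
PCS111_BOTTOM = "TCTGTTGGTGCTGATATTGC"


def extra_5prime_prefix(longer: str, shorter: str) -> str:
    """Return the bases `longer` carries in front of `shorter`.

    Raises ValueError if `longer` is not just `shorter` with a 5' prefix.
    """
    if not longer.endswith(shorter):
        raise ValueError("sequences differ by more than a 5' prefix")
    return longer[: len(longer) - len(shorter)]


top_extra = extra_5prime_prefix(PCS111_TOP, V14_TOP)
print(f"Top strand: PCS111 has {len(top_extra)} extra 5' nt: {top_extra}")
print(f"Bottom strands identical: {PCS111_BOTTOM == V14_BOTTOM}")
```

Run as-is, it reports that the PCS111 top strand carries two extra 5' nt (AC) and that the bottom strands are identical.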

I've tried running with PCS111, but I am seeing some loss of full-length transcripts and what looks like a 3' bias in transcript fragments. Would the 2-nt difference between the top strands make a large difference?

Additionally, would it be beneficial to use the primer config file rather than PCS111?

MustafaElshani commented 3 months ago

Hi Dami, looking at the ONT docs:

The latest version of the chemistry technical document has the following for PCB114.24:
[screenshot of the PCB114.24 primer sequences]

While the August 10th, 2022 version of the chemistry technical document has:
[screenshot]

I think they are the same. As for the 3' bias, I haven't noticed it much, but I will have to go back and look. I currently have some new data and will keep an eye out.

damioresegun commented 3 months ago

Hey Mustafa (we should catch up),

Thanks for sharing those technical doc screenshots. You're right that the sequences in the technical docs are the same. The sequences I shared came from the PI who did the work, and they differ from those in the technical docs: in the top strand of PCS111, there are two additional nt (AC) before TTGCCT. These two nt are also present in PCS111_primers.fa in pychopper.

@nrhorner I'm thinking of removing these two nt from the VNP sequence in the PCS111_primers.fa file. Do you think that would work at all, i.e. pychopper won't break?
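If you do try this, the edit amounts to trimming a fixed prefix from one FASTA record and writing the file back out. Below is a minimal, stdlib-only sketch. The FASTA text is a hypothetical placeholder: the VNP record name comes from this thread, the SSP name is an assumption, and neither sequence is the real content of PCS111_primers.fa. The function names are illustrative, not part of pychopper.

```python
# Hypothetical placeholder, NOT the real PCS111_primers.fa: the VNP name comes
# from the thread, the SSP name is assumed, and the sequences are the short
# post-barcode segments quoted earlier.
EXAMPLE_FASTA = """>VNP
ACTTGCCTGTCGCTCTATCTTC
>SSP
TCTGTTGGTGCTGATATTGC
"""


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}, joining wrapped lines."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def write_fasta(records: dict[str, str]) -> str:
    """Serialise records back to FASTA text, one sequence line per record."""
    return "".join(f">{n}\n{s}\n" for n, s in records.items())


def trim_prefix(records: dict[str, str], name: str, prefix: str) -> dict[str, str]:
    """Return a copy of `records` with `prefix` removed from the start of `name`."""
    seq = records[name]
    if not seq.startswith(prefix):
        raise ValueError(f"{name} does not start with {prefix!r}")
    trimmed = dict(records)  # leave the input mapping untouched
    trimmed[name] = seq[len(prefix):]
    return trimmed


new_records = trim_prefix(read_fasta(EXAMPLE_FASTA), "VNP", "AC")
print(write_fasta(new_records), end="")
```

Raising an error when the prefix is missing, rather than silently skipping the record, guards against trimming a file that has already been edited.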

MustafaElshani commented 3 months ago

@damioresegun Would love a catch up

I ran some samples today without the AC in the PCS111_primers.fa file. Nothing breaks, and >93% of the primers are found. I don't think it makes much difference compared with when these nt were present.

My other question: these files were basecalled with the latest Dorado using --barcode-both-ends, which I assumed meant all reads would have primers present. Why does pychopper not detect them in 100% of the reads?

nrhorner commented 3 months ago

Hi @MustafaElshani I'm not sure how using --barcode-both-ends in Dorado relates to Pychopper's ability to identify the primers in the reads. I don't think we'd ever expect to see 100% of reads classified as full length.

MustafaElshani commented 3 months ago

Hi @nrhorner I assumed that '--barcode-both-ends' in Dorado filters out any reads it cannot classify because primers are not present at both ends, and so postulated that pychopper would be able to detect primers in all reads and re-orient them. I admit that I could be misunderstanding it.
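For context on why a read can end up unclassified: orienting a full-length read requires recognising a primer site at each end. The toy below makes that concrete with exact substring matching on the two strand segments quoted earlier in the thread. It is not Pychopper's algorithm (which, per this thread, uses an hmm or edlib-based search that tolerates sequencing errors), and treating TOP...revcomp(BOTTOM) as the '+' layout and the 100-nt search window are assumptions made for illustration.

```python
# Strand segments quoted earlier in the thread (post-barcode parts).
TOP = "TTGCCTGTCGCTCTATCTTC"
BOTTOM = "TCTGTTGGTGCTGATATTGC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify(read: str, five: str, three: str, window: int = 100):
    """Toy classifier (assumed layout): '+' if `five` is near the start and
    revcomp(`three`) near the end, '-' for the mirrored layout, else None.
    Uses exact matching only, so any error in a primer site loses the read."""
    head, tail = read[:window], read[-window:]
    if five in head and revcomp(three) in tail:
        return "+"
    if three in head and revcomp(five) in tail:
        return "-"
    return None


def orient(read: str, five: str, three: str):
    """Return the read in '+' orientation, or None if it cannot be classified."""
    strand = classify(read, five, three)
    if strand == "+":
        return read
    if strand == "-":
        return revcomp(read)
    return None


fwd = TOP + "ACGT" * 50 + revcomp(BOTTOM)  # synthetic full-length read
print(classify(fwd, TOP, BOTTOM))  # forward layout
print(classify(revcomp(fwd), TOP, BOTTOM))  # same molecule, reverse strand
print(classify(TOP + "ACGT" * 50, TOP, BOTTOM))  # 3' primer site missing
```

A read missing either site, for example one truncated at its 3' end, comes back as None however it was handled upstream.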

damioresegun commented 2 months ago

Update: I made a new PCS-114_primers.fas and added the appropriate code to my local version of the package so that it accepts it. From what I can see after running it, it doesn't make much difference; if anything, it detects slightly fewer primers (~2% fewer than PCS111). To be clear, all I did was delete the 'AC' nucleotides from the start of the VNP sequence in PCS111_primers.fas. @nrhorner I believe this was the most viable approach, given that I can't generate a new hmm profile to use instead of the current PCS110 hmm profile used for PCS111. Is there any scope for you adding support for 114? (Just wondering.)

The data I currently have won't really let me assess the true effect of this change on accuracy, so I may revisit it later, but I'll carry on using PCS111 for now.

CyberGypsy6324 commented 2 months ago

Hi, I have encountered a similar issue. The kit I'm using, SQK-PCS114, is not among the options for the -k parameter. In this case, should I use PCS111 instead?

nrhorner commented 1 month ago

@MustafaElshani, sorry for the late reply.

> Hi @nrhorner I assumed that '--barcode-both-ends' in Dorado filters out any reads which it cannot classify due to primers not present in both ends, than postulated that pychopper would be able detect in all and re-orientate. I admit that I could be misunderstanding it.

I'm not sure why Pychopper does not return 100% of the reads. Did you use the hmm or edlib option? The former is more sensitive.
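The sensitivity point is easy to see with a toy: exact matching misses a primer after a single sequencing error, while an edit-distance search that lets the primer start and end anywhere in the read (similar in spirit to edlib's infix "HW" mode) still finds it. This is a plain-Python dynamic-programming sketch, not Pychopper's or edlib's code, and the max_dist threshold is an arbitrary illustrative value.

```python
TOP = "TTGCCTGTCGCTCTATCTTC"  # top-strand segment quoted earlier in the thread


def infix_edit_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Column-by-column DP over `text`: row 0 is reset to 0 at every column so
    the match may start anywhere, and taking the minimum of the last row over
    all columns lets it end anywhere.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # pattern[:i] against no read bases costs i
    best = prev[m]
    for ch in text:
        cur = [0] * (m + 1)  # cur[0] = 0: free start at this column
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (0 if pattern[i - 1] == ch else 1),  # match/substitution
                prev[i] + 1,  # extra base in the read
                cur[i - 1] + 1,  # primer base missing from the read
            )
        best = min(best, cur[m])
        prev = cur
    return best


def primer_present(read: str, primer: str, max_dist: int = 2) -> bool:
    """True if `primer` aligns somewhere in `read` with at most `max_dist` edits."""
    return infix_edit_distance(primer, read) <= max_dist


# A read whose primer copy carries one substitution (C -> A at index 10).
mutated = TOP[:10] + "A" + TOP[11:]
read = "GGGG" + mutated + "ACGTACGT"
print("exact match:", TOP in read)
print("edit distance:", infix_edit_distance(TOP, read))
print("found with max_dist=2:", primer_present(read, TOP))
```

Here the exact check fails while the edit-distance search reports a distance of 1, so the read is kept. An error-tolerant model such as a profile HMM can go further by scoring errors probabilistically rather than with a flat edit count.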

nrhorner commented 1 month ago

> Hi, I have encountered a similar issue. The kit I'm using, SQK-PCS114, is not in the options provided by the -k parameter. In this case, do you think I should use PCS111 instead?

@CyberGypsy6324 Yes, please use the PCS111 option.