epi2me-labs / pychopper

cDNA read preprocessing

Large variation in a trimmed output #53

Closed NikoLichi closed 9 months ago

NikoLichi commented 9 months ago

Hi there,

I am trying out this nice tool on a colleague's recommendation and would like to understand pychopper better. I am testing how basecalling with Dorado differs across GPU architectures, which leads to changes in the called sequences. The basecalled output is then used as input for pychopper.

However, I noticed that after running pychopper there is a significant change in one sequence that has very good quality (>10). How can this be explained? Is this output correct?

Thanks and all the best, Nicolas

My input:
ATGTTTATGTGCAGCCCTTCTTCGACTGGATATCAAGTTTGATATTAACCGAACACGCCTACCGTGACAAGAAAGTTGTCGGTGTCTTTGTGTTTCTGTTGGTGCTGATATTGCTTTCGAGTTAGGGTTACCATTCGGGTTTGGGGCCTTTCCCCTGGCTGGCAGCGCGGAGGCCGCACGATGCCTGGAGTTACTGTAAAAGACGTGAACCAGCAGGAGTTCGTCAGAGCTCTGGCAGCCTTCCTCAAAAAGTCCGGGAAGCTGAAAGTCCCCCAATGGGTGGATACCGTCAAGCTGGCCAAGCACCAAAGAGCTTGCTCCCTACGATGAGAACTGGTTCTACACGCGAGCTGCTTCCCACAGCGCGGCACCTGTACCTCCGGGGTGGCGCTGGGGTTGGCTCCATGACCAAGATCTATGGGGGACGTCAGAGAAACGGCGTCATGCCCAGCCACTTCAGCCGAGGCTCCAAGAGTGTGGCCCGCCGGGTCCTCCAAGCCCTGGAGGGGCTGAAAATGGTAAAGGACCAAGATGGCGGCCGCAAACTGACACCTCAGGGACAAAGAGATCTGGACGGATCTCACCCCCGGACAGGTGGCAGCTGCCAACAAGAAGCATTCGAACAAACCATGCTGGGTTAATAAATTGCCTCATTCGTAAAAAAAAAAAAAAAAAAAAAAAAACTTGCGGGCGGCGGACTCTCCTCTGAAGATAGAGCGACAGGCAAGTCACAAAGACACCGACAACTTTCTTGTCAAA

Pychopper output:
TGAGTTAGGGTTATTCATTCGGATTTGGCATTTCCCCTGGCTGGCAGCGCGGAGGCCGCACGATGCCTGGAGTTACTGTAAAAGACGTGAACCAGCAGGAGTTCGTCAGAGCTCTGGCAGCCTTCCTCAAAAAGTCCGGGAAGCTGAAAGTCCCCGAATAGGTACATGCATGACTGGCCAAGCACCAAAGAGCTTGCTCCCTACGATGAGAACTGGTTCTACACGCGAGCTGCTTTTTCCACAGCGCGGCACCTGTACCTCCGGGGTGGCGCTGGGGTTGGCTCCATGACCAAGATCTATGGGGGACGTCAGAGAAACGGCGTCATGCCCAGCCACTTCAGCCGAGGCTCCAAGAGTGTGGCCCGCCGGGTCCTCCAAGCCCTGGAGGGGCTGAAAATGGTAAAGGACCAAGATGGCGGCCGCAAACTGACACCTCAACGAGATCTGGACAGGATCTCGCCCGGACAGGTGGCAGCTGCCGACGAAGCATTCGAACAAACCATGCTGGGTTAATAAATTGCCTCATTCGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
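For anyone wanting to quantify this kind of difference, here is a minimal sketch using only Python's standard library (not part of pychopper). It reports how much of a trimmed read can be found verbatim in the raw read, and on which strand. The function names `reverse_complement` and `compare_reads` and the toy sequences are made up for illustration. Both orientations are checked on the assumption that the trimmed read may have been reoriented.

```python
from difflib import SequenceMatcher

# Translation table for complementing DNA bases (assumes plain A/C/G/T input).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def compare_reads(raw: str, trimmed: str) -> dict:
    """Summarise how a trimmed read relates to the raw read, trying both strands."""
    best = None
    for strand, candidate in (("+", trimmed), ("-", reverse_complement(trimmed))):
        matcher = SequenceMatcher(None, raw, candidate, autojunk=False)
        block = matcher.find_longest_match(0, len(raw), 0, len(candidate))
        summary = {
            "strand": strand,
            "similarity": matcher.ratio(),       # 1.0 means identical strings
            "longest_shared_block": block.size,  # longest exact shared substring
            "raw_start": block.a,                # where that block starts in raw
        }
        if best is None or summary["longest_shared_block"] > best["longest_shared_block"]:
            best = summary
    return best


# Toy example: a core sequence flanked by 4-base "adapters" on each side.
core = "ACGTTGCAAGGCTA"
raw_read = "TTTT" + core + "GGGG"
print(compare_reads(raw_read, core))
```

If the trimmed read really is a sub-span of the raw read, the longest shared block should cover most of it. Many short blocks with a low similarity would suggest that the two sequences come from different basecalls rather than from trimming alone.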

NikoLichi commented 9 months ago

Hi there, I ran some trials with colleagues and found that pychopper's behavior is consistent and reproducible. As mentioned before, the issue comes from the Dorado basecalling. There are notable differences between GPU architectures, and this has an impact on some downstream analyses. I am closing this issue. Best, Niko
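In case it helps others, a reproducibility check like the one described can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used here. It assumes plain four-line FASTQ records, and the helper names `read_fastq` and `changed_reads` are hypothetical. It lists the reads whose sequence differs between two outputs, whether those are two pychopper runs on the same input or basecalls from two GPUs.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality string, ignored in this sketch
        yield header[1:].split()[0], sequence


def changed_reads(run_a: TextIO, run_b: TextIO) -> Dict[str, Tuple[str, str]]:
    """Return reads present in both runs whose sequences differ."""
    seqs_a = dict(read_fastq(run_a))
    seqs_b = dict(read_fastq(run_b))
    shared_ids = seqs_a.keys() & seqs_b.keys()
    return {
        read_id: (seqs_a[read_id], seqs_b[read_id])
        for read_id in shared_ids
        if seqs_a[read_id] != seqs_b[read_id]
    }


# Toy in-memory example; real use would pass open file handles instead.
run_a = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
run_b = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n")
print(changed_reads(run_a, run_b))  # {'read2': ('GGCC', 'GGCA')}
```

An empty result for two runs on identical input would be consistent with reproducible trimming. Differences that appear only when the input comes from different basecalling runs point upstream of pychopper.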