Open kae-gi opened 2 years ago
Awesome, thank you for the feedback! I will definitely be writing the output to a deduped SAM file. I'll probably write a function to read in the known UMI file (passed via argparse) and initialize a list for reference.
Argparse plan: -s for the SAM file, -u for the UMI file.
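A rough sketch of what that could look like, just to pin down the plan. The short flags -s and -u come from the plan above; the long names (--sam, --umi), the function names, and the one-UMI-per-line file layout are assumptions for illustration. It also uses a set rather than a list, on the assumption that membership checks will be the main operation.

```python
import argparse


def parse_args(argv=None):
    """Parse the two planned options: -s (SAM file) and -u (known UMI file)."""
    parser = argparse.ArgumentParser(
        description="Remove PCR duplicates from a SAM file."
    )
    # Long option names are hypothetical; only -s and -u are in the plan.
    parser.add_argument("-s", "--sam", required=True, help="input SAM file")
    parser.add_argument("-u", "--umi", required=True, help="file of known UMIs")
    return parser.parse_args(argv)


def load_known_umis(path):
    """Read known UMIs, assuming one UMI per line, and return them as a set."""
    with open(path) as handle:
        # Skip blank lines and strip trailing newlines/whitespace.
        return {line.strip() for line in handle if line.strip()}
```

Usage would be something like `args = parse_args()` followed by `known_umis = load_known_umis(args.umi)`, after which any read whose UMI is not in `known_umis` can be skipped.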
Nice pseudocode! I can follow the logic, and the order of operations section makes it easy to follow.
From my view, the algorithm does almost everything it is supposed to do. What is unclear is what happens to the deduplicated data: after your funnel runs to completion, all that is left are PCR duplicates, and these get omitted. Perhaps you mean to write out the deduplicated data somewhere?
Your proposed functions also seem reasonable to me. Perhaps a function for reading in the known UMIs would be helpful.
I would say that sorting the data by chromosome makes sense, and it was actually something I was planning on doing to reduce the number of reads I have to keep in memory at one time. As for additional thoughts, adding argparse to the final code may be worth it.
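To illustrate why sorting by chromosome keeps memory down, here is a minimal sketch, not your actual algorithm. It assumes the SAM lines are already sorted by chromosome, that the UMI is the last colon-separated field of the read name, and that the raw POS column is good enough as the position (a real deduper would also need to adjust for soft clipping). The function name is made up for the example.

```python
def dedupe_sorted_sam(lines):
    """Yield header lines plus the first read seen for each (UMI, strand, position)."""
    current_chrom = None
    seen = set()
    for line in lines:
        if line.startswith("@"):
            # Header lines pass straight through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if chrom != current_chrom:
            # New chromosome: earlier keys can never match again, so drop them.
            seen.clear()
            current_chrom = chrom
        umi = qname.split(":")[-1]   # assumption: UMI is the last QNAME field
        reverse = bool(flag & 16)    # SAM flag bit 0x10 marks the reverse strand
        key = (umi, reverse, pos)    # simplification: POS not corrected for soft clipping
        if key not in seen:
            seen.add(key)
            yield line
```

Because the set is cleared at every chromosome boundary, it only ever holds keys for one chromosome, and writing each yielded line to the output file gives the deduplicated SAM.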