Tripfantasy / Deduper-tripfantasy

Kaetlyn's pseudocode review #1

Open kae-gi opened 2 years ago

kae-gi commented 2 years ago

Nice pseudocode! I can follow the logic, and the order of operations section makes it easy to follow.

From my view, the algorithm does almost everything it is supposed to do. What is unclear is what happens to the deduplicated data: after your funnel runs to completion, all that is left are PCR duplicates, and these get omitted. Perhaps you mean to write out the deduplicated data somewhere?

Your proposed functions also seem reasonable to me. Perhaps a function for reading in the known UMIs would be helpful.

I would say that sorting the data by chromosome makes sense, and it was actually something I was planning on doing to reduce the number of reads I have to keep stored at any one time. As for additional thoughts, adding some argparse to the final code may be worth it.
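
A minimal sketch of the chromosome-at-a-time idea, assuming the SAM file is already sorted by chromosome. The function name `dedupe_sorted_sam` is hypothetical, and the duplicate key used here (UMI, strand, position) is an assumption for illustration: it takes the UMI from the last colon-separated field of the read name and does not adjust positions for soft clipping, which the real pseudocode may handle differently.

```python
def dedupe_sorted_sam(lines):
    """Yield header lines plus the first read seen for each duplicate key.

    Assumes input sorted by chromosome, so the set of seen keys can be
    emptied every time the chromosome changes.
    """
    current_chrom = None
    seen = set()
    for line in lines:
        if line.startswith("@"):
            # Header lines pass straight through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if chrom != current_chrom:
            # New chromosome: earlier reads can no longer match, so drop them.
            current_chrom = chrom
            seen.clear()
        umi = qname.split(":")[-1]   # assumption: UMI is the last field of QNAME
        reverse = bool(flag & 16)    # SAM FLAG bit 0x10 marks the reverse strand
        key = (umi, reverse, pos)    # assumption: no soft-clipping adjustment
        if key not in seen:
            seen.add(key)
            yield line
```

Because `seen` is cleared at each chromosome boundary, memory use is bounded by the number of distinct keys on a single chromosome rather than in the whole file.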

Tripfantasy commented 2 years ago

Awesome, thank you for the feedback! I will definitely be writing the output to a deduped SAM file. I'll probably write a function to read in the known UMI file (passed in via argparse) and initialize a list for reference.
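
A rough sketch of what that UMI-reading function could look like. The name `read_known_umis` is hypothetical, and the file layout (one UMI per line, blank lines ignored) is an assumption, since the thread does not describe the file format.

```python
def read_known_umis(umi_path):
    """Return the known UMIs as a list, one entry per non-empty line.

    Assumes a plain-text file with one UMI per line.
    """
    umis = []
    with open(umi_path) as handle:
        for line in handle:
            umi = line.strip()
            if umi:  # skip blank lines
                umis.append(umi)
    return umis
```

If the list is only used for membership checks, wrapping the result in `set(...)` would make each lookup faster.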

Argparse plan will be: -s for the SAM file, -u for the UMI file.
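
A minimal sketch of that argparse setup. Only the short flags `-s` and `-u` come from the plan above; the function name `get_args`, the long option names `--sam` and `--umi`, and making both flags required are assumptions.

```python
import argparse


def get_args(argv=None):
    """Parse the command-line options for the deduper."""
    parser = argparse.ArgumentParser(
        description="Remove PCR duplicates from a SAM file."
    )
    # Long option names are illustrative; only -s and -u are in the plan.
    parser.add_argument("-s", "--sam", required=True, help="input SAM file")
    parser.add_argument("-u", "--umi", required=True, help="file of known UMIs")
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing.
    return parser.parse_args(argv)
```

Calling `get_args(["-s", "input.sam", "-u", "umis.txt"])` returns a namespace with `sam` and `umi` attributes holding the two paths.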