Excellent work! Step 1 is a good idea, and I like having the rev_comp function to interpret the bitwise flag. It seems like you have a good understanding of the assignment.
For 2C in your pseudocode, consider a different way of storing the values you want to keep (not a dictionary), because you’ll want an immutable data structure for each read’s information. I’m also not sure you need to keep the CIGAR string in whatever structure you use: once you have adjusted the starting position, you won’t need it anymore. The same goes for the bitwise flag. You’re only using it to determine strandedness, so you can interpret it before you store anything.
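To make the storage suggestion concrete, here is a minimal sketch of an immutable per-read key. All names (`ReadKey`, `is_reverse`, `five_prime_position`, `make_key`) are hypothetical, not taken from the pseudocode under review. It assumes the UMI is the last colon-separated field of the read name, that reads carry no hard clipping, and that the reverse-strand position is the rightmost aligned base including any trailing soft clip. Adjust these to match the assignment.

```python
import re
from typing import NamedTuple


class ReadKey(NamedTuple):
    """Immutable, hashable summary of a read (hypothetical layout)."""
    chrom: str
    position: int   # strand-aware, soft-clip-adjusted 5' position
    reverse: bool   # True if the read maps to the reverse strand
    umi: str


def is_reverse(flag: int) -> bool:
    # Bit 0x10 (16) of the SAM flag marks a reverse-strand alignment.
    return (flag & 16) == 16


def five_prime_position(pos: int, cigar: str, reverse: bool) -> int:
    # Split the CIGAR string into (length, operation) pairs.
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if not reverse:
        # Forward strand: undo a leading soft clip, if there is one.
        if ops and ops[0][1] == "S":
            return pos - int(ops[0][0])
        return pos
    # Reverse strand: walk right across every reference-consuming operation.
    span = sum(int(n) for n, op in ops if op in "MDN=X")
    # A trailing soft clip also extends the read past the aligned region.
    if len(ops) > 1 and ops[-1][1] == "S":
        span += int(ops[-1][0])
    return pos + span - 1


def make_key(sam_line: str) -> ReadKey:
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    reverse = is_reverse(flag)
    # Assumption: the UMI is the final colon-separated part of QNAME.
    umi = fields[0].split(":")[-1]
    position = five_prime_position(int(fields[3]), fields[5], reverse)
    # The CIGAR string and raw flag are interpreted here and then dropped.
    return ReadKey(fields[2], position, reverse, umi)
```

Because the key is a tuple, it can go straight into a set for duplicate checks, and neither the CIGAR string nor the flag needs to be stored.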
Consider adding a function dedicated to deduplicating (separate from the ones you already have) that also writes out the new SAM file. It might be useful to put it in bioinfo.
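One possible shape for such a function is sketched below. The names `deduplicate` and `naive_key` are hypothetical, and the key function here is deliberately simplistic (chromosome, raw position, strand) just to keep the example self-contained. A real key would also include the UMI and a clip-adjusted position.

```python
import io
from typing import Callable, Hashable, Iterable, TextIO


def deduplicate(sam_in: Iterable[str], sam_out: TextIO,
                make_key: Callable[[str], Hashable]) -> int:
    """Write header lines and the first read seen for each key.

    Returns the number of reads written.
    """
    seen = set()
    kept = 0
    for line in sam_in:
        if line.startswith("@"):
            # Header lines pass through unchanged.
            sam_out.write(line)
            continue
        key = make_key(line)
        if key in seen:
            # Duplicate of a read that was already written: skip it.
            continue
        seen.add(key)
        sam_out.write(line)
        kept += 1
    return kept


def naive_key(line: str) -> tuple:
    # Placeholder key for illustration only: (chromosome, position, strand).
    fields = line.split("\t")
    return (fields[2], int(fields[3]), bool(int(fields[1]) & 16))


# Small in-memory demonstration with made-up records.
demo_records = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "read1\t0\t1\t100\t36\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\n",
    "read2\t0\t1\t100\t36\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\n",
    "read3\t16\t1\t100\t36\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\n",
]
demo_out = io.StringIO()
demo_kept = deduplicate(demo_records, demo_out, naive_key)
```

Passing the key function in as an argument keeps the writing logic independent of how duplicates are defined, which makes the function easier to reuse from a shared module.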
Thanks Jack. Good idea on the needless storage of the CIGAR string and bitwise flags. I'll definitely make that adjustment. Will also spend some time pondering a new deduplicating function. Cheers!
Nice job!