SorenKarst / longread_umi

GNU General Public License v3.0

umi12 filtered during umi binning gawk command test_r941 data #52

Open cstill3928 opened 2 years ago

cstill3928 commented 2 years ago

Hi,

Thanks for this package. I have a quick question about your umi_binning.sh script, specifically the gawk command on lines 383-580. For troubleshooting, I'm using the test_r941 data that ships with the package. I noticed that although there is a UMI bin 12 in the SAM files, it isn't carried through to umi_bin_map.txt, the output of this gawk command, as seen below:

[Screenshot, 2022-05-17 9:25:23 PM]
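One way to confirm which bins are lost between the SAM files and umi_bin_map.txt is to compare the sets of bin names on each side. Below is a minimal Python sketch, assuming the bin names have already been extracted into two lists. The function name `missing_bins` and the sample names such as `umi12` are made up for illustration and do not reflect the package's actual file layout.

```python
def missing_bins(sam_bins, mapped_bins):
    """Return bin names seen in the SAM-derived list but absent from the map.

    Both arguments are plain lists of bin names (hypothetical inputs; how the
    names are extracted from the real files is left out of this sketch).
    """
    mapped = set(mapped_bins)
    seen = set()
    missing = []
    for name in sam_bins:
        # Report each dropped bin once, in first-seen order.
        if name not in mapped and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


# Illustrative data only: bin "umi12" appears in the SAM list but not in the map.
sam_bins = ["umi11", "umi12", "umi12", "umi13"]
mapped_bins = ["umi11", "umi13"]
print(missing_bins(sam_bins, mapped_bins))
```

Running the same comparison on the real name lists would show whether bin 12 is the only bin dropped or one of several.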

When I checked umi_binning_stats.txt, the only difference I noticed between UMI bin 12 and the other bins was that its read orientation needed normalization, as marked by "rof_subset" (I'm guessing the read_max_plus and read_max_neg differences are also related to this). That alone shouldn't have stopped UMI bin 12 from being carried through; I would expect "rof_fail" to be a problem, not "rof_subset". On line 573 of umi_binning.sh, should rof_subset also be marked as fine?

[Screenshot, 2022-05-17 9:32:59 PM]
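To illustrate the suspected behavior, here is a small Python sketch of a filter that keeps a bin only when its orientation flag is in an accepted set. It is not the script's gawk code. The label "rof_pass" is a hypothetical placeholder for whatever flag the script assigns to bins needing no normalization; only "rof_subset" and "rof_fail" come from the stats file discussed above.

```python
def kept_bins(stats_rows, accepted):
    """Return the bin names whose orientation flag is in the accepted set.

    stats_rows is a list of (bin_name, flag) pairs; accepted is a set of flags.
    """
    return [name for name, flag in stats_rows if flag in accepted]


# Illustrative rows only; "rof_pass" is a hypothetical label.
rows = [("umi11", "rof_pass"), ("umi12", "rof_subset"), ("umi13", "rof_fail")]

# A filter that accepts a single flag drops the normalized bin as well.
strict = kept_bins(rows, {"rof_pass"})

# Accepting "rof_subset" too keeps umi12 while still rejecting "rof_fail".
lenient = kept_bins(rows, {"rof_pass", "rof_subset"})

print(strict)
print(lenient)
```

If the condition on line 573 behaves like the strict variant, that would explain why bin 12 never reaches umi_bin_map.txt.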

Finally, in umi_binning_stats.txt, which is created on lines 556-565 of umi_binning.sh, I think there is an extra column name, specifically "read_orientation_ratio". When I read in the file, I'm left with an empty column on the right. Furthermore, if you shift the entries over to the right starting at the "read_orientation_ratio" column, the entries make sense with their column names.

[Screenshot, 2022-05-17 9:36:34 PM]
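A header with more names than the data rows have fields can be detected by comparing field counts. Below is a minimal Python sketch, assuming a tab-delimited file. The helper name `header_mismatch`, the column order, and the sample values are invented for illustration; only the column names read_max_plus, read_max_neg, and read_orientation_ratio are taken from the stats file.

```python
import csv
import io


def header_mismatch(text, delimiter="\t"):
    """Return (header_width, sorted list of distinct data-row widths)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, data = rows[0], rows[1:]
    widths = sorted({len(r) for r in data})
    return len(header), widths


# Illustrative content only: four column names but three values per row.
sample = (
    "umi_name\tread_max_plus\tread_max_neg\tread_orientation_ratio\n"
    "umi11\t5\t3\n"
    "umi12\t7\t1\n"
)

header_width, data_widths = header_mismatch(sample)
print(header_width, data_widths)
```

When the header width exceeds every data-row width, most table readers pad the last column with empty values, which matches the empty right-hand column described above.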

I'm not sure whether I'm misunderstanding something, doing something wrong, or pointing out things that don't matter, but any clarification would be helpful.

Thanks, Chris