Closed rob-p closed 2 years ago
Based on the comments in test_bam2bam
, the purpose of this test is to count the number of alignment records that could not be converted to any transcript locations. Below, I did the calculations on the original and converted bam files and it turns out there are 35 alignment records (not 33) that were not converted to transcript locations which confuses me.
$ mudskipper bulk --gtf NC_002333.2.gtf --alignment NC_002333.2.sam --out NC_002333.2_toTranscriptome.bam
$ samtools view NC_002333.2.sam | cut -f1 | sort | uniq | wc -l
85
$ samtools view NC_002333.2_toTranscriptome.bam | cut -f1 | sort | uniq | wc -l
50
So there are 85-50=35 records that mudskipper failed to convert.
Right now, in the code the return value of the read_and_process
function which is basically the return value of the bam2bam
function is something entirely different. It returns the number of alignment records that do not have any reads in them which is zero (I'm not sure how it can get to be non-zero).
I can modify the code so that we properly count the number of missing records but I think this number could change in the future versions of mudskipper (and possibly go lower for example if we improve the conversion of the complicated alignments). So, I'm not sure if this would be a proper unit test. What do you think?
Hi @HosseinAsghari,
Thanks! I think this still makes sense as a unit test. Presumably, barring bugs, what projections we see should be a function of the parameters we provide to mudskipper (e.g. fraction softclipping allowed, etc.). Therefore, as long as we control how the unit-test is parameterized, it's probably valid to have to make sure we don't produce any unexpected regressions.
Thanks! Rob
The test
test_bam2bam
on line 32 oftests/bam_test.rs
currently fails. It was failing before because the return type ofread_and_process
was not right. I fixed that, but this test is still failing. It should fail to find / write 33 records, but has 0. Actually since bam2bam didn't return an integer, it is unclear how this value is to be calculated, so I took a best guess (probably wrong). Can you advise @sjbaker47 and @HosseinAsghari (cc @mohsenzakeri, @haghshenas).Thanks! Rob