Closed ArtPoon closed 5 months ago
A trivial fix would be to cast the items from `line` as `float` before calling `int`, but we should probably first determine why the program is returning 1.0.
Think I found the issue.
This is a segment of the output from the `.frags` file generated by the gencov script.
# SeBC Sim BC KA Aligned Offsets Num Num Tot MisM
# NaPvalue Pvalue Begin End Len Poly Dif Difs Pen.
AI Cc_3>ERS990491;Cc_1_828>ERS990617 0.0465 0.84130 1 222 222 52 2 51 4
AI Cc_2>ERS990555;Cc_1_1150>ERS990330 0.0465 > 1.0 29 697 669 141 2 12 15
The error comes from the way we split the row of text. Typically the 4th column is a float, but in the second data line it is instead `> 1.0`. When we split the line into a list, `>` and `1.0` end up as two separate items when they should be one. This pushes `1.0` to the next index in the list, which should have been an int, causing the error.
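A minimal sketch reproducing the shift described above, using the second data row from the excerpt. Single spaces stand in for the file's original column padding, and which index the real parser passes to `int` is an assumption made for illustration.

```python
# Second data row from the .frags excerpt (column padding collapsed to
# single spaces for this sketch).
row = "AI Cc_2>ERS990555;Cc_1_1150>ERS990330 0.0465 > 1.0 29 697 669 141 2 12 15"

items = row.split()  # no argument: split on any run of whitespace
print(items[3], items[4])  # '>' and '1.0' land in separate items

# The position expected to hold the integer Begin offset now holds '1.0'.
try:
    begin = int(items[4])
except ValueError as err:
    print("conversion failed:", err)
```

The row yields 12 items instead of the 11 produced by the first data row, so every numeric column after the P-value is off by one.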
In this case, should I just make it so we ignore the `>`?
We should be able to split on tabs instead of general whitespace (which is what happens when we call `split()` with no arguments), which should keep the `>` and `1.0` together. Then we need to add a check for non-numerical characters and strip them out, i.e., drop the `>`.
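As a rough sketch of this suggestion: the two fragments below are hypothetical (one tab-delimited, one space-padded), and `parse_pvalue` is an illustrative helper, not a function from the codebase. Whether the real `.frags` file contains tabs is an assumption that would need checking.

```python
def parse_pvalue(field):
    """Convert a P-value field such as '0.84130' or '> 1.0' to float.

    Hypothetical helper: strips a leading '>' and spaces before converting.
    """
    return float(field.lstrip("> "))

tabbed = "0.0465\t> 1.0\t29"    # hypothetical tab-delimited fragment
padded = "0.0465  > 1.0    29"  # hypothetical space-padded fragment

print(tabbed.split("\t"))   # ['0.0465', '> 1.0', '29']
print(padded.split("\t"))   # no tabs present, so the row stays one item
print(parse_pvalue("> 1.0"))
```

Splitting on `"\t"` only keeps `> 1.0` intact when the columns really are tab-delimited; on space-padded input it returns the whole row as a single item.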
I was wrong about the file being tab separated. Splitting by tabs didn't separate the string. I did the following and it allowed the geneconv analysis to run:
line[2:] = [item for item in line[2:] if all(char.isalnum() or char == '.' for char in item)]
where the first 2 items in `line` are character strings, so they are left out of the filter.
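To show the effect of this filter on the problem row, here is a small sketch. The wrapper name `drop_non_numeric` and the variable names are illustrative assumptions; the filter expression itself is the one quoted above.

```python
def drop_non_numeric(items):
    """Keep the first two text items as-is; drop any later item that is not
    made up solely of alphanumeric characters and '.' (e.g. a lone '>')."""
    return items[:2] + [
        item for item in items[2:]
        if all(ch.isalnum() or ch == "." for ch in item)
    ]

# Second data row from the .frags excerpt (padding collapsed to single spaces).
row = "AI Cc_2>ERS990555;Cc_1_1150>ERS990330 0.0465 > 1.0 29 697 669 141 2 12 15"
fields = drop_non_numeric(row.split())

pvalue = float(fields[3])                           # '1.0' is back in column 4
begin, end, length = (int(x) for x in fields[4:7])  # integer columns realigned
print(pvalue, begin, end, length)
```

Note that the filter drops any item containing a character other than letters, digits, or `.`, so a value with a minus sign would also be removed.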
@WilliamZekaiWang to push fix to `dev` branch for fast review before PR
A user reported problems running geneconv on their data, with the following exception thrown: