Closed dprat closed 1 year ago
So when the first line's fourth field is 3935209,3935209,3935209
, according to the SAM heuristic:
if (
(len(fields) >= 11)
and isdigit(fields[1])
and isdigit(fields[3])
and isdigit(fields[4])
and (fields[5] not in ['.', '+', '-'])
):
isdigit(fields[3])
is false. So it's detected as a BED.
But when the first line's fourth field is 3787399
,isdigit(fields[3])
is True so it's detected as SAM, and all of the other criteria for SAM also match.
So this is a corner case where you happen to have an 11-field BED file where the name field is an integer. If you have a better way of detecting these kinds of corner cases directly that doesn't have a performance impact, that would be great to have as a PR. As a workaround, you could try adding a field to your bed so you have more than 11 fields, which should make it fail the SAM criteria.
Hi, have been strugling with an issue of the same type as #363 At some point I need to merge a bedfile and I end up with up a bedfile containing 11 column. I then need to separate some line of the BedTools object and then
cat()
them. For some reason the line are considered as "sam" instead of bed format. After digging in this issue I tried to check over this :But I haven't been able to make it work even by commenting the whole 'SAM' part and after recompiling it, the line that were identified as 'sam' are still considered as 'sam'
But at 1112 in bedtool.py:
If I comment this line, then the attribute _file_type is not "sam" but stay None and this final solve my problem. But is there a good way to fix this ? I haven't been able to solve it in a proper way.
I'm sorry I actually haven't been able to reproduce this error in an easy way.
Here is the result I obtain:
When this is the result I want:
edit :
After digging a bit I finally managed to isolate the problem:
In this case i end up with a _file_type = "sam"
But if i invert the two line of my bed then i obtain a _file_type = "bed"