Closed semenko closed 2 years ago
The logic for detecting SAM format is here, that is, it has nothing to do with the file name but rather looks at the contents:
if (
(len(fields) >= 11)
and isdigit(fields[1])
and isdigit(fields[3])
and isdigit(fields[4])
and (fields[5] not in ['.', '+', '-'])
):
So it seems like your particular files are looking SAM-like enough to be called as such. Since the filetype is determined and set on the cython side, setting the filetype manually is not curently straightforward. If you are able to modify your input files (say, by changing the 6th position to be in .
, +
, or -
), that will be easiest. For example, the following is considered a BED:
BedTool("chr1 1 10 0 1 . 2 3 4 5 6 7", from_string=True).cut([0,1,2])
By the way, that error is because SAM format is 1-based, but Interval
objects are zero-based. So it was trying to subtract 1 from the 0 (which would be the position if it were a SAM file), causing an overflow from the resulting negative value.
Thanks!
Yeah, this is definitely an edge case — I'm refactoring some old bash scripts and this was the first situation I've seen where bedtools & pybedtools don't behave the same (i.e. bedtools map
handles this just fine).
Feel free to close this if there's no easy solution and/or don't think a warning to the user is appropriate.
I ran into an issue that took me a bit to debug --
I'm parsing a ~pseudo .bed file that looks like this:
chr1 1 10 0 1 2 3 4 5 6 7 8 9 ...
When there are 11 or more total columns, pybedtools considers this a
sam
and some commands error, e.g. if column 4 is zero.I ran into this using the .cut() helper, which threw a nebulous error:
This took me a while to figure out -- my file is named
.bed
-- and only rare lines where column 4 == 0 are considered invalid in my data. (Plus, bedtools itself handles these files just fine -- I'm running amap
on some of the colums much higher than position 7.)Any thoughts on this, or perhaps warning users (e.g. if the file extension is
.bed
but file_type is interpreted assam
)?Is there a way to override BedTool interpreting this as
.sam
?Thanks for developing & maintaining this amazing package!