Bed identified as SAM - Githubissues

Hi, have been strugling with an issue of the same type as #363 At some point I need to merge a bedfile and I end up with up a bedfile containing 11 column. I then need to separate some line of the BedTools object and then cat() them. For some reason the line are considered as "sam" instead of bed format. After digging in this issue I tried to check over this :

if (
        (len(fields) >= 11)
        and isdigit(fields[1])
        and isdigit(fields[3])
        and isdigit(fields[4])
        and (fields[5] not in ['.', '+', '-'])
    ):

But I haven't been able to make it work even by commenting the whole 'SAM' part and after recompiling it, the line that were identified as 'sam' are still considered as 'sam'

But at 1112 in bedtool.py:

self._file_type = six.advance_iterator(iter(self)).file_type

If I comment this line, then the attribute _file_type is not "sam" but stay None and this final solve my problem. But is there a good way to fix this ? I haven't been able to solve it in a proper way.

I'm sorry I actually haven't been able to reproduce this error in an easy way.

Here is the result I obtain:

chr1    710330  710331
*   0   0
chr1    3935209 3935210

When this is the result I want:

chr1    3935209 3935210 3935209,3935209,3935209 3935210,3935210,3935210 3935209,3935209,3935209 3935210,3935210,3935210 +   hg38    5   170,171,172

edit :

After digging a bit I finally managed to isolate the problem:

import pybedtools
import os, difflib, sys
import six

testdir = os.path.dirname(__file__)
tempdir = os.path.join(os.path.abspath(testdir), "tmp")
unwriteable = "unwriteable"

def test_malformed():
    """
    Malformed BED lines should raise MalformedBedLineError
    """

    a = pybedtools.BedTool(
        """
        chr1    3787399 3787400 3787399 3787400 3787399 3787400 +   hg38    5   169
        chr1    3935209 3935210 3935209,3935209,3935209 3935210,3935210,3935210 3935209,3935209,3935209 3935210,3935210,3935210 +   hg38    5   170,171,172

        """,
        from_string=True,
    )
    a_i = iter(a)

    print(six.advance_iterator(a_i).file_type)
    assert False

In this case i end up with a _file_type = "sam"


F                                                                                                                                                                                  [100%]
======================================================================================== FAILURES ========================================================================================
_____________________________________________________________________________________ test_malformed _____________________________________________________________________________________

    def test_malformed():
        """
        Malformed BED lines should raise MalformedBedLineError
        """

        a = pybedtools.BedTool(
            """
            chr1    3787399 3787400 3787399 3787400 3787399 3787400 +   hg38    5   169
            chr1    3935209 3935210 3935209,3935209,3935209 3935210,3935210,3935210 3935209,3935209,3935209 3935210,3935210,3935210 +   hg38    5   170,171,172

            """,
            from_string=True,
        )
        a_i = iter(a)

        print(six.advance_iterator(a_i).file_type)
>       assert False
E       assert False

test/test_malformed_bed.py:25: AssertionError
---------------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------------
sam
================================================================================ short test summary info =================================================================================
FAILED test/test_malformed_bed.py::test_malformed - assert False
1 failed in 0.19s

But if i invert the two line of my bed then i obtain a _file_type = "bed"

daler / pybedtools

Bed identified as SAM #370