Open dtdoering opened 1 year ago
@dtdoering I'm unable to reproduce in a test -- see the new test at https://github.com/daler/gffutils/pull/221/commits/a4b443b1eb8f9c17d2b6664f9c63f53b8b1cb0a9 and passing tests here.
Are you able to reproduce this with the latest in the `v0.12rc` branch? Or, if you don't get around to it for a little while, with v0.12 once it's released?
Hey @daler, getting back around to this now!
I pasted your `test_issue_213` function into a new script, and am getting the same results as you on the latest `v0.12` tag (i.e. no issue).
However, I must have done a poor job with my initial description of the issue, because I am now only able to reproduce it by supplying a `dialect` argument in the `create_db` calls in your test function. This is something I was doing (and still am) in my real-world code when I first encountered the issue, so I'm not sure how I was reproducing the issue without that argument. My apologies! 😅
At any rate, by supplying `dialect={'fmt': 'gff3'}` to the `create_db` calls in your test function, I am now indeed getting `AssertionError`s, whereas I don't get them if I omit the `dialect` argument and let it figure out the dialect.
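For reference, the outcome those assertions expect can be sketched without gffutils at all: each line starting with `##` is a directive, stored without the `##` prefix, whether or not a dialect was supplied. The helper `collect_directives` below is hypothetical, written only to illustrate the expected result; it is not how gffutils implements parsing.

```python
from textwrap import dedent


def collect_directives(text):
    """Return '##' header directives from GFF text, without the '##' prefix.

    Hypothetical helper for illustration; not part of gffutils.
    """
    directives = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("##"):
            directives.append(line[2:])
    return directives


data = dedent(
    """
    ##gff-version 3
    . . . . . . . .
    """
)

# The same list the test expects from db.directives.
directives = collect_directives(data)
print(directives)
```

With the `dialect` argument supplied, `db.directives` no longer matches this list in my runs, which is what trips the assertions.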
For the sake of completeness, here is the script (`gffutils_debug.py`) that I'm running your test with:
#!/usr/bin/env python
import sys
import os
import tempfile
from textwrap import dedent

gffutils_git_path = os.path.join(os.environ.get('HOME'), 'sft', 'gffutils')
sys.path.insert(1, gffutils_git_path)
import gffutils

print(f"gffutils version: {gffutils.__version__}")


def test_issue_213():
    # GFF header directives seem to be not parsed when building a db from
    # a file, even though it seems to work fine from a string.
    data = dedent(
        """
        ##gff-version 3
        . . . . . . . .
        . . . . . . . .
        . . . . . . . .
        . . . . . . . .
        """
    )

    # Ensure directives are parsed from DataIterator
    it = gffutils.iterators.DataIterator(data, from_string=True)
    assert it.directives == ["gff-version 3"]

    # Ensure they're parsed into the db from a string
    db = gffutils.create_db(data, dbfn=":memory:", from_string=True, verbose=False, dialect={'fmt': 'gff3'})
    assert db.directives == ["gff-version 3"], db.directives

    # Ensure they're parsed into the db from a file
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, "w") as fout:
        fout.write(data + "\n")
    db = gffutils.create_db(tmp, ":memory:", dialect={'fmt': 'gff3'})
    assert db.directives == ["gff-version 3"], db.directives
    assert len(db.directives) == 1

    # Ensure they're parsed into the db from a file, and going to a file (to
    # exactly replicate example in #213)
    db = gffutils.create_db(tmp, dbfn='issue_213.db', force=True, dialect={'fmt': 'gff3'})
    assert db.directives == ["gff-version 3"], db.directives
    assert len(db.directives) == 1


test_issue_213()
Then I run it with:
~/gffutils_debug.py
I am trying to read in a GFF that doesn't adhere to spec so I can use `gffutils` to fix it and then write out a corrected file. I've gotten everything working, except that the GFF header directive(s) don't appear in the output file. It appears they are not being parsed upon creation of the database.
Interestingly, this only happens when a FeatureDB is created from a file, not from e.g. a `dedent()`ed string as in the existing `parser_test.py`.
This behavior is exhibited in v0.11.1 and v0.11.0, but not v0.10.1 (thus, a workaround is to downgrade to v0.10.1).
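If downgrading is not an option, another possible stopgap is to copy the `##` header lines from the input file by hand and write them ahead of the features. This is only a sketch under that assumption, not something gffutils provides: `read_header_directives` and `write_with_directives` are hypothetical helper names, and the feature line in the demo is a placeholder.

```python
import os
import tempfile


def read_header_directives(path):
    """Collect leading '##' lines from a GFF file, prefix included."""
    directives = []
    with open(path) as fin:
        for line in fin:
            if line.startswith("##"):
                directives.append(line.rstrip("\n"))
            elif line.strip() and not line.startswith("#"):
                break  # first feature line reached; the header is over
    return directives


def write_with_directives(out_path, directives, feature_lines):
    """Write the directives first, then one feature per line."""
    with open(out_path, "w") as fout:
        for line in list(directives) + list(feature_lines):
            fout.write(line + "\n")


# Demo on a throwaway file; the feature line is a placeholder.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "test.gff")
dst = os.path.join(tmpdir, "fixed.gff")
feature = "\t".join(["chr1", ".", "gene", "1", "100", ".", "+", ".", "ID=gene1"])
with open(src, "w") as fout:
    fout.write("##gff-version 3\n" + feature + "\n")

header = read_header_directives(src)
write_with_directives(dst, header, [feature])
with open(dst) as fin:
    result = fin.read()
```

In real use, `feature_lines` could presumably be built from the database, e.g. `str(f) for f in db.all_features()`, but I have not verified that this round-trips every attribute of a non-conforming file.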
To reproduce:
Use `gffutils` > v0.10.1.
Create a test file `test.gff` with the following:
Example code:
Expected output:
Observed output:
System/environment info:
OS: GNU/Linux
Python: 3.8.5 (still happens with 3.8.13)
Conda environment: