Not sure this is technically a bug, but it could cause issues. I noticed an inconsistency in the ordering of values in the `exon_number` attribute of introns generated by `create_introns`. For most introns, the `exon_number` field contains a list like `["1", "2"]`, the idea being that the intron lies after exon 1 and before exon 2 and is therefore "intron 1". This holds for every intron up to the one spanning exon 9 -> exon 10. The values are sorted as strings (at least given how they are formatted in an NCBI GTF), so "10" sorts before "9" and the result is `["10", "9"]`.
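A quick way to see the ordering problem in plain Python, independent of gffutils (the variable names here are just for illustration):

```python
# String sorting is lexicographic, so "10" comes before "9"
# because the character "1" sorts before "9".
exon_numbers = ["9", "10"]

as_strings = sorted(exon_numbers)        # compares character by character
as_ints = sorted(exon_numbers, key=int)  # compares numeric values

print(as_strings)  # ['10', '9']
print(as_ints)     # ['9', '10']
```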
It might be tricky to get this to work in all cases, but it seems like you could check `str.isdigit` on each item in `v`. If all are true, cast to `int` for the sorting, then cast back to string. Perhaps this could be enabled via a flag.
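A minimal sketch of that idea, to make the suggestion concrete. The function name `sort_attribute_values` and the `numeric_sort` flag are hypothetical and not part of gffutils. I am also assuming the values arrive as a list of strings.

```python
def sort_attribute_values(values, numeric_sort=False):
    """Hypothetical helper, not part of gffutils.

    Sorts attribute values numerically when the flag is set and every
    value is a plain ASCII digit string; otherwise falls back to the
    ordinary string sort.
    """
    values = list(values)
    all_numeric = bool(values) and all(
        v.isascii() and v.isdigit() for v in values
    )
    if numeric_sort and all_numeric:
        # key=int orders by numeric value but returns the original
        # strings, so nothing needs to be cast back afterwards.
        return sorted(values, key=int)
    return sorted(values)


print(sort_attribute_values(["9", "10"], numeric_sort=True))  # ['9', '10']
print(sort_attribute_values(["9", "10"]))                     # ['10', '9']
```

Using `key=int` rather than converting and converting back keeps the strings exactly as they were in the file (a value like "02" would otherwise come back as "2"). The extra `isascii()` check is there because `str.isdigit` also accepts some non-ASCII digit characters that `int()` rejects.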
I think the issue is here: https://github.com/daler/gffutils/blob/4b5b28e610a435af359ab1c31271deea1bae4c47/gffutils/helpers.py#L333