NAL-i5K / GFF3toolkit

Python programs for processing GFF3 files

error with gff3_sort: missing SeqID #85

Status: Closed (lassancejm, 5 years ago)

lassancejm commented 5 years ago

Hi,

I am experiencing the following error with gff3_sort:

`ERROR [Missing SeqID] Missing SeqID.`

As far as I can tell, the offending line is not missing anything, so the error seems to be caused by the name of the SeqID ("chrX").

mpoelchau commented 5 years ago

Hi @lassancejm - does your fasta file contain a sequence with the ID chrX in the defline?

lassancejm commented 5 years ago

Yes, chrX is present in the FASTA file.

As a simple workaround, replacing "X" with an arbitrary number makes the command run.
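One plausible explanation, consistent with the workaround of replacing "X" with a number, is that the default sort derives a numeric key from each SeqID. The minimal sketch below assumes a hypothetical key function, `numeric_seqid_key`, that takes the first run of digits in the SeqID; it only illustrates the failure mode and is not GFF3toolkit's actual code.

```python
import re


def numeric_seqid_key(seqid: str) -> int:
    """Hypothetical sort key: the first run of digits in a SeqID.

    Illustrative only; this is not GFF3toolkit's implementation.
    """
    match = re.search(r"\d+", seqid)
    if match is None:
        # A name such as "chrX" contains no digits, so a purely numeric
        # key has nothing to return; raise to make the failure explicit.
        raise ValueError(f"no number in SeqID {seqid!r}")
    return int(match.group())


seqids = ["chr10", "chr2", "chr1"]
print(sorted(seqids, key=numeric_seqid_key))  # ['chr1', 'chr2', 'chr10']

try:
    sorted(seqids + ["chrX"], key=numeric_seqid_key)
except ValueError as err:
    print(err)  # no number in SeqID 'chrX'
```

Numbered names sort correctly (chr2 before chr10), but any SeqID without digits has no key at all, which matches the symptom reported here.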

mpoelchau commented 5 years ago

Really! That would be a bug. Thanks for the report; we'll look into this.

mpoelchau commented 5 years ago

@tony006469 can you add a new argument 'seqid-sort' to gff3_sort that sorts the GFF3 by the number in each reference sequence ID (the current behavior)? If that argument is not passed, the program should use the pre-existing scaffold order for sorting, i.e., the order in which the seqids appear in the original GFF.
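The two orderings requested here can be sketched as two ways of ranking SeqIDs before sorting features by start coordinate. This is a minimal sketch, assuming tab-separated GFF3 feature lines; the function names, the `numeric` switch, and the rule that digit-less SeqIDs go last in numeric mode are hypothetical choices, not the toolkit's actual interface.

```python
import re
from typing import Dict, List, Tuple


def feature_lines(gff_lines: List[str]) -> List[str]:
    """Keep feature lines; skip blanks and comments, stop at ##FASTA."""
    out = []
    for ln in gff_lines:
        if ln.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if ln.strip() and not ln.startswith("#"):
            out.append(ln.rstrip("\n"))
    return out


def seqid_order(gff_lines: List[str], numeric: bool = False) -> List[str]:
    """Rank SeqIDs by first appearance (default) or by their number.

    Hypothetical helper, not GFF3toolkit's implementation.
    """
    first_seen: Dict[str, int] = {}
    for ln in feature_lines(gff_lines):
        first_seen.setdefault(ln.split("\t", 1)[0], len(first_seen))
    seqids = list(first_seen)  # dicts keep insertion order (Python 3.7+)
    if numeric:
        def key(s: str) -> Tuple[int, int, str]:
            m = re.search(r"\d+", s)
            # Assumption: SeqIDs without digits sort after numbered ones.
            return (0, int(m.group()), s) if m else (1, 0, s)
        seqids.sort(key=key)
    return seqids


def sort_features(gff_lines: List[str], numeric: bool = False) -> List[str]:
    """Sort features by SeqID rank, then by start (column 4)."""
    features = feature_lines(gff_lines)
    rank = {s: i for i, s in enumerate(seqid_order(features, numeric))}

    def key(ln: str) -> Tuple[int, int]:
        cols = ln.split("\t")
        return rank[cols[0]], int(cols[3])

    return sorted(features, key=key)


example_lines = [
    "##gff-version 3",
    "chrX\t.\tgene\t500\t900\t.\t+\t.\tID=g3",
    "chr2\t.\tgene\t100\t400\t.\t+\t.\tID=g2",
    "chrX\t.\tgene\t10\t50\t.\t-\t.\tID=g4",
    "chr1\t.\tgene\t1\t99\t.\t+\t.\tID=g1",
]
print(seqid_order(example_lines))                # ['chrX', 'chr2', 'chr1']
print(seqid_order(example_lines, numeric=True))  # ['chr1', 'chr2', 'chrX']
```

With the default (original-order) mode, chrX keeps its position from the input and no SeqID needs to contain a number; the numeric mode only needs a fallback rule for names like chrX.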

tony006469 commented 5 years ago

@mpoelchau
https://github.com/NAL-i5K/GFF3toolkit/tree/new_argument_for_seqid_no_number It's ready for testing.

tony006469 commented 5 years ago

Command: `gff3_sort -g /home/tony/Desktop/sort_testfiles/test2.gff -og example-sorted.gff3 -r`

[Screenshots: 1.png, 2.png, 3.png]

mpoelchau commented 5 years ago

Fixed via #87. Thanks for reporting @lassancejm. Closing this - feel free to reopen if you notice any problems.