zdk123 closed 1 year ago
This is not so much a bug as an issue with how Python distributes the iterable inside ThreadPool.map, I think; there is no guarantee on the order in which the threads receive the items. I think ThreadPool.imap might guarantee iteration order, but it's often slower. There's also the possibility that the first sequence takes longer than the second, causing sequence number 2 to be returned first.
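To illustrate the effect, here is a minimal sketch. ToyFinder and its find_genes method are hypothetical stand-ins, not the library's actual classes; the only thing borrowed from this thread is the idea of a shared _num_seq counter that is bumped while a sequence is being processed. The sleep simulates a first sequence that takes longer than the second.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


class ToyFinder:
    """Hypothetical stand-in for an object that numbers sequences as it processes them."""

    def __init__(self):
        self._num_seq = 0
        self._lock = threading.Lock()

    def find_genes(self, item):
        name, delay = item
        time.sleep(delay)  # simulate work; a longer sequence takes longer
        with self._lock:
            # The number is handed out when the work finishes,
            # not according to the position of the item in the input.
            self._num_seq += 1
            return name, self._num_seq


finder = ToyFinder()
inputs = [("seq_a", 0.3), ("seq_b", 0.0)]  # the first sequence is the slow one

with ThreadPool(2) as pool:
    results = pool.map(finder.find_genes, inputs)

# The list itself follows the input order, but the counter values
# attached to each sequence follow the completion order.
print(results)
```

In this toy setup the list returned by map still follows the input order; it is the number assigned through shared state that ends up attached to the wrong sequence.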
Since I'm going to change the ID field of the GFF output to become the gene identifier as you suggested in #18 anyway, I don't think there will be a reason to allow changing the _num_seq attribute manually. It shouldn't be used for anything else...
I agree with your interpretation, and the other fix will definitely solve this for us too. My only point, I guess, is that _num_seq doesn't get used until the results are written, so it could be corrected manually.
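One way to do that correction without touching any internal state might look like the sketch below. It assumes only that ThreadPool.map returns its results in the same order as the inputs, which the standard library does; find_genes here is a hypothetical placeholder worker, not the real gene finder, and the numbering is attached after the fact from the input position.

```python
from multiprocessing.pool import ThreadPool


def find_genes(seq):
    """Hypothetical worker: counts ATG occurrences as a placeholder result."""
    return seq.count("ATG")


sequences = ["ATGAAATGA", "CCC", "ATG"]

with ThreadPool(2) as pool:
    # map() returns results in the same order as the inputs,
    # regardless of which thread finished first.
    predictions = pool.map(find_genes, sequences)

# Attach the 1-based input position afterwards instead of relying
# on a counter that is incremented inside the worker threads.
numbered = [(i, pred) for i, pred in enumerate(predictions, start=1)]
print(numbered)
```

The design choice is simply to derive the sequence number from where the item sits in the input rather than from when it was processed.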
Hit another bug in the output data.
Compare
output:
output:
This causes a mismatch between the input sequence order and the ID in the resulting GFF / stats file. Is there any way of fixing this other than modifying the state?