Closed gavinmdouglas closed 2 years ago
See this related issue in VirSorter: https://githubhot.com/repo/jiarong/VirSorter2/issues/81
Hey there, Gavin :)
Thanks so much for the kind words and for letting me know about this!
Sorry you had to track things down, I'm very grateful you did though!
I will for sure go through and add in catches for intermediate steps like this, that is terrible this can happen right now
For this particular issue, since it's a hard-coded HMMER limitation like you mentioned (and noted by them here: https://github.com/EddyRivasLab/hmmer/issues/244), I'm thinking of putting in a filtering step to remove any proteins this big prior to running the hmm searching step. That way that genome as a whole still goes through. Does that sound good to you?
Thanks again for figuring out what was going on and writing in about this!
-Mike
Hi Mike,
That sounds like a very reasonable change to remove those proteins. I wasn't sure if GToTree was calling proteins or if that was something hmmsearch was doing, but definitely that sounds great to me.
Cheers,
Gavin
This is addressed as of version 1.6.32, which is updated in conda 👍
Thanks again, Gavin!
That must be a record! Thanks for the super fast support
Hi there,
Thanks for making this extremely useful and well-documented tool!
I noticed that a genome I happened to know was very high quality based on checkm was dropped by GToTree due to no SCGs being identified. I looked into further and it was because this error occurred during the hmmsearch stage:
This was with this assembly: https://www.ncbi.nlm.nih.gov/assembly/GCF_001865985.1 (and two other test E. coli genomes with the default Bacteria HMMs, and all default parameters).
There may not be anything to do on the GToTree side of things, as this seems to be internal to hmmsearch, but I just thought I would mention it here in case others run into the same error. One thing to note is that it took me a while to figure out what the problem was, and I would have just assumed that genome was very incomplete if I hadn't known otherwise, which could mean that on rare occasions genomes are left out of the final phylogeny due to this problem.
All the best,
Gavin