ASLeonard closed this issue 1 year ago
Hi @ASLeonard,
Thanks for reporting this issue. Most likely, all the problems with grep, etc. stem from the "Unexpected number of columns in the header" error. Can you please share a couple of proteins that produce this error? I'll patch the miniprot boundary scorer to fix this.
Thanks, Tomas
In any case, the PAF specification only mandates 12 columns in the header, so I changed the code in https://github.com/tomasbruna/miniprot-boundary-scorer/commit/25b92407b0f3b8035743d8009be826c7d92c0432 to check for that.
However, the fact that your output did not have 19 fields could be a sign of something else going wrong, so I'd still be interested in seeing your input. Thanks!
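For illustration only, a relaxed header check along these lines could look like the sketch below. It is written in Python rather than taken from the scorer itself, and `parse_paf_header` and `MIN_PAF_COLUMNS` are hypothetical names that do not come from the commit above.

```python
MIN_PAF_COLUMNS = 12  # PAF defines 12 mandatory columns; tags after that are optional


def parse_paf_header(line):
    """Split a tab-separated PAF line and check the mandatory columns.

    Hypothetical helper for illustration: optional tag columns beyond the
    first 12 are returned separately instead of being required.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < MIN_PAF_COLUMNS:
        raise ValueError(
            f"Unexpected number of columns in the header: {len(fields)}"
        )
    return fields[:MIN_PAF_COLUMNS], fields[MIN_PAF_COLUMNS:]
```

With a check like this, a 12-column line for an unmapped protein and a 19-column line for an aligned one are both accepted, and only lines with fewer than 12 fields are rejected.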
I think it was primarily the 12 column output, which seems like the default for unmapped proteins (since the `-u` flag is used). The other issue for the start_codons seems to have gone away after deleting all files and downloading a clean set of proteins, so may have just been a partially corrupted file somewhere.
Right, I forgot GALBA is using the `-u` flag. The alignment headers without subsequent alignments might be causing issues, let me fix that in the parser. Thanks!
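As a rough sketch of the idea (not the actual parser), headers that are not followed by any alignment lines can be skipped while grouping the file into records. The function name `iter_aligned_records` is hypothetical, and the `##PAF` header prefix is an assumption about the `miniprot --aln` output, which is why it is left as a parameter.

```python
def iter_aligned_records(lines, header_prefix="##PAF"):
    """Yield (header, alignment_lines) pairs, skipping bare headers.

    Assumes each record starts with a line beginning with header_prefix
    (an assumption about the file layout) and that any alignment detail
    lines follow it before the next header.
    """
    header, body = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(header_prefix):
            # A new header closes the previous record; keep it only if it
            # actually had alignment lines attached.
            if header is not None and body:
                yield header, body
            header, body = line, []
        elif header is not None and line:
            body.append(line)
    # Flush the final record under the same rule.
    if header is not None and body:
        yield header, body
```

A header for an unmapped protein is dropped because the next header arrives before any alignment line has been collected.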
Fixed in https://github.com/tomasbruna/miniprot-boundary-scorer/commit/c4d300a53203ebf0aa1e8a60680a1e368f58dac9. Again, thanks for reporting.
Hi, I've been unsuccessful in running GALBA (using earlier versions and v1.0.3). The steps seem to go okay until after the `miniprot --aln` step, at which point I get lots of "Unexpected number of columns in the header" errors. These are coming from miniprot-boundary-scorer (cc @tomasbruna), because the alignment file lines do not have 19 columns (here). If this happens so regularly and is an allowed output of miniprot, maybe the errors should be suppressed in GALBA, as it doesn't seem to be an "error"?
However, it did seem to hit a real error during miniprothint when getting start codons. I guess the file was empty after grepping. I replaced the `sys.exit` call with a `pass` here and the command would finish (although the hc.gff file only had 323 lines; is that expected for a 3 Gb mammal genome?).
Thanks, Alex
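For reference, a less silent alternative to swapping `sys.exit` for `pass` might look like the sketch below. It is not the miniprothint code, which is not shown in this thread. `select_features` is a hypothetical stand-in for the grep step, and the `start_codon` feature name is an assumption. The only format detail relied on is that the feature type is the third tab-separated column of a GFF line.

```python
import sys


def select_features(gff_lines, feature="start_codon"):
    """Return GFF lines whose third column equals `feature`.

    Hypothetical stand-in for the grep step: prints a warning to stderr
    instead of exiting when nothing matches.
    """
    selected = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip comment and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == feature:
            selected.append(line.rstrip("\n"))
    if not selected:
        print(f"warning: no {feature} features found", file=sys.stderr)
    return selected
```

Keeping a warning rather than a bare `pass` leaves the empty result visible, which matters when the cause is a broken input file rather than an expected outcome.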