SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

Scoary crashes on PIRATE converted output file due to "--include_input_columns ALL" option #4

Closed by dutchscientist 5 years ago

dutchscientist commented 5 years ago

Looks like a really interesting tool. As I have used the Roary/Scoary pairing so far, I like the conversion to Roary-format output.

I have tested this: when Scoary is run with the "--include_input_columns ALL" option, which copies the gene number columns to the output, it crashes on the converted file. Is the converted file somehow not fully compatible?

When the option is left out, Scoary works fine, but I would prefer to have the gene numbers included in the output. Any thoughts on what the subtle difference in format between the Roary and converted files may be?

SionBayliss commented 5 years ago

I will play with this later today and let you know what the issue might be.

SionBayliss commented 5 years ago

The issue seems to be with the "--" motif in the first column of the converted PIRATE file. I think Scoary is treating it as a column delimiter. Replacing the "--" with other characters (e.g. two underscores) fixes the issue. I will push a fix to the current development branch (paralog_correction). Be aware that this branch is under active development and may act a little wonky in some edge cases. I hope that helps!
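
Until the fix lands, the workaround described above can be applied by hand. The following is only a minimal sketch, not the code in the paralog_correction branch: it assumes the converted file is an ordinary comma-separated file with quoted fields, and the function name `replace_double_dash` and the gene ID `g0001--1` are made up for illustration.

```python
import csv
import io


def replace_double_dash(src, dst, replacement="__"):
    """Copy CSV rows from src to dst, replacing "--" in the first column only.

    src and dst are text-mode file objects. All other columns are left as-is.
    """
    reader = csv.reader(src)
    # Quote every field and use "\n" line endings (an assumption about the
    # layout the downstream tool expects, not something confirmed here).
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        if row:  # skip the replacement on empty lines
            row[0] = row[0].replace("--", replacement)
        writer.writerow(row)


# Small in-memory demonstration with a hypothetical gene ID.
sample = io.StringIO('"Gene","Annotation"\n"g0001--1","hypothetical protein"\n')
fixed = io.StringIO()
replace_double_dash(sample, fixed)
print(fixed.getvalue())
```

For a real file, open the input and output with `open(path, newline="")` and pass the two handles to the function. Only the first column is touched, so any "--" that appears in annotations or other fields is preserved.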

dutchscientist commented 5 years ago

Thanks, will have a look soon.