NicolaDM / MAPLE

MAPLE - a new approximate approach for maximum likelihood phylogenetics at short divergence.
GNU General Public License v3.0

Extension for Maple file format: include reference #7

Closed: corneliusroemer closed this issue 1 year ago

corneliusroemer commented 2 years ago

I like the proposed MAPLE format. It compresses better than .gz while remaining human-readable, so it could serve as an efficient alternative input for tools like github.com/lenaschimmel/sc2rf

The one thing keeping it from being a lossless compression format is that the reference does not appear to be explicitly included in the MAPLE file.

I would propose including the reference as the first sequence by default. This would require a reserved ("magic") name for it that cannot clash with any possible sequence name.
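To illustrate why embedding the reference makes the format lossless, here is a minimal Python sketch. It assumes a simplified MAPLE-style layout in which each sample is a `>name` header followed by lines of the form `<char> <position>` for a single-character difference, or `<char> <position> <length>` for a run of `N` or `-` (1-based positions). The reference is stored as the first entry under a hypothetical reserved name, `__MAPLE_REFERENCE__`. The function names (`encode`, `decode`, `write_maple`, `read_maple`) are illustrative and not part of MAPLE itself.

```python
from __future__ import annotations

# Hypothetical reserved name for the embedded reference entry (an assumption;
# the thread only notes that some non-clashing name would be needed).
REF_NAME = "__MAPLE_REFERENCE__"
# Characters stored as runs (position + length) in this sketch.
RUN_CHARS = {"N", "n", "-"}


def encode(reference: str, seq: str) -> list[str]:
    """Encode an aligned sequence as difference lines against `reference`."""
    if len(seq) != len(reference):
        raise ValueError("sequence must be aligned to the reference")
    lines: list[str] = []
    i = 0
    while i < len(seq):
        c = seq[i]
        if c in RUN_CHARS:
            # Extend over the whole run of the same gap/ambiguity character.
            j = i
            while j < len(seq) and seq[j] == c:
                j += 1
            lines.append(f"{c} {i + 1} {j - i}")
            i = j
        else:
            if c != reference[i]:
                lines.append(f"{c} {i + 1}")  # single-character difference
            i += 1
    return lines


def decode(reference: str, lines: list[str]) -> str:
    """Rebuild a sequence by applying difference lines to `reference`."""
    seq = list(reference)
    for line in lines:
        parts = line.split()
        char, pos = parts[0], int(parts[1]) - 1
        length = int(parts[2]) if len(parts) == 3 else 1
        seq[pos:pos + length] = char * length
    return "".join(seq)


def write_maple(reference: str, samples: dict[str, str]) -> str:
    """Serialize samples with the reference stored as the first entry."""
    out = [f">{REF_NAME}", reference]
    for name, seq in samples.items():
        if name == REF_NAME:
            raise ValueError(f"sample name clashes with {REF_NAME}")
        out.append(f">{name}")
        out.extend(encode(reference, seq))
    return "\n".join(out) + "\n"


def read_maple(text: str) -> tuple[str, dict[str, str]]:
    """Parse text produced by write_maple and decode every sample."""
    records: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            records[-1][1].append(line.strip())
        else:
            raise ValueError("data line before the first header")
    if not records or records[0][0] != REF_NAME:
        raise ValueError("first entry is not the embedded reference")
    reference = "".join(records[0][1])
    samples = {name: decode(reference, body) for name, body in records[1:]}
    return reference, samples
```

Because `read_maple` recovers the reference from the file itself, `read_maple(write_maple(ref, samples))` returns the original aligned sequences with no external input; if the reference lived only in a separate file, the difference lines alone could not be decoded.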

I could well imagine using the MAPLE format as the output of aligners like Nextalign, but not unless the reference is included in the file itself.

It would be fun to write a CLI utility, similar to xz, that compresses to and decompresses from MAPLE. That would help adoption.

NicolaDM commented 2 years ago

Indeed, I hope that a format like this will become popular in genomic epidemiology!

I agree that having the reference and the samples in the same file usually makes things easier. I'll think about it and ask around, but I'll probably do it!

NicolaDM commented 1 year ago

Both options are now supported: the reference can be included in the input file or provided in a separate file.
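One way a reader could accept both layouts is sketched below in Python. It assumes that an embedded reference is the first entry and that its body is plain sequence (one token per line), whereas sample entries hold difference lines such as `T 5` or `N 1 2` (two or three tokens). This detection heuristic and the names `parse_records`, `looks_like_sequence`, and `load_maple` are assumptions for illustration, not MAPLE's actual parsing code.

```python
from __future__ import annotations

from typing import Optional


def parse_records(text: str) -> list[tuple[str, list[str]]]:
    """Split FASTA-like text into (name, body lines) pairs."""
    records: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
        else:
            raise ValueError("data line before the first header")
    return records


def looks_like_sequence(body: list[str]) -> bool:
    """Heuristic (assumed): a reference body is raw sequence with one token per
    line, while difference lines like 'T 5' or 'N 1 2' have two or three tokens.
    An empty body is treated as a sample identical to the reference."""
    return bool(body) and all(len(line.split()) == 1 for line in body)


def load_maple(
    text: str, reference: Optional[str] = None
) -> tuple[str, dict[str, list[str]]]:
    """Return (reference, {sample name: difference lines}).

    The reference comes from the first entry of `text` if it looks like a raw
    sequence, otherwise from the separately supplied `reference` argument.
    """
    records = parse_records(text)
    if records and looks_like_sequence(records[0][1]):
        _, body = records.pop(0)
        embedded = "".join(body)
        # If both sources are given, require them to agree.
        if reference is not None and reference != embedded:
            raise ValueError("embedded reference differs from the separate one")
        reference = embedded
    if reference is None:
        raise ValueError("no reference: embed it as the first entry or pass it separately")
    return reference, dict(records)
```

Treating a separately passed reference as a consistency check when the file already embeds one is a design choice of this sketch; a real tool might instead prefer one source or emit a warning.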