alexsweeten / snacc

Normalized Compression Distance for Inferring Microbial Phylogenies

Investigate GenCompress #8

Closed alexsweeten closed 6 years ago

alexsweeten commented 6 years ago

GenCompress is a compression algorithm specifically designed for biological sequence data. It would be great to get a working implementation into our pipeline for testing.

wilcas commented 6 years ago

Update: found a Windows executable from the original authors as well as a Linux executable here. The Linux executable appears to require a reference genome, but the Windows executable does not. Reading the paper to work out the inputs/outputs and see if this is useful for us. UPDATE: the reference is optional; the paper doesn't mention the use of a reference.

wilcas commented 6 years ago

@SweetiePi the binary they have expects FASTA format. Should I investigate re-implementing the algorithm?

wilcas commented 6 years ago

I wrote a function that takes a FASTA file path and returns a file object.
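
For anyone reading later, here is a rough sketch of what a helper like this could look like. It is not the code from the repo: the name `fasta_to_file_object` is made up for illustration, and it assumes that the "file object" is an in-memory text buffer holding only the sequence characters, with header lines (those starting with `>`) and blank lines dropped and all records concatenated.

```python
import io


def fasta_to_file_object(fasta_path):
    """Hypothetical helper: read a FASTA file and return a file-like object.

    Assumption: downstream compressors only want raw sequence data, so
    header lines (starting with '>') and blank lines are skipped and the
    remaining lines are joined into one upper-case string.
    """
    sequence_parts = []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and FASTA record headers.
            if not line or line.startswith(">"):
                continue
            sequence_parts.append(line.upper())
    # io.StringIO behaves like an open text file (read, seek, iterate).
    return io.StringIO("".join(sequence_parts))
```

Calling `fasta_to_file_object("genome.fasta").read()` would then give the bare sequence as one string. If the records in a multi-sequence FASTA need to stay separate, this would have to return one object per record instead.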