cobilab / jarvis3

Efficient compression of biological data
GNU General Public License v3.0

How to compare JARVIS 3 and GeCo3 #5

Closed: karel-brinda closed this issue 1 week ago.

karel-brinda commented 2 weeks ago

Hello,

I'm currently experimenting with both JARVIS3 and GeCo3.

Do you have any advice on how to make the results comparable? Both programs look conceptually similar to me, with JARVIS3 probably giving better results.

However, the level parameters are completely different across the two programs.

For instance, is there a way to come up with an analogue of the --best and --fast options in XZ and BZIP2?

Thanks a lot!

pratas commented 1 week ago

Hi Karel,

These are the main conceptual differences:

- GeCo3 allows reference-based compression (conditional and referential).
- GeCo3 provides many applications for sequence analysis.
- JARVIS3 allows only reference-free compression.
- JARVIS3 provides a script for applying it to FASTA and FASTQ data.
- JARVIS3 includes repeat models (also called copy models).

In reference-free compression, JARVIS3 provides better results than GeCo3.

Unfortunately, there isn't an easy way to do that, because the number of parameters is very high; that's why we provide predefined levels.
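There is no direct mapping between JARVIS3 and GeCo3 levels, but their outputs can still be compared empirically: compress the same input with several levels of each tool and compare a common metric such as bits per symbol (compressed size in bits divided by the number of sequence symbols), alongside running time. A minimal Python sketch, not part of either tool, follows; the function names are hypothetical, and it assumes that every non-header FASTA character (including N and lowercase bases) counts as one symbol.

```python
import os


def count_sequence_symbols(fasta_path: str) -> int:
    """Count sequence characters in a FASTA file, skipping '>' header lines.

    Assumption: every non-header character (including N and lowercase
    bases) counts as one symbol; line breaks and other whitespace are ignored.
    """
    total = 0
    with open(fasta_path, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # header line, not sequence data
            total += len("".join(line.split()))  # drop newline/whitespace
    return total


def bits_per_symbol(compressed_path: str, n_symbols: int) -> float:
    """Compressed file size in bits divided by the number of input symbols."""
    if n_symbols <= 0:
        raise ValueError("n_symbols must be positive")
    return 8 * os.path.getsize(compressed_path) / n_symbols
```

Lower values mean better compression. Make sure both outputs cover the same content (for example, sequence only versus sequence plus headers); otherwise the numbers are not comparable.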

What are we currently doing to solve this issue? We are developing software that estimates a good set of JARVIS3 parameters for a given sequence using evolutionary models (see, for example, https://github.com/cobilab/OptimJV3). This includes multi-objective optimization (compression ratio versus computational resources), which could, for example, produce --best and --fast parameter sets.

This will therefore become available as soon as we complete this project.
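The multi-objective idea mentioned above can be illustrated with a small sketch. It does not reproduce OptimJV3 (whose parameter search is evolutionary); it only shows how --fast-like and --best-like presets could be read off once candidate parameter sets have been measured. It assumes, hypothetically, that each candidate is labelled and has a measured (bits per symbol, seconds) pair, with lower being better for both; memory could be added as a third objective in the same way. A candidate stays on the Pareto front if no other candidate is at least as good on both objectives and strictly better on one.

```python
from typing import Dict, List, Tuple

# Hypothetical input: parameter-set label -> (bits per symbol, seconds).
Measurements = Dict[str, Tuple[float, float]]


def pareto_front(results: Measurements) -> List[str]:
    """Return labels not dominated on (bits per symbol, seconds)."""
    front = []
    for name, (bps, secs) in results.items():
        # Dominated: another candidate is no worse on both objectives
        # and strictly better on at least one.
        dominated = any(
            o_bps <= bps and o_secs <= secs and (o_bps < bps or o_secs < secs)
            for other, (o_bps, o_secs) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Sort by running time, fastest first.
    return sorted(front, key=lambda n: results[n][1])


def pick_presets(results: Measurements) -> Tuple[str, str]:
    """Return (fast, best): the fastest and the most compressive front members."""
    front = pareto_front(results)
    fast = min(front, key=lambda n: (results[n][1], results[n][0]))
    best = min(front, key=lambda n: (results[n][0], results[n][1]))
    return fast, best
```

Any point on the front between the two ends is a valid compromise; the two ends simply mirror how --fast and --best sit at opposite extremes of the speed/ratio trade-off.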

Cheers