OndrejSladky / kmercamel

KmerCamel🐫 provides implementations of several algorithms for efficiently representing a set of k-mers as a masked superstring.
MIT License
11 stars 2 forks source link

Optimizing the number of runs approximately without running the ILP #66

Closed PavelVesely closed 7 months ago

PavelVesely commented 7 months ago

For very large datasets (e.g., the human genome), solving the ILP to minize the number of runs of 1s may be too time or memory consuming. I suggest to provide a heuristic version that only runs HeuristicPreSolve and then sets 1s in all other intervals, on which we now apply the ILP. We can call it runsapprox or something like that

OndrejSladky commented 7 months ago

Good idea. Depending on how large the ILP is we could also provide some standard approx. alg. for Set Cover like greedy/primal-dual alg. But if the ILP is small, this is probably not relevant.

OndrejSladky commented 7 months ago

If we find, we want the approx ilp solver, we can reopen this issue.

PavelVesely commented 7 months ago

I think it's not worth trying to implement a good approximation algorithm for set cover, at least not now. In the end, it may require more resources than a good ILP solver on the data that we have now. (Actually, we tried just glpk and not something better, right?)