dib-lab / 2020-paper-sourmash-gather

Here we describe an extension of MinHash that permits accurate compositional analysis of metagenomes with low memory and disk requirements.
https://dib-lab.github.io/2020-paper-sourmash-gather
Other
8 stars 1 forks source link

figure out the right reference for min-set-cov greedy algorithm #29

Open ctb opened 3 years ago

ctb commented 3 years ago

per https://en.wikipedia.org/wiki/Set_cover_problem#Greedy_algorithm

"Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms (see Inapproximability results below), under plausible complexity assumptions."

what should we cite here, and how should we explain it? The original reference is A Greedy Heuristic for the Set-Covering Problem , and then the inapproximability results extend that - https://en.wikipedia.org/wiki/Set_cover_problem#Inapproximability_results

My current hot take is that we can say, "We implement a greedy algorithm that gives us a good approximation", and cite the 1979 paper, because the algorithm we use is that one; the other references merely explore and/or tighten the bounds on the approximation.