dib-lab / 2020-paper-sourmash-gather

Here we describe an extension of MinHash that permits accurate compositional analysis of metagenomes with low memory and disk requirements.
https://dib-lab.github.io/2020-paper-sourmash-gather

min-set-cover references for genomics specifically #16

Open ctb opened 3 years ago

ctb commented 3 years ago

collecting references and annotations --

  1. VarCover: Allele Min-Set Cover Software

To facilitate reference-material selection for clinical genetic testing laboratories, we developed VarCover, open-source software hosted on GitHub, which accepts a file of variants and returns an approximately minimum set (min-set) of samples covering the targeted alleles. ... As a test case, we attempted to find a min-set of reference samples from the 1000 Genomes Project to cover 237 variants considered putatively pathogenic (of which 12 were classified as pathogenic or likely pathogenic) in the original 56 medically actionable genes recommended by the American College of Medical Genetics and Genomics (ACMG). ... VarCover provides a simple programmatic interface for identifying an approximate min-set of reference samples, thereby reducing clinical laboratory effort and molecular genetic test–validation costs.
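The abstract says VarCover returns an *approximately* minimum set of samples. A standard way to get such an approximation is the greedy set-cover heuristic: repeatedly take the sample that carries the most still-uncovered target alleles. The sketch below illustrates that heuristic under that assumption; it is not VarCover's code, and the function name, sample IDs and allele labels are hypothetical.

```python
def greedy_min_set(sample_alleles, targets):
    """Approximate min-set of samples covering `targets`, via greedy set cover.

    sample_alleles: dict mapping a sample ID to the set of alleles it carries.
    targets: set of alleles that must be covered.
    Returns (chosen sample IDs in pick order, target alleles no sample carries).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Take the sample covering the most still-uncovered alleles;
        # scanning IDs in sorted order makes ties resolve to the smallest ID.
        best = max(sorted(sample_alleles),
                   key=lambda s: len(sample_alleles[s] & uncovered),
                   default=None)
        if best is None or not sample_alleles[best] & uncovered:
            break  # the remaining alleles occur in no sample
        chosen.append(best)
        uncovered -= sample_alleles[best]
    return chosen, uncovered

# Hypothetical toy data: which target alleles each sample carries.
samples = {
    "S1": {"v1", "v2"},
    "S2": {"v2", "v3", "v4"},
    "S3": {"v4", "v5"},
    "S4": {"v1", "v5"},
}
targets = {"v1", "v2", "v3", "v4", "v5", "v6"}
chosen, missing = greedy_min_set(samples, targets)
print(chosen, sorted(missing))  # ['S2', 'S4'] ['v6']
```

Greedy set cover is guaranteed to stay within a logarithmic factor of the true minimum, which is why it is a common choice when an exact min-set is too expensive to compute.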

  1. Minimum Interval Cover and Its Application to Genome Sequencing

Pairwise end sequencing is a very useful method for whole genome sequencing which determines the complete DNA sequence of an organism’s genome with the help of laboratory processes. The paired-end interval cover problem is derived from pairwise end sequencing. A paired-end interval for a sequence S consists of at most two disjoint intervals, and the paired-end interval cover problem can be described as follows: given a family 𝔽 of paired-end intervals, find the smallest number of paired-end intervals of 𝔽 that cover S.
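To make the problem statement concrete, here is a brute-force solver that follows the definition directly. It assumes (for illustration only) that S is discretized into integer positions `0..n-1` and that each paired-end interval is a tuple of one or two half-open `(start, end)` ranges. It tries sub-families in order of increasing size, so it is exponential and only suitable for toy instances; it is not the algorithm from the cited paper, and all names here are hypothetical.

```python
from itertools import combinations

def positions(pe_interval):
    """Positions of S covered by one paired-end interval.

    A paired-end interval is modeled (an assumption) as a tuple of one or two
    half-open (start, end) ranges, which the definition requires to be disjoint.
    """
    covered = set()
    for start, end in pe_interval:
        covered.update(range(start, end))
    return covered

def min_paired_end_cover(family, n):
    """Smallest sub-family covering positions 0..n-1 of S, by exhaustive search.

    Returns a list of indices into `family`, or None if even the whole family
    leaves some position uncovered. Runtime is exponential in len(family).
    """
    target = set(range(n))
    covers = [positions(pe) for pe in family]
    if not target <= set().union(*covers):
        return None
    # Try every sub-family of size 0, 1, 2, ...; the first hit is a minimum cover.
    for k in range(len(family) + 1):
        for combo in combinations(range(len(family)), k):
            if target <= set().union(*(covers[i] for i in combo)):
                return list(combo)
    return None  # unreachable: the full family was checked above

# Hypothetical toy instance: S has 10 positions.
family = [((0, 3), (6, 8)), ((3, 6),), ((2, 5), (8, 10)), ((0, 2), (5, 7)), ((7, 10),)]
print(min_paired_end_cover(family, 10))  # [0, 1, 2]
```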

  1. Set cover-based methods for motif selection

The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions). ... In this study, the motif selection problem is mapped to variants of the set cover problem that are solved via tabu search and by relaxed integer linear programming (RILP).
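As a reminder of what "relaxed integer linear programming" means for set cover, here is the textbook unweighted set-cover ILP. The notation is introduced here and is not taken from the paper: U is the set of sequences of interest, candidate motifs are indexed by j = 1, …, n, M_j ⊆ U is the set of sequences in which motif j occurs, and x_j = 1 means motif j is selected. The paper's variants may add weights or further constraints.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{j=1}^{n} x_j \\
\text{s.t.}\quad & \sum_{j \,:\, i \in M_j} x_j \ge 1 \qquad \text{for all } i \in U, \\
& x_j \in \{0, 1\} \qquad j = 1, \dots, n.
\end{aligned}
```

Relaxing the last constraint to 0 ≤ x_j ≤ 1 gives a linear program whose optimum is a lower bound on the integer optimum; a fractional solution then has to be rounded or otherwise repaired into an actual set of motifs.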

ctb commented 3 years ago
  1. [An exact algorithm for finding cancer driver somatic genome alterations: the weighted mutually exclusive maximum set cover problem](https://almob.biomedcentral.com/articles/10.1186/s13015-016-0073-9)

The mutual exclusivity of somatic genome alterations (SGAs), such as somatic mutations and copy number alterations, is an important observation of tumors and is widely used to search for cancer signaling pathways or SGAs related to tumor development. ... In this study, we propose a novel signal-based method that utilizes the intrinsic relationship between SGAs on signaling pathways and expression changes of downstream genes regulated by pathways to identify cancer signaling pathways using the mutually exclusive property.
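To illustrate how mutual exclusivity can be folded into a set-cover objective, the sketch below scores a group of SGAs with the Dendrix weight W = 2·|covered tumors| − Σ|tumors altered by each SGA|, which equals coverage minus redundant hits, and searches all groups of size k exhaustively. This weight is borrowed from Dendrix as a stand-in: it is *not* the signal-based weighting or the exact algorithm of the cited paper, and the SGA and tumor labels are hypothetical.

```python
from itertools import combinations

def exclusivity_weight(alteration_sets):
    """Dendrix-style weight W = 2*|union| - sum of set sizes.

    Equivalently: number of covered tumors minus the number of redundant
    alterations (tumors hit by more than one SGA in the group).
    """
    covered = set().union(*alteration_sets)
    return 2 * len(covered) - sum(len(s) for s in alteration_sets)

def best_exclusive_set(altered_in, k):
    """Exhaustively find the k SGAs with the highest weight.

    altered_in: dict mapping an SGA name to the set of tumors it occurs in.
    Ties keep the first group in sorted-name order; returns (None, None) if k
    exceeds the number of SGAs.
    """
    best, best_w = None, None
    for combo in combinations(sorted(altered_in), k):
        w = exclusivity_weight([altered_in[g] for g in combo])
        if best_w is None or w > best_w:
            best, best_w = combo, w
    return best, best_w

# Hypothetical toy data: tumors in which each SGA is observed.
altered_in = {
    "A": {"t1", "t2", "t3"},
    "B": {"t4", "t5"},
    "C": {"t1", "t4"},
    "D": {"t6"},
}
print(best_exclusive_set(altered_in, 2))  # (('A', 'B'), 5)
```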

ctb commented 3 years ago
  1. Using set theory to reduce redundancy in pathway sets

The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, the issue of pathway redundancy impedes the use of these consolidated datasets. ... We propose a method that uses set cover to reduce pathway redundancy, without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, controlling pathway size and coverage of the gene set.
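A rough way to see how the three stated objectives fit into set cover: drop pathways above a size cap (controlling pathway size), pick whole pathways greedily so that redundant ones are never chosen and none are merged (removing redundancy), and stop once a target fraction of the gene set is covered (coverage). This is a hedged greedy sketch, not the paper's method; the parameters `max_size` and `min_coverage` and all pathway and gene names are hypothetical.

```python
def reduce_pathways(pathways, genes, max_size=None, min_coverage=1.0):
    """Greedy set-cover selection of whole pathways (hypothetical parameters).

    pathways: dict mapping a pathway name to its set of genes.
    genes: the gene set of interest.
    max_size: skip pathways with more genes than this (None = no cap).
    min_coverage: stop once this fraction of `genes` is covered.
    Returns (chosen pathway names in pick order, covered genes).
    """
    genes = set(genes)
    # Size filter, then restrict each pathway to the genes of interest.
    allowed = {name: members & genes for name, members in pathways.items()
               if max_size is None or len(members) <= max_size}
    covered, chosen = set(), []
    while genes and len(covered) < min_coverage * len(genes):
        # Pathway adding the most new genes; sorted scan breaks ties by name.
        best = max(sorted(allowed), key=lambda p: len(allowed[p] - covered),
                   default=None)
        if best is None or not allowed[best] - covered:
            break  # no remaining pathway adds anything new
        chosen.append(best)
        covered |= allowed[best]
    return chosen, covered

# Hypothetical toy data.
pathways = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b"},  # redundant with P1
    "P3": {"c", "d", "e", "f", "g", "h"},  # large pathway
    "P4": {"d", "e"},
    "P5": {"f", "g"},
}
genes = {"a", "b", "c", "d", "e", "f", "g"}
print(reduce_pathways(pathways, genes, max_size=4)[0])  # ['P1', 'P4', 'P5']
print(reduce_pathways(pathways, genes)[0])  # ['P3', 'P1']
```

Tightening `max_size` trades fewer, smaller pathways against needing more of them to reach full coverage, which mirrors the tension between the size and coverage objectives in the abstract.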