Open brankaj opened 8 years ago
Nice. Let's also include DOI links when available. For Dudoit et al., the link is https://doi.org/10.1198/016214502753479248. DOIs are persistent identifiers and make metadata lookup much easier.
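To make the DOI suggestion concrete, here is a minimal sketch of turning a DOI into a resolver link and preparing a metadata request. The helper names `doi_to_url` and `metadata_request` are hypothetical, made up for illustration. The request relies on doi.org content negotiation with the CSL JSON media type and is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"
# Common ways a DOI gets written; stripped so we always start from the bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a https://doi.org/ link for a bare DOI or an already-prefixed one."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


def metadata_request(doi: str) -> Request:
    """Build (but do not send) a request asking the resolver for CSL JSON metadata."""
    return Request(
        doi_to_url(doi),
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


print(doi_to_url("10.1198/016214502753479248"))
# prints https://doi.org/10.1198/016214502753479248
```

Passing the result of `metadata_request(...)` to `urllib.request.urlopen` should return the citation metadata as JSON when the registration agency supports it, though that part is untested here.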
Just came across DoCM: a database of curated mutations in cancer:
Large-scale cancer genomics discovery projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have systematically characterized the molecular lesions in human cancer genomes, thereby laying the foundation for precision cancer medicine. However, a curated set of somatic variants with established relevance to cancer biology is essential for clinical annotation and for use in computational data analysis. We have created a database of curated mutations in cancer (DoCM, http://docm.info), an open-source, openly licensed resource to enable the cancer research community to aggregate, store, and track biologically important cancer variants with provenance supported by the literature.
Paper on multi-label machine learning algorithms: http://cse.seu.edu.cn/people/zhangml/files/TKDE'13.pdf
Another paper, on multi-target classification: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.466.1304&rep=rep1&type=pdf
Some code for multi-label classification: https://github.com/davidwarshaw/hmc
There are several research papers and webpages that could be helpful for the Cognoma task. Maybe I missed it, but I am not aware of any other issue that addresses this. It would be good to have a place where members can post papers that could benefit the community. Since not everyone has academic access to databases, it is preferable that posted papers be open access.
I recently found this paper - Dudoit, Sandrine, Jane Fridlyand, and Terence P. Speed. "Comparison of discrimination methods for the classification of tumors using gene expression data." http://www.stat.cmu.edu/~jiashun/Research/software/GenomicsData/papers/dudoit.pdf
It was published in 2002, so their dataset is much smaller. However, it contains some useful information about data processing and gene expression datasets in general. It was a good read even though I am not in this field.