deepchem / deepbio

Deep Learning tools For Biology
MIT License

Datasets for deepbio #2

Open arunppsg opened 2 years ago

arunppsg commented 2 years ago

Hi folks, this issue is to brainstorm what kinds of datasets would be useful to deepbio projects. Good starting points are datasets on genomics and proteins, but these areas are broad. More specific datasets and their use cases would be helpful.

JoseAntonioSiguenza commented 2 years ago

Hi! I've been looking at some popular datasets at the intersection of biology and deep learning. The main topics are genomics, epigenomics, proteins, and clinical and healthcare data. Here is the reference repo of the datasets. Fascinating projects that gather such data include 1000 Genomes, the ENCODE Project, and NCBI Proteins. I'm excited to receive comments about these topics and datasets. Finally, which topics should we include in this brainstorming so we can start discussing and looking for data?

paupaiz commented 2 years ago

Data from Bone Marrow Mononuclear Cells (BMMCs) would be a great candidate, for the reasons given in this video (timestamp 14:20), which are listed below and shown in the attached picture:

  1. The system is well understood
  2. We know cell types present in the bone marrow and what surface markers are expressed
  3. Mixture/Variety of different cell types (both spectrum and clusters)
  4. Commercially available

[Attached image: Screen Shot 2022-06-10 at 3 54 26 PM]

The data is also openly available here.