UCLouvain-CBIO / depmap-workflow

F1000research workflow

Datasets overview #5

Closed by lgatto 4 years ago

lgatto commented 4 years ago

It would be useful to have a table (or list) with all the datasets, a short description and their dimensions.

tfkillian commented 4 years ago

A table of all current depmap package datasets has been added at the end of the Introduction section.

lgatto commented 4 years ago

I didn't mean a long and exhaustive list of all packages and all versions, but rather a table that informs readers of the (types of) data they can get in the latest release. Something like:

| Dataset | Description | Dimensions | Coverage | SourceVersion |
|---|---|---|---|---|
| crispr_20Q1 | (CERES) Batch and off-target corrected CRISPR-Cas9 gene knockout dependency data | 18333 genes, 739 cell lines | 29 primary diseases and 26 lineages | Feb 20 2019 |
| copyNumber_20Q1 | WES log copy number data | 27639 genes, 1713 cell lines | 35 primary diseases and 36 lineages | Feb 20 2019 |
| TPM_20Q1 | CCLE TPM RNAseq gene expression data for protein coding genes | 19144 genes, 1270 cancer cell lines | 32 primary diseases and 34 lineages | Feb 20 2019 |
| mutationCalls_20Q1 | Merged mutation calls (for coding region, germline filtered) and includes data | 18802 genes, 1697 cell lines | 35 primary diseases and 36 lineages | Feb 20 2019 |
| metadata_20Q1 | Metadata for cell lines in the 20Q1 DepMap release | 1775 cell lines | 35 primary diseases and 37 lineages | Feb 20 2019 |

which I can generate from the csv file. But

tfkillian commented 4 years ago

I have pushed an update to the paper so that the table is condensed to show only the most recently available dataset of each type (similar to the table shown above, but also including RPPA_19Q3, rnai_19Q3 and drug_sensitivity_19Q3).

I did not create the tables from a script; I created them in Excel by manually scraping the content from the web and collating our existing metadata files for each release. I have also corrected the erroneous Feb 2019 release date.
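For reference, the "most recent dataset of each type" condensation could also be scripted rather than done by hand. The following is only a minimal sketch, assuming dataset titles follow the `<type>_<YY>Q<quarter>` pattern seen in this thread (e.g. `crispr_20Q1`). The `TITLES` list is illustrative and was not read from the package metadata, and `latest_per_type` is a hypothetical helper, not part of depmap.

```python
import re

# Illustrative titles only (assumption): names follow "<type>_<YY>Q<quarter>".
TITLES = [
    "crispr_19Q3", "crispr_20Q1",
    "TPM_19Q4", "TPM_20Q1",
    "rnai_19Q3",
    "RPPA_19Q3",
    "drug_sensitivity_19Q3",
]

# The greedy "kind" group backtracks so the final "_YYQn" suffix is the release.
_RELEASE = re.compile(r"^(?P<kind>.+)_(?P<year>\d{2})Q(?P<quarter>[1-4])$")


def latest_per_type(titles):
    """Return {dataset type: most recent title}, ordered by (year, quarter)."""
    latest = {}
    for title in titles:
        match = _RELEASE.match(title)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        release = (int(match["year"]), int(match["quarter"]))
        kind = match["kind"]
        if kind not in latest or release > latest[kind][0]:
            latest[kind] = (release, title)
    return {kind: title for kind, (_, title) in latest.items()}


if __name__ == "__main__":
    for kind, title in sorted(latest_per_type(TITLES).items()):
        print(f"{kind}\t{title}")
```

Comparing `(year, quarter)` tuples avoids relying on string ordering of the release tags; the selected titles could then be used to filter the rows of the collated metadata file.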

lgatto commented 4 years ago

Thanks.