Closed annapamma closed 3 years ago
Assigned to myself. Some of these cancer datasets require a password to download, which I do not have, but I think some should be public. I will fix this.
At the time of this comment, the CPTAC dataset availability is as follows:
Dataset name | Description | Data reuse status | Publication link |
---|---|---|---|
Brca | breast cancer | no restrictions | https://pubmed.ncbi.nlm.nih.gov/33212010/ |
Ccrcc | clear cell renal cell carcinoma (kidney) | no restrictions | https://pubmed.ncbi.nlm.nih.gov/31675502/ |
Colon | colorectal cancer | no restrictions | https://pubmed.ncbi.nlm.nih.gov/31031003/ |
Endometrial | endometrial carcinoma (uterine) | no restrictions | https://pubmed.ncbi.nlm.nih.gov/32059776/ |
**Gbm** | **glioblastoma** | **password access only** | **unpublished** |
Hnscc | head and neck squamous cell carcinoma | no restrictions | https://pubmed.ncbi.nlm.nih.gov/33417831/ |
**Lscc** | **lung squamous cell carcinoma** | **password access only** | **unpublished** |
Luad | lung adenocarcinoma | no restrictions | https://pubmed.ncbi.nlm.nih.gov/32649874/ |
Ovarian | high grade serous ovarian cancer | no restrictions | https://pubmed.ncbi.nlm.nih.gov/27372738/ |
**Pdac** | **pancreatic ductal adenocarcinoma** | **password access only** | **unpublished** |
As such, the dataset list has been updated to the following (hnscc added): ['brca', 'ccrcc', 'endometrial', 'colon', 'ovarian', 'hnscc', 'luad']
Awesome, I copied this and pasted it into the READMEs (main, mRNAdata, protData). Still testing all of the datasets with all of the algorithms.
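For reference, here is a minimal sketch of how that list follows from the table: keep only the rows whose data reuse status is "no restrictions". The names `AVAILABILITY` and `public_datasets` are made up for this illustration and are not part of the repository.

```python
# Rows transcribed from the availability table above: (dataset name, data reuse status).
# AVAILABILITY and public_datasets are hypothetical names used only in this sketch.
AVAILABILITY = [
    ("Brca", "no restrictions"),
    ("Ccrcc", "no restrictions"),
    ("Colon", "no restrictions"),
    ("Endometrial", "no restrictions"),
    ("Gbm", "password access only"),
    ("Hnscc", "no restrictions"),
    ("Lscc", "password access only"),
    ("Luad", "no restrictions"),
    ("Ovarian", "no restrictions"),
    ("Pdac", "password access only"),
]


def public_datasets(rows):
    """Return the lowercase names of datasets that carry no reuse restrictions."""
    return [name.lower() for name, status in rows if status == "no restrictions"]


print(public_datasets(AVAILABILITY))
```

This yields the same seven datasets as the list above (in table order rather than the order used in the code), and leaves out gbm, lscc, and pdac.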
Reopening because HNSCC is failing on Circle (although it works locally with CWL). I think this is an issue with permissions in the virtual environment.
Going to troubleshoot.
Interesting, I think this works for me as well. I wonder if it's a Docker image/build issue?
I think I found it: hnscc was added to getAllDatasets.py in the protein folder but not in the mrna folder. Testing now in PR #105. If that wasn't the cause, it was an indexing issue (the metadata selected patients that had no transcriptomics data). That has also been fixed.
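To illustrate the indexing issue, here is a rough sketch of restricting the patients selected from the metadata to those that actually have transcriptomics. This is not the code from the PR; the function name and the patient IDs are invented for the example.

```python
def patients_with_transcriptomics(metadata_ids, transcriptomics_ids):
    """Keep only the metadata patient IDs that also appear in the transcriptomics data.

    Hypothetical helper: the names and inputs here are assumptions, not repository code.
    """
    available = set(transcriptomics_ids)
    # Preserve the metadata order while dropping patients with no transcriptomics.
    return [pid for pid in metadata_ids if pid in available]


# Made-up patient IDs, purely for illustration.
metadata_ids = ["P01", "P02", "P03", "P04"]
transcriptomics_ids = ["P01", "P03"]

selected = patients_with_transcriptomics(metadata_ids, transcriptomics_ids)
print(selected)
```

Without a filter like this, selecting by the metadata alone can pick patients (here P02 and P04) that have no matching rows in the transcriptomics table.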
It appears there's a discrepancy between the datasets that have been loaded onto the Docker image and what can be saved to file.
I noticed that protDataSetsCLI.py can accept the following values for cancerType:
brca ccrcc colon ovarian endometrial gbm hnscc lscc luad
However, getAllDatasets.py only installs the data for ['brca', 'ccrcc', 'endometrial', 'colon', 'ovarian', 'luad'].
It looks like this list should be updated to: ['brca', 'ccrcc', 'endometrial', 'colon', 'ovarian', 'luad', 'gbm', 'hnscc', 'lscc']
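A quick way to see the gap is a set difference between the two lists quoted above. This is only a sketch: the variable and function names are my own, and the values are copied from this comment rather than read from the scripts.

```python
# Values that protDataSetsCLI.py accepts for cancerType, as listed above.
CLI_CANCER_TYPES = ["brca", "ccrcc", "colon", "ovarian", "endometrial",
                    "gbm", "hnscc", "lscc", "luad"]

# Datasets that getAllDatasets.py installs, as listed above.
INSTALLED_DATASETS = ["brca", "ccrcc", "endometrial", "colon", "ovarian", "luad"]


def missing_datasets(accepted, installed):
    """Return the accepted cancer types that have no installed data, sorted by name."""
    return sorted(set(accepted) - set(installed))


print(missing_datasets(CLI_CANCER_TYPES, INSTALLED_DATASETS))
# → ['gbm', 'hnscc', 'lscc']
```

A check like this could also run in CI so the two lists cannot drift apart unnoticed.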