I was trying to obtain STAR gene expression data from multiple datasets, including TCGA, TARGET, and CPTAC. Preparing some of the projects failed with errors, mainly caused by duplicated records, especially in CPTAC-3. I tested the code and added a workaround. This PR also amends some data checks so that the data preparation process runs smoothly.
o Preparing output
-------------------
Downloading data for project CPTAC-3
Of the 2340 files for download 2340 already exist.
All samples have been already downloaded
Removing duplicated cases (with older updated time)
=> 41 records removed
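For readers unfamiliar with the step reported in the log above, here is a minimal sketch of removing duplicated cases by keeping only the most recently updated record. It is an illustration only, not the code changed in this PR: the data frame, its values, and the column names `cases`, `file_id`, and `updated_datetime` are assumptions made for the example.

```r
# Hypothetical query results: one row per file, with a repeated case.
# All values and column names here are made up for illustration.
results <- data.frame(
  cases = c("C3L-00001", "C3L-00001", "C3L-00002"),
  file_id = c("f1", "f2", "f3"),
  updated_datetime = c(
    "2021-01-10T09:00:00",
    "2022-03-05T12:30:00",
    "2021-06-01T08:15:00"
  ),
  stringsAsFactors = FALSE
)

# Parse the timestamps so that ordering is chronological, not alphabetical.
updated <- as.POSIXct(
  results$updated_datetime,
  format = "%Y-%m-%dT%H:%M:%S",
  tz = "UTC"
)

# Sort newest first, then keep the first (newest) row seen for each case.
ordered <- results[order(updated, decreasing = TRUE), ]
deduped <- ordered[!duplicated(ordered$cases), ]

message("=> ", nrow(results) - nrow(deduped), " records removed")
```

Sorting before calling `duplicated()` matters here: `duplicated()` flags every occurrence after the first, so the sort order decides which record survives.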
Code to query the data.