fedorov closed this issue 2 years ago.
I've already been talking to ISB-CGC folks about this. I have the source code
ISB-CGC pulled clinical data for CPTAC-3 from both the PDC and the GDC, and put the data from these pulls in the BigQuery tables isb-cgc-bq.CPTAC.clinical_CPTAC3_discovery_pdc_current and [isb-cgc-bq.CPTAC.clinical_gdc_current](https://console.cloud.google.com/bigquery?p=isb-cgc-bq&d=CPTAC&t=clinical_gdc_current). The two tables differ in format and content; I have not checked whether they disagree on specific values. One issue is that the case_id column in these tables actually holds the GDC UUID, not the CPTAC case ID.
Bill L. recommends using the GDC-sourced table. Also, the CPTAC case ID does appear in that table, as the submitter_id. However, TCIA and IDC currently include CPTAC-3 collections that are in neither the GDC nor the ISB-CGC tables. Also, CPTAC-3 now has a very simple API for pulling clinical data: https://clinicalapi-cptac.esacinc.com/api/tcia/.
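A minimal sketch of pulling the identifier columns from the GDC-sourced table, assuming (per the comments above) that case_id holds the GDC UUID and submitter_id holds the CPTAC case ID / DICOM PatientID. The helper names are hypothetical; the BigQuery client is imported only when the query is actually run, so that step needs `google-cloud-bigquery` installed and Google Cloud credentials configured.

```python
from typing import Dict

# Table named in this thread; the column semantics are assumptions from the discussion.
GDC_CLINICAL_TABLE = "isb-cgc-bq.CPTAC.clinical_gdc_current"


def build_id_query(table: str = GDC_CLINICAL_TABLE) -> str:
    """Return SQL listing the GDC UUID (case_id) and submitter_id of each case."""
    return (
        "SELECT DISTINCT\n"
        "  case_id AS gdc_case_uuid,\n"
        "  submitter_id AS dicom_patient_id\n"
        f"FROM `{table}`\n"
        "ORDER BY dicom_patient_id"
    )


def fetch_id_map(table: str = GDC_CLINICAL_TABLE) -> Dict[str, str]:
    """Run the query and map DICOM PatientID -> GDC case UUID (hypothetical helper)."""
    # Imported lazily so build_id_query() works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(build_id_query(table)).result()
    # If a submitter_id appeared with several UUIDs, the last one seen would win here.
    return {row["dicom_patient_id"]: row["gdc_case_uuid"] for row in rows}
```

Keeping the SQL in its own function makes it easy to inspect or paste into the BigQuery console before running anything.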
> CGC pulled clinical data for CPTAC-3 from both the PDC and GDC, and put data from these pulls in the Big Query tables [...] CPTAC-3 collections that are not in GDC and not in the ISB-CGC tables
I do see CPTAC3 in the ISB-CGC portal. I need help reconciling the two statements above.
ISB-CGC has a lot of data in BigQuery, including CPTAC-3 data, that is not in their data explorer app.
I am still confused. If CPTAC3 is in ISB-CGC, why can't we use those CPTAC3 tables in IDC?
We can, but "TCIA and IDC currently include CPTAC-3 collections that are not ...in the ISB-CGC tables". Also, for an external table we'd need to map the table columns to the correct DICOM PatientID and provide this mapping to users. In the ISB-CGC BigQuery tables, the submitter_id column contains the DICOM PatientID. I need another column in the meta tables to explain this.
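A minimal sketch of the mapping described above, assuming each clinical row is a dict whose submitter_id value equals the DICOM PatientID of the matching images. The function name and the patient IDs in the demo are hypothetical. It also returns the PatientIDs that have no clinical row, which is one way to see which imaging collections are missing from the ISB-CGC tables.

```python
from typing import Dict, Iterable, List, Tuple


def link_clinical_to_images(
    clinical_rows: Iterable[Dict[str, str]],
    dicom_patient_ids: Iterable[str],
    key: str = "submitter_id",
) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Join clinical rows to DICOM PatientIDs through the given key column.

    Returns (matched, missing): matched maps each PatientID to its clinical row;
    missing is the sorted list of PatientIDs that have no clinical row.
    """
    # Index clinical rows by the column assumed to hold the DICOM PatientID.
    by_key = {row[key]: row for row in clinical_rows}
    matched: Dict[str, Dict[str, str]] = {}
    missing: List[str] = []
    # De-duplicate PatientIDs, since many image series share one patient.
    for pid in sorted(set(dicom_patient_ids)):
        if pid in by_key:
            matched[pid] = by_key[pid]
        else:
            missing.append(pid)
    return matched, missing


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    clinical = [{"submitter_id": "PATIENT-A", "case_id": "uuid-a"}]
    matched, missing = link_clinical_to_images(clinical, ["PATIENT-A", "PATIENT-B"])
    print(sorted(matched), missing)
```

The key column is a parameter so the same join could be pointed at a different identifier column if another source (for example the PDC-sourced table) names it differently.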
I see, I missed that: some of the CPTAC-3 collections are not in the ISB-CGC tables. Why would that be the case? @wlongabaugh, do you have any idea?
CPTAC-3 clinical tables at ISB-CGC are pulled from the GDC clinical data API (see the "CPTAC" program) and the PDC clinical data API (see the "CPTAC-3" program). Note that the GDC groups CPTAC-2 and CPTAC-3 as separate projects under the CPTAC program. If a case does not show up there, then their API does not provide it.
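To check directly whether a given case is exposed by the GDC, one option is to query the public GDC `/cases` endpoint filtered by project. This is a minimal sketch, not the ISB-CGC pipeline; it assumes the project IDs are "CPTAC-3" and "CPTAC-2" as described above, and the helper names are hypothetical. Only `fetch_cases` touches the network, and it returns just the first page of results (controlled by `size`).

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Iterable, List

# Public GDC API endpoint for case records.
GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_gdc_cases_url(project_ids: Iterable[str] = ("CPTAC-3",), size: int = 100) -> str:
    """Build a /cases URL returning case_id and submitter_id for the given projects."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": list(project_ids)},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "case_id,submitter_id,project.project_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_CASES_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_cases(url: str) -> List[Dict[str, Any]]:
    """Fetch one page of case records (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # GDC wraps results as {"data": {"hits": [...], "pagination": {...}}, ...}.
    return payload["data"]["hits"]
```

Comparing the returned submitter_id values against the PatientIDs of a TCIA/IDC collection would show whether the missing cases are absent from the GDC itself or only from the ISB-CGC tables.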
OK, we should check whether it exists anywhere else. Another possibility is that, if those tables lack clinical data for a given collection, that clinical data may simply not exist.
I'm meeting tomorrow with Fabian Seidl, who is gathering CPTAC-3 data for ISB-CGC. I'll see what he knows.
Per discussion today, I believe we can close this issue.
I understand there is CPTAC clinical metadata in ISB-CGC that matches our images. @G-White-ISB, can you please investigate how it is organized, how it is versioned, and how it can be linked with images, so we can discuss how to make it available to users?