ImagingDataCommons / ETL

(CORE REPO)
Apache License 2.0

[clinical] CPTAC clinical metadata #21

Closed fedorov closed 2 years ago

fedorov commented 2 years ago

I understand there is CPTAC clinical metadata in ISB-CGC that matches our images. @G-White-ISB can you please investigate how it is organized, how it is versioned, how it can be linked with images, so we can discuss how to make it available to the users?

G-White-ISB commented 2 years ago

I've already been talking to ISB-CGC folks about this. I have the source code

G-White-ISB commented 2 years ago

CGC pulled clinical data for CPTAC-3 from both the PDC and GDC, and put the data from these pulls in the BigQuery tables isb-cgc-bq.CPTAC.clinical_CPTAC3_discovery_pdc_current and isb-cgc-bq.CPTAC.clinical_gdc_current (https://console.cloud.google.com/bigquery?p=isb-cgc-bq&d=CPTAC&t=clinical_gdc_current). The two tables differ in format and content. I have not looked to see if there are discrepancies. One issue is that the case_id column in these tables actually holds the GDC UUID, not the case_id.

G-White-ISB commented 2 years ago

Bill L. recommends using the GDC-sourced table. Also, the case_id does appear in the table, as the submitter_id column. However, TCIA and IDC currently include CPTAC-3 collections that are not in GDC and not in the ISB-CGC tables. Also, CPTAC-3 now has a very simple API for pulling clinical data: https://clinicalapi-cptac.esacinc.com/api/tcia/

fedorov commented 2 years ago

CGC pulled clinical data for CPTAC-3 from both the PDC and GDC, and put data from these pulls in the Big Query tables [...] CPTAC-3 collections that are not in GDC and not in the ISB-CGC tables

I do see CPTAC3 in the ISB-CGC portal. I need help reconciling the two statements above.

G-White-ISB commented 2 years ago

ISB-CGC has lots of data in BigQuery, including CPTAC3 data, that is not in their data explorer app:

https://isb-cgc.appspot.com/bq_meta_search/

fedorov commented 2 years ago

I am still confused. If CPTAC3 is in ISB-CGC, why can't we use those CPTAC3 tables in IDC?

G-White-ISB commented 2 years ago

We can, but 'TCIA and IDC currently include CPTAC-3 collections that are not ...in the ISB-CGC tables'. Also, for an external table we'd need to map the table columns to the correct DICOM PatientID and provide this mapping to the users. For the ISB-CGC BigQuery tables, the column submitter_id contains the DICOM PatientID. I need another column in the meta tables to explain this.
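To make the mapping concrete, here is a minimal sketch of how the link could be expressed as a query, assuming submitter_id in the GDC-sourced table equals the DICOM PatientID as described above. The image-side table `bigquery-public-data.idc_current.dicom_all` and its `PatientID` and `collection_id` columns are assumptions that are not stated in this thread, and `build_link_query` is a hypothetical helper name. The function only builds the SQL text; it does not run anything against BigQuery.

```python
def build_link_query(
    clinical_table: str = "isb-cgc-bq.CPTAC.clinical_gdc_current",
    # Assumption: image-side table name and columns are not given in this thread.
    dicom_table: str = "bigquery-public-data.idc_current.dicom_all",
) -> str:
    """Return SQL text joining clinical rows to imaged patients.

    The join key follows the thread: clinical submitter_id == DICOM PatientID.
    """
    return f"""
WITH patients AS (
  -- One row per imaged patient, instead of one row per DICOM instance.
  SELECT DISTINCT PatientID, collection_id
  FROM `{dicom_table}`
)
SELECT
  patients.collection_id,
  clinical.*
FROM patients
JOIN `{clinical_table}` AS clinical
  ON clinical.submitter_id = patients.PatientID
""".strip()


if __name__ == "__main__":
    print(build_link_query())
```

An inner join is used here, so imaged patients without a clinical row simply drop out of the result; a LEFT JOIN from `patients` would keep them if that is preferred.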

fedorov commented 2 years ago

I see, I missed that - some of the CPTAC-3 collections are not in the ISB-CGC tables. Why would that be the case - does @wlongabaugh have any idea?

wlongabaugh commented 2 years ago

CPTAC-3 clinical tables at ISB-CGC are pulled from the GDC clinical data API (see the "CPTAC" program) and the PDC clinical data API (see the "CPTAC-3" program). Note that the GDC groups CPTAC-2 and CPTAC-3 as separate projects under the CPTAC program. If a case does not show up there, then their API does not provide it.

fedorov commented 2 years ago

Ok, we should check if it exists anywhere else. Another possibility is that if those tables do not have clinical data for a certain collection, that clinical data might not exist.
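One way to run that check, as a rough sketch: given the (collection, patient) pairs that have images and the submitter_id values present in a clinical table, count per collection how many patients have a clinical row and how many do not. The function name `coverage_by_collection` and the IDs in the example are made up for illustration; how the two inputs are exported from the actual tables is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def coverage_by_collection(
    image_patients: Iterable[Tuple[str, str]],
    clinical_ids: Iterable[str],
) -> Dict[str, Dict[str, int]]:
    """Count imaged patients with and without a clinical row, per collection.

    image_patients: (collection_id, patient_id) pairs from the image metadata.
    clinical_ids:   submitter_id values from the clinical table.
    """
    known = set(clinical_ids)
    report = defaultdict(lambda: {"matched": 0, "missing": 0})
    # set() removes duplicate pairs so each patient is counted once per collection.
    for collection_id, patient_id in set(image_patients):
        key = "matched" if patient_id in known else "missing"
        report[collection_id][key] += 1
    return dict(report)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    images = [
        ("collection_a", "P-001"),
        ("collection_a", "P-002"),
        ("collection_b", "P-100"),
        ("collection_b", "P-101"),
    ]
    clinical = ["P-001"]
    for name, counts in sorted(coverage_by_collection(images, clinical).items()):
        print(name, counts)
```

A collection with zero matched patients would be a candidate for "clinical data not in these tables", which could then be checked against other sources.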

G-White-ISB commented 2 years ago

Tomorrow I'm meeting Fabian Seidl, who is gathering CPTAC-3 data for ISB-CGC. I'll see what he knows.

fedorov commented 2 years ago

Per discussion today

G-White-ISB commented 2 years ago

I believe we can close this issue.