ImagingDataCommons / IDC-WebApp

Web Application front end for IDC (CORE REPO)
Apache License 2.0

Investigate limited access items that appear to have CC-BY license #933

Open fedorov opened 2 years ago

fedorov commented 2 years ago

I would expect no Limited access items to be distributed under a CC-BY license. However, the result of the query below is not empty, which seems wrong.

s-paquette commented 2 years ago
SELECT collection_id, license_short_name
FROM `idc-dev-etl.idc_v8_pub.dicom_all`
WHERE access='Limited'
GROUP BY collection_id, license_short_name
bcli4d commented 2 years ago

These instances are from the GBM-MR-NER-Outcomes analysis results collection, which has a CC BY 3.0 license. However, the instance data is mixed in with TCGA-GBM data and is therefore not publicly available.

bcli4d commented 2 years ago

There are 147 series in TCGA-GBM with GBM-MR-NER-Outcomes' source_doi: SELECT DISTINCT tcia_api_collection_id, SeriesInstanceUID FROM `idc-dev-etl.idc_v8_pub.auxiliary_metadata` WHERE source_doi = '10.7937/K9/TCIA.2014.FAB7YRPZ'
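
For illustration only, here is a minimal sketch of the consistency check discussed in this thread: list every series that is marked Limited yet carries a CC BY license. It runs against an in-memory SQLite table rather than BigQuery. The rows, the UIDs, the "TCIA Limited" license name, and the assumption that `SeriesInstanceUID` sits in the same table as `access` and `license_short_name` are all hypothetical, not taken from the IDC tables.

```python
import sqlite3

# Made-up rows standing in for a dicom_all-like table (illustration only).
# Columns: collection_id, SeriesInstanceUID, access, license_short_name
ROWS = [
    ("tcga_gbm", "1.2.3.1", "Limited", "TCIA Limited"),
    ("tcga_gbm", "1.2.3.2", "Limited", "CC BY 3.0"),  # the inconsistent case
    ("other_collection", "1.2.3.3", "Public", "CC BY 3.0"),
]


def find_inconsistent_series(rows):
    """Return series whose access is Limited but whose license is CC BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dicom_all ("
        "collection_id TEXT, SeriesInstanceUID TEXT, "
        "access TEXT, license_short_name TEXT)"
    )
    conn.executemany("INSERT INTO dicom_all VALUES (?, ?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT DISTINCT collection_id, SeriesInstanceUID, license_short_name "
        "FROM dicom_all "
        "WHERE access = 'Limited' AND license_short_name LIKE 'CC BY%' "
        "ORDER BY SeriesInstanceUID"
    )
    result = cur.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_inconsistent_series(ROWS):
        print(row)
```

An empty result would match the expectation stated at the top of this issue; any returned row is a series to investigate.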

bcli4d commented 2 years ago

@fedorov As requested, ETL Workflow, v9 describes how DOIs are assigned to series, specifically, in sections 3.2.3.2 and 3.2.3.5. The corresponding code is here and here respectively.

fedorov commented 2 years ago

Ticket submitted to TCIA: https://help.cancerimagingarchive.net/servicedesk/customer/portal/1/TH-49543

bcli4d commented 2 years ago

@andrey, I seem to recall that you wanted to investigate this issue further before we proceed to restore the DOI of these 147 series to that of the analysis result? (I previously changed the DOI to that of TCGA-GBM)