ImagingDataCommons / TCIA-IDC-Coordination


Scraping analysis results DICOM manifests #10

Closed bcli4d closed 3 years ago

bcli4d commented 4 years ago

@bcli4d In the absence of some direct method for identifying derived images, I am working on the following workaround to tabulate 3rd party data in each data collection:

  1. Create a (BQ) table, 'thirdparty', of all 3rd party instances from a manifest of such 3rd party series downloaded from the TCIA search portal
  2. For each analysis results collection, create a table 'analysis_dicoms' of all DICOM series in all manifests for that collection. I believe that I need to create a table that comprises data from all manifests in an analysis results collection because it is not clear whether a manifest contains original or derived data or both.
  3. Given the above, construct a third table for each analysis results collection of all series that are in both 'thirdparty' and the 'analysis_dicoms' table of that collection.
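Step 3 amounts to a set intersection on SeriesInstanceUID. A minimal sketch, assuming each table has been exported as a plain list of series UID strings; the function name and the sample UIDs are hypothetical, and the real join would presumably run in BigQuery:

```python
def derived_series(thirdparty, analysis_dicoms):
    """Return the series UIDs present in both tables, sorted for stable output."""
    return sorted(set(thirdparty) & set(analysis_dicoms))


# Hypothetical UIDs, for illustration only.
thirdparty = ["1.2.3.1", "1.2.3.2", "1.2.3.3"]
analysis_dicoms = ["1.2.3.2", "1.2.3.3", "1.2.3.4"]
print(derived_series(thirdparty, analysis_dicoms))
```

Series that appear only in 'analysis_dicoms' would then be the original (non-derived) data that a manifest happens to include.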

My actual question :-) : To build the 'analysis_dicoms' table, I am scraping the table at the bottom of each of the analysis results pages. In general, a DICOM manifest has a data-linked-resource-content-type attribute of application/x-nbia-manifest-file. However, some manifests have a data-linked-resource-content-type attribute of application/octet-stream. These are the four SR manifests of "DICOM SR of clinical data and measurement for breast cancer collections to TCIA", and the Images manifest of "Integration of CT-based Qualitative and Radiomic Features with Proteomic Variables in Patients with High-Grade Serous Ovarian Cancer: An Exploratory Analysis". Is there any reason why these are not also application/x-nbia-manifest-file? Are these a different kind of manifest?
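The scraping step described above could be sketched as follows, using only the Python standard library. This assumes the manifest links are `<a>` elements carrying the data-linked-resource-content-type attribute; the class name, function name, and sample HTML are hypothetical, and the real page markup may differ:

```python
from html.parser import HTMLParser

# Both content types observed on the analysis results pages.
MANIFEST_TYPES = {"application/x-nbia-manifest-file", "application/octet-stream"}


class ManifestLinkParser(HTMLParser):
    """Collect (href, content type) for links that look like manifests."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        ctype = attr.get("data-linked-resource-content-type")
        if ctype in MANIFEST_TYPES and attr.get("href"):
            self.links.append((attr["href"], ctype))


def find_manifest_links(html):
    parser = ManifestLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical page fragment, for illustration only.
html = (
    '<a href="/m1.tcia" data-linked-resource-content-type='
    '"application/x-nbia-manifest-file">Images</a>'
    '<a href="/m2.tcia" data-linked-resource-content-type='
    '"application/octet-stream">SR</a>'
    '<a href="/doc.pdf" data-linked-resource-content-type='
    '"application/pdf">Notes</a>'
)
print(find_manifest_links(html))
```

Because application/octet-stream is a generic type, links matched that way may include non-manifest attachments; checking that the downloaded file starts with a downloadServerUrl= line would be a reasonable extra guard.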

kirbyju commented 4 years ago

Bill,

Did you make any sort of formal request to the UAMS PRISM team to add APIs to let you query this in Bindaas? If so, how long ago?

Regardless of that, there has to be some way to use the NBIA database to discern which DICOM series are analysis results and which are not. Martin's NBIA Client code uses internal NBIA APIs to do this in order to power the Simple Search filter for 3rd Party Analyses in the user interface (look at the radio button for this in the left side menu at https://nbia.cancerimagingarchive.net/nbia-search/). Note that in addition to tracking whether a series is an analysis result or not, we also track the DOI for the dataset in the NBIA database. This is useful for letting people click the DOI link and read a summary of how the analysis data were generated, who created it, etc.

Ulli, Scott, is there some way the IDC folks can make the relevant API calls like Martin is doing?

Thanks, Justin

Justin Kirby (contractor)
Technical Project Manager, Frederick National Laboratory for Cancer Research
Technical Director, Cancer Imaging Informatics Lab
ORCiD: https://orcid.org/0000-0003-3487-8922
240-276-6016
justin.kirby@nih.gov



kirbyju commented 4 years ago

Hi Justin,

I am attaching a Word document because the APIs that are current for TCIA are kept on a private wiki page: there are more features available at TCIA than in the community edition, and the NBIA APIs are not advertised for TCIA. The document also includes an example of how to use the API to get this type of information. Note that the APIs are focused on efficiency to reduce response times for the client. The basic flow is:

  1. Request an OAuth token.

  2. Perform a search for third party analysis.

  3. Drill down to the series details.

As far as manifests go, you can create a Data Retriever manifest as a text file that lists the seriesUIDs in the format below. You can ignore the databasketId, as it is legacy.

downloadServerUrl=https://public.cancerimagingarchive.net/nbia-download/servlet/DownloadServlet
includeAnnotation=true
noOfrRetry=4
databasketId=manifest-1595503879119.tcia
manifestVersion=3.0
ListOfSeriesToDownload=
1.3.6.1.4.1.14519.5.2.1.6919.4624.319956314021047038498210610134
1.2.276.0.7230010.3.1.3.0.74366.1588583084.764537
1.3.6.1.4.1.14519.5.2.1.6919.4624.313514201353787659031503464798
1.3.6.1.4.1.14519.5.2.1.6919.4624.113493579075669637574394466994
1.3.6.1.4.1.14519.5.2.1.6919.4624.241474384128770482476403302453
1.2.276.0.7230010.3.1.3.0.74416.1588583149.544409
1.3.6.1.4.1.14519.5.2.1.6919.4624.190836162611840065031514192922
1.3.6.1.4.1.14519.5.2.1.6919.4624.277870064395307630151805987637
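A minimal sketch of writing and reading such a manifest, assuming the file holds one key=value header per line followed by one series UID per line after ListOfSeriesToDownload=. The function names are hypothetical; the header values are copied from the example above, and the databasketId is kept only as a legacy placeholder:

```python
DEFAULT_SERVER = (
    "https://public.cancerimagingarchive.net/nbia-download/servlet/DownloadServlet"
)


def build_manifest(series_uids, databasket_id="manifest-1595503879119.tcia"):
    """Return manifest text listing the given series UIDs, one per line."""
    header = [
        f"downloadServerUrl={DEFAULT_SERVER}",
        "includeAnnotation=true",
        "noOfrRetry=4",
        f"databasketId={databasket_id}",  # legacy field, value is not significant
        "manifestVersion=3.0",
        "ListOfSeriesToDownload=",
    ]
    return "\n".join(header + list(series_uids)) + "\n"


def parse_manifest(text):
    """Split manifest text into (header dict, list of series UIDs)."""
    header, series, in_list = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_list:
            series.append(line)
        elif line.startswith("ListOfSeriesToDownload="):
            in_list = True
        else:
            key, _, value = line.partition("=")
            header[key] = value
    return header, series


uids = [
    "1.3.6.1.4.1.14519.5.2.1.6919.4624.319956314021047038498210610134",
    "1.2.276.0.7230010.3.1.3.0.74366.1588583084.764537",
]
header, series = parse_manifest(build_manifest(uids))
print(header["manifestVersion"], len(series))
```

The parser is the piece relevant to the 'analysis_dicoms' table: it yields the series UIDs contained in each scraped manifest.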

Thanks, Scott

bcli4d commented 4 years ago

Justin, thanks for your quick response.

Regarding formal requests to UAMS PRISM, Andrey created the related issue #9, but I'm not aware of a formal request mechanism.

> look at the radio button for this in left side menu at https://nbia.cancerimagingarchive.net/nbia-search/

That's the mechanism that I use to create the 'thirdparty' table described above.

> we also track the DOI for the dataset in the NBIA database.

Yes, I start by scraping the TCIA Analysis Results page for those DOIs, and then scrape each of the linked summary pages for manifest URLs.

Scott, I did not find a word document attached. What you describe sounds promising.

Regards, Bill

bcli4d commented 4 years ago

Justin sent me Scott's word document. It looks like that API will work for me. Thanks!

bcli4d commented 4 years ago

Justin, Scott, I have not been able to get 3rd party results for Breast Diagnosis using the /getSimpleSearchWithModalityAndBodyPartPaged API. I've found that I have to tinker with the exact spelling of some collections, but haven't found a variation that works for this one. Can you let me know what to use? Thanks, Bill

fedorov commented 4 years ago

Document from Justin mentioned earlier is here: https://docs.google.com/document/d/1kJp3fBInwgUqQUHBQW4MEa0ykXaCcgDYV7IYRsHhyGo/edit

fedorov commented 4 years ago

@kirbyju is there an endpoint that would allow mapping from SeriesInstanceUID to your internal "seriesIdentifier"?

kirbyju commented 4 years ago

I'm not sure what you're asking. We use SeriesInstanceUID as the primary way to identify a given series in TCIA/NBIA. I'm not sure what "seriesIdentifier" is or where you found a reference to that. If you let me know I can take a look.


fedorov commented 4 years ago

@kirbyju based on the document shared by @sgustaf, which is available here (https://docs.google.com/document/d/1kJp3fBInwgUqQUHBQW4MEa0ykXaCcgDYV7IYRsHhyGo/edit), in order to drill down into the details for the individual series, one needs to first get "seriesIdentifiers" from the first call. See section Example steps 2-3.

kirbyju commented 4 years ago

I see. This question would be best addressed to Scott, directly. I had no idea they were using this separate field.


fedorov commented 4 years ago

That's what I thought, thanks Justin.

Ulli already asked me to add Scott to this repo, so I will wait for his response.

sgustaf commented 4 years ago

Hi,

The client takes only minimal information on the initial call so that it can respond to the user as quickly as possible with a screen showing the results at the patient level. Once the user is interested in a particular patient or group of patients, the client requests details from the server for just those patients. We don’t send everything back with the initial call because it would slow response time and possibly overwhelm the client’s memory with all the information queried.

Thanks, Scott


fedorov commented 4 years ago

Scott, I don't understand the answer, sorry.

The question I asked is the following: is there an endpoint that would allow mapping from SeriesInstanceUID to your internal "seriesIdentifier"?

In IDC, we need to be able to attribute every item in our repository to the specific collection it belongs to. This is important for complying with the terms of use of the TCIA data, and for giving credit to the contributors of the data. I think the direct way to do this would be for TCIA or NBIA to provide an API that takes a SeriesInstanceUID and returns the DOI of the collection (either original, or analysis results collection) it belongs to.

sgustaf commented 4 years ago

Hi Andrey,

Unfortunately the GitHub service doesn’t do a great job of presenting the ultimate question. getStudyDrillDown would do that for a list of seriesIdentifiers.

Drill down into the studies and series, which contain the DOI (descriptionURI) and SeriesUIDs, once again limiting the internal series IDs for brevity:

curl -H "Authorization:Bearer 4ac3e4c9-8faf-42bf-9ba3-90277cd02133" -k "https://public.cancerimagingarchive.net/nbia-api/services/getStudyDrillDown" -d "list=207945728&list=207945729"

[ { "studyId":"1.2.840.113745.101000.1186002.40721.8111.13846623", "date":1437714000000, "description":"CT RESEARCH EXAM", "id":100728832, "study_id":"1", "seriesList":[ { "seriesNumber":"1000", "seriesUID":"1.2.276.0.7230010.3.1.3.0.32669.1424671831.707479", "numberImages":1, "modality":"SEG", "manufacturer":"3D Slicer Community", "annotationsFlag":false, "annotationsSize":0, "patientId":"4482356", "patientPkId":"100696064", "studyId":"1.2.840.113745.101000.1186002.40721.8111.13846623", "studyPkId":100728832, "totalSizeForAllImagesInSeries":7916510, "project":"Lung Phantom", "description":"QIN CT challenge:lesion11 alg03 run3 segmentation result", "dataProvenanceSiteName":null, "manufacturerModelName":null, "softwareVersion":null, "maxFrameCount":"237", "studyDate":null, "studyDesc":null, "bodyPartExamined":null, "study_id":null, "thirdPartyAnalysis":"YES", "descriptionURI":"http://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7", "seriesId":"1.2.276.0.7230010.3.1.3.0.32669.1424671831.707479", "studyDateString":"", "exactSize":7916510, "seriesPkId":207945728 }, { "seriesNumber":"1000", "seriesUID":"1.2.276.0.7230010.3.1.3.0.33454.1424693226.234087", "numberImages":1, "modality":"SEG", "manufacturer":"3D Slicer Community", "annotationsFlag":false, "annotationsSize":0, "patientId":"4482356", "patientPkId":"100696064", "studyId":"1.2.840.113745.101000.1186002.40721.8111.13846623", "studyPkId":100728832, "totalSizeForAllImagesInSeries":7916508, "project":"Lung Phantom", "description":"QIN CT challenge:lesion4 alg02 run2 segmentation result", "dataProvenanceSiteName":null, "manufacturerModelName":null, "softwareVersion":null, "maxFrameCount":"237", "studyDate":null, "studyDesc":null, "bodyPartExamined":null, "study_id":null, "thirdPartyAnalysis":"YES", "descriptionURI":"http://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7", "seriesId":"1.2.276.0.7230010.3.1.3.0.33454.1424693226.234087", "studyDateString":"", "exactSize":7916508, "seriesPkId":207945729 } ] } ]

There is a separate call to getStudyDrillDownWithSeriesIds which takes seriesUIDs that will give you the information for particular seriesUIDs:

curl -H "Authorization:Bearer 66017470-0c78-4fa5-a05a-e16526dd6747" -k "https://public.cancerimagingarchive.net/nbia-api/services/getStudyDrillDownWithSeriesIds" -d "list=1.3.6.1.4.1.14519.5.2.1.7695.4001.306204232344341694648035234440&list=1.3.6.1.4.1.14519.5.2.1.7695.4001.180700359927709468630440576839"

[ { "studyId":"1.3.6.1.4.1.14519.5.2.1.7695.4001.130563880911723253267280582465", "date":913096800000, "description":"MSTEALTH", "id":163840, "study_id":null, "seriesList":[ { "seriesNumber":"1", "seriesUID":"1.3.6.1.4.1.14519.5.2.1.7695.4001.180700359927709468630440576839", "numberImages":46, "modality":"MR", "manufacturer":"GE MEDICAL SYSTEMS", "annotationsFlag":false, "annotationsSize":0, "patientId":"TCGA-08-0244", "patientPkId":"131072", "studyId":"1.3.6.1.4.1.14519.5.2.1.7695.4001.130563880911723253267280582465", "studyPkId":163840, "totalSizeForAllImagesInSeries":6129768, "project":"TCGA-GBM", "description":"FMPSPGR SAG", "dataProvenanceSiteName":null, "manufacturerModelName":null, "softwareVersion":null, "maxFrameCount":"0", "studyDate":null, "studyDesc":null, "bodyPartExamined":"BRAIN", "study_id":null, "thirdPartyAnalysis":null, "descriptionURI":null, "seriesId":"1.3.6.1.4.1.14519.5.2.1.7695.4001.180700359927709468630440576839", "studyDateString":"", "seriesPkId":229377, "exactSize":6129768 }, { "seriesNumber":"2", "seriesUID":"1.3.6.1.4.1.14519.5.2.1.7695.4001.306204232344341694648035234440", "numberImages":124, "modality":"MR", "manufacturer":"GE MEDICAL SYSTEMS", "annotationsFlag":false, "annotationsSize":0, "patientId":"TCGA-08-0244", "patientPkId":"131072", "studyId":"1.3.6.1.4.1.14519.5.2.1.7695.4001.130563880911723253267280582465", "studyPkId":163840, "totalSizeForAllImagesInSeries":16524014, "project":"TCGA-GBM", "description":"3DSPGR AXIAL", "dataProvenanceSiteName":null, "manufacturerModelName":null, "softwareVersion":null, "maxFrameCount":"0", "studyDate":null, "studyDesc":null, "bodyPartExamined":"BRAIN", "study_id":null, "thirdPartyAnalysis":null, "descriptionURI":null, "seriesId":"1.3.6.1.4.1.14519.5.2.1.7695.4001.306204232344341694648035234440", "studyDateString":"", "seriesPkId":229376, "exactSize":16524014 } ] } ]
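Given responses shaped like the two above, the SeriesInstanceUID-to-DOI mapping asked about earlier could be extracted roughly as follows. This is a sketch, not part of the NBIA client: the function name is hypothetical, and the sample is trimmed from the responses above with most fields omitted:

```python
import json


def series_to_doi(drilldown_json):
    """Map each seriesUID to its (thirdPartyAnalysis, descriptionURI) pair."""
    mapping = {}
    for study in json.loads(drilldown_json):
        for series in study.get("seriesList", []):
            mapping[series["seriesUID"]] = (
                series.get("thirdPartyAnalysis"),
                series.get("descriptionURI"),
            )
    return mapping


# One analysis-result series and one original series, trimmed from the responses.
sample = json.dumps([
    {
        "studyId": "1.2.840.113745.101000.1186002.40721.8111.13846623",
        "seriesList": [{
            "seriesUID": "1.2.276.0.7230010.3.1.3.0.32669.1424671831.707479",
            "thirdPartyAnalysis": "YES",
            "descriptionURI": "http://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7",
        }],
    },
    {
        "studyId": "1.3.6.1.4.1.14519.5.2.1.7695.4001.130563880911723253267280582465",
        "seriesList": [{
            "seriesUID": "1.3.6.1.4.1.14519.5.2.1.7695.4001.180700359927709468630440576839",
            "thirdPartyAnalysis": None,
            "descriptionURI": None,
        }],
    },
])

for uid, (flag, doi) in series_to_doi(sample).items():
    print(uid, flag, doi)
```

Note that in the second response the original TCGA-GBM series carry a null descriptionURI, so this mapping by itself only yields DOIs for series where that field is populated.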


Thanks, Scott



bcli4d commented 4 years ago

@sgustaf I posted a question above before Justin added you to this repo:

> Justin, Scott, I have not been able to get 3rd party results for Breast Diagnosis using the /getSimpleSearchWithModalityAndBodyPartPaged API. I've found that I have to tinker with the exact spelling of some collections, but haven't found a variation that works for this one. Can you let me know what to use? Thanks, Bill

Any suggestions? Thanks.

sgustaf commented 4 years ago

Hi Bill,

Does your call to getSimpleSearchWithModalityAndBodyPartPaged look like this:

curl -H "Authorization:Bearer 31b3fdf5-addc-4ae0-a0a2-7cb914243ef4" -k "https://public.cancerimagingarchive.net/nbia-api/services/getSimpleSearchWithModalityAndBodyPartPaged" -d "criteriaType0=ThirdPartyAnalysis&value0=yes&criteriaType1=CollectionCriteria&value1=BREAST-DIAGNOSIS&sortField=subject&sortDirection=descending&start=0&size=100000"
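The form body in the curl command above, and the follow-up body for getStudyDrillDown, could be assembled programmatically as sketched here. The helper names are hypothetical, no request is actually sent, and the response shape is assumed to match the resultSet structure returned by this call:

```python
from urllib.parse import urlencode


def build_search_body(criteria, start=0, size=100000):
    """Encode (criteriaType, value) pairs as criteriaTypeN/valueN form fields."""
    fields = []
    for i, (ctype, value) in enumerate(criteria):
        fields.append((f"criteriaType{i}", ctype))
        fields.append((f"value{i}", value))
    fields += [
        ("sortField", "subject"),
        ("sortDirection", "descending"),
        ("start", start),
        ("size", size),
    ]
    return urlencode(fields)


def collect_series_identifiers(response):
    """Flatten the internal seriesIdentifiers out of a parsed search response."""
    ids = []
    for subject in response.get("resultSet", []):
        for study in subject.get("studyIdentifiers", []):
            ids.extend(study.get("seriesIdentifiers", []))
    return ids


body = build_search_body(
    [("ThirdPartyAnalysis", "yes"), ("CollectionCriteria", "BREAST-DIAGNOSIS")]
)
print(body)

# One subject taken from the response to this call; other fields omitted.
response = {"resultSet": [{
    "subjectId": "BreastDx-01-0070",
    "studyIdentifiers": [
        {"seriesIdentifiers": [742555772], "studyIdentifier": 4784135},
        {"seriesIdentifiers": [742555778], "studyIdentifier": 4784136},
    ],
}]}
drilldown_body = urlencode([("list", i) for i in collect_series_identifiers(response)])
print(drilldown_body)
```

The second body has the same list=...&list=... form that getStudyDrillDown takes, which is how the two calls chain together.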

{"resultSet":[{"subjectId":"BreastDx-01-0073","project":"BREAST-DIAGNOSIS","id":4751369,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555780],"seriesAndModality":[],"studyIdentifier":4784139}]},{"subjectId":"BreastDx-01-0072","project":"BREAST-DIAGNOSIS","id":4751368,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":2,"studyIdentifiers":[{"seriesIdentifiers":[742555779],"seriesAndModality":[],"studyIdentifier":4784138}]},{"subjectId":"BreastDx-01-0071","project":"BREAST-DIAGNOSIS","id":4751367,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555781],"seriesAndModality":[],"studyIdentifier":4784137}]},{"subjectId":"BreastDx-01-0070","project":"BREAST-DIAGNOSIS","id":4751366,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":10,"studyIdentifiers":[{"seriesIdentifiers":[742555772],"seriesAndModality":[],"studyIdentifier":4784135},{"seriesIdentifiers":[742555778],"seriesAndModality":[],"studyIdentifier":4784136}]},{"subjectId":"BreastDx-01-0069","project":"BREAST-DIAGNOSIS","id":4751365,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555774],"seriesAndModality":[],"studyIdentifier":4784134}]},{"subjectId":"BreastDx-01-0068","project":"BREAST-DIAGNOSIS","id":4751364,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555770],"seriesAndModality":[],"studyIdentifier":4784133}]},{"subjectId":"BreastDx-01-0067","project":"BREAST-DIAGNOSIS","id":4751363,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNu
mberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555775],"seriesAndModality":[],"studyIdentifier":4784132}]},{"subjectId":"BreastDx-01-0066","project":"BREAST-DIAGNOSIS","id":4751362,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555777],"seriesAndModality":[],"studyIdentifier":4784131}]},{"subjectId":"BreastDx-01-0065","project":"BREAST-DIAGNOSIS","id":4751360,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555769],"seriesAndModality":[],"studyIdentifier":4784128}]},{"subjectId":"BreastDx-01-0046","project":"BREAST-DIAGNOSIS","id":3375149,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":3,"totalNumberOfSeries":10,"studyIdentifiers":[{"seriesIdentifiers":[742555776],"seriesAndModality":[],"studyIdentifier":3407957},{"seriesIdentifiers":[742555771],"seriesAndModality":[],"studyIdentifier":3407958},{"seriesIdentifiers":[742555773],"seriesAndModality":[],"studyIdentifier":5570560}]},{"subjectId":"BreastDx-01-0041","project":"BREAST-DIAGNOSIS","id":3375144,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":3,"totalNumberOfSeries":12,"studyIdentifiers":[{"seriesIdentifiers":[742555764],"seriesAndModality":[],"studyIdentifier":3407948},{"seriesIdentifiers":[742555761],"seriesAndModality":[],"studyIdentifier":3407949},{"seriesIdentifiers":[742555768],"seriesAndModality":[],"studyIdentifier":3407950}]},{"subjectId":"BreastDx-01-0040","project":"BREAST-DIAGNOSIS","id":3375143,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555762],"seriesAndModality":[],"studyIdentifier":3407947}]},{"subjectId":"BreastDx-01-0039","project":"BREAST-DIAGNOSIS","id":3375142,"modalities":["SR"],"bodyP
arts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":2,"studyIdentifiers":[{"seriesIdentifiers":[742555765],"seriesAndModality":[],"studyIdentifier":3407946}]},{"subjectId":"BreastDx-01-0038","project":"BREAST-DIAGNOSIS","id":3375141,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":8,"studyIdentifiers":[{"seriesIdentifiers":[742555760],"seriesAndModality":[],"studyIdentifier":3407944},{"seriesIdentifiers":[742555767],"seriesAndModality":[],"studyIdentifier":3407945}]},{"subjectId":"BreastDx-01-0037","project":"BREAST-DIAGNOSIS","id":3375140,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555763],"seriesAndModality":[],"studyIdentifier":3407943}]},{"subjectId":"BreastDx-01-0036","project":"BREAST-DIAGNOSIS","id":3375139,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555759],"seriesAndModality":[],"studyIdentifier":3407942}]},{"subjectId":"BreastDx-01-0035","project":"BREAST-DIAGNOSIS","id":3375138,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":5,"studyIdentifiers":[{"seriesIdentifiers":[742555766],"seriesAndModality":[],"studyIdentifier":3407941}]},{"subjectId":"BreastDx-01-0034","project":"BREAST-DIAGNOSIS","id":3375137,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":3,"totalNumberOfSeries":15,"studyIdentifiers":[{"seriesIdentifiers":[742555756],"seriesAndModality":[],"studyIdentifier":3407938},{"seriesIdentifiers":[742555749],"seriesAndModality":[],"studyIdentifier":3407939},{"seriesIdentifiers":[742555758],"seriesAndModality":[],"studyIdentifier":3407940}]},{"subjectId":"BreastDx-01-0033","project":"BREAST-DIAGNOSIS","id":3375136,"modalities":["SR"],"bodyParts":[""],"spec
ies":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555752],"seriesAndModality":[],"studyIdentifier":3407937}]},{"subjectId":"BreastDx-01-0032","project":"BREAST-DIAGNOSIS","id":3375135,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":3,"totalNumberOfSeries":13,"studyIdentifiers":[{"seriesIdentifiers":[742555750],"seriesAndModality":[],"studyIdentifier":3407934},{"seriesIdentifiers":[742555753],"seriesAndModality":[],"studyIdentifier":3407935},{"seriesIdentifiers":[742555755],"seriesAndModality":[],"studyIdentifier":3407936}]},{"subjectId":"BreastDx-01-0031","project":"BREAST-DIAGNOSIS","id":3375134,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555754],"seriesAndModality":[],"studyIdentifier":3407933}]},{"subjectId":"BreastDx-01-0030","project":"BREAST-DIAGNOSIS","id":3375133,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555748],"seriesAndModality":[],"studyIdentifier":3407932}]},{"subjectId":"BreastDx-01-0029","project":"BREAST-DIAGNOSIS","id":3375132,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":9,"studyIdentifiers":[{"seriesIdentifiers":[742555751],"seriesAndModality":[],"studyIdentifier":3407930},{"seriesIdentifiers":[742555757],"seriesAndModality":[],"studyIdentifier":3407931}]},{"subjectId":"BreastDx-01-0028","project":"BREAST-DIAGNOSIS","id":3375131,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":4,"totalNumberOfSeries":11,"studyIdentifiers":[{"seriesIdentifiers":[742555746],"seriesAndModality":[],"studyIdentifier":3407926},{"seriesIdentifiers":[742555743],"seriesAndModality":[],"studyIdentifier":3407927},{"seriesIdentifiers":[742555744],"seriesAndModalit
y":[],"studyIdentifier":3407928},{"seriesIdentifiers":[742555739],"seriesAndModality":[],"studyIdentifier":3407929}]},{"subjectId":"BreastDx-01-0026","project":"BREAST-DIAGNOSIS","id":3375129,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":9,"studyIdentifiers":[{"seriesIdentifiers":[742555738],"seriesAndModality":[],"studyIdentifier":3407923},{"seriesIdentifiers":[742555742],"seriesAndModality":[],"studyIdentifier":3407924}]},{"subjectId":"BreastDx-01-0025","project":"BREAST-DIAGNOSIS","id":3375128,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":3,"totalNumberOfSeries":13,"studyIdentifiers":[{"seriesIdentifiers":[742555747],"seriesAndModality":[],"studyIdentifier":3407920},{"seriesIdentifiers":[742555740],"seriesAndModality":[],"studyIdentifier":3407921},{"seriesIdentifiers":[742555745],"seriesAndModality":[],"studyIdentifier":3407922}]},{"subjectId":"BreastDx-01-0024","project":"BREAST-DIAGNOSIS","id":3375127,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":5,"studyIdentifiers":[{"seriesIdentifiers":[742555741],"seriesAndModality":[],"studyIdentifier":3407919}]},{"subjectId":"BreastDx-01-0023","project":"BREAST-DIAGNOSIS","id":3375126,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555729],"seriesAndModality":[],"studyIdentifier":3407918}]},{"subjectId":"BreastDx-01-0022","project":"BREAST-DIAGNOSIS","id":3375125,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":5,"studyIdentifiers":[{"seriesIdentifiers":[742555737],"seriesAndModality":[],"studyIdentifier":3407917}]},{"subjectId":"BreastDx-01-0021","project":"BREAST-DIAGNOSIS","id":3375124,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":6,"st
udyIdentifiers":[{"seriesIdentifiers":[742555735],"seriesAndModality":[],"studyIdentifier":3407915},{"seriesIdentifiers":[742555733],"seriesAndModality":[],"studyIdentifier":3407916}]},{"subjectId":"BreastDx-01-0020","project":"BREAST-DIAGNOSIS","id":3375123,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":4,"totalNumberOfSeries":11,"studyIdentifiers":[{"seriesIdentifiers":[742555730],"seriesAndModality":[],"studyIdentifier":3407911},{"seriesIdentifiers":[742555734],"seriesAndModality":[],"studyIdentifier":3407912},{"seriesIdentifiers":[742555728],"seriesAndModality":[],"studyIdentifier":3407913},{"seriesIdentifiers":[742555732],"seriesAndModality":[],"studyIdentifier":3407914}]},{"subjectId":"BreastDx-01-0019","project":"BREAST-DIAGNOSIS","id":3375122,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555736],"seriesAndModality":[],"studyIdentifier":3407910}]},{"subjectId":"BreastDx-01-0018","project":"BREAST-DIAGNOSIS","id":3375121,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":5,"studyIdentifiers":[{"seriesIdentifiers":[742555731],"seriesAndModality":[],"studyIdentifier":3407909}]},{"subjectId":"BreastDx-01-0017","project":"BREAST-DIAGNOSIS","id":3375120,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555720],"seriesAndModality":[],"studyIdentifier":3407908}]},{"subjectId":"BreastDx-01-0016","project":"BREAST-DIAGNOSIS","id":3375119,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555724],"seriesAndModality":[],"studyIdentifier":3407907}]},{"subjectId":"BreastDx-01-0015","project":"BREAST-DIAGNOSIS","id":3375118,"modalities":["SR"],"bodyParts":[""],"species"
:["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":5,"studyIdentifiers":[{"seriesIdentifiers":[742555722],"seriesAndModality":[],"studyIdentifier":3407906}]},{"subjectId":"BreastDx-01-0014","project":"BREAST-DIAGNOSIS","id":3375117,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":3,"studyIdentifiers":[{"seriesIdentifiers":[742555727],"seriesAndModality":[],"studyIdentifier":3407905}]},{"subjectId":"BreastDx-01-0013","project":"BREAST-DIAGNOSIS","id":3375116,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":5,"studyIdentifiers":[{"seriesIdentifiers":[742555723],"seriesAndModality":[],"studyIdentifier":3407904}]},{"subjectId":"BreastDx-01-0012","project":"BREAST-DIAGNOSIS","id":3375115,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555719],"seriesAndModality":[],"studyIdentifier":3407903}]},{"subjectId":"BreastDx-01-0011","project":"BREAST-DIAGNOSIS","id":3375114,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":6,"studyIdentifiers":[{"seriesIdentifiers":[742555726],"seriesAndModality":[],"studyIdentifier":3407902}]},{"subjectId":"BreastDx-01-0010","project":"BREAST-DIAGNOSIS","id":3375113,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":8,"studyIdentifiers":[{"seriesIdentifiers":[742555725],"seriesAndModality":[],"studyIdentifier":3407900},{"seriesIdentifiers":[742555718],"seriesAndModality":[],"studyIdentifier":3407901}]},{"subjectId":"BreastDx-01-0009","project":"BREAST-DIAGNOSIS","id":3375112,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":3,"totalNumberOfSeries":10,"studyIdentifiers":[{"seriesIdentifiers":[742555716],"seriesAndModality":[],"studyIdentifier":3407897},{"seriesIdentif
iers":[742555711],"seriesAndModality":[],"studyIdentifier":3407898},{"seriesIdentifiers":[742555721],"seriesAndModality":[],"studyIdentifier":3407899}]},{"subjectId":"BreastDx-01-0008","project":"BREAST-DIAGNOSIS","id":3375111,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":8,"studyIdentifiers":[{"seriesIdentifiers":[742555717],"seriesAndModality":[],"studyIdentifier":3407895},{"seriesIdentifiers":[742555713],"seriesAndModality":[],"studyIdentifier":3407896}]},{"subjectId":"BreastDx-01-0007","project":"BREAST-DIAGNOSIS","id":3375110,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555714],"seriesAndModality":[],"studyIdentifier":3407894}]},{"subjectId":"BreastDx-01-0006","project":"BREAST-DIAGNOSIS","id":3375109,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":4,"studyIdentifiers":[{"seriesIdentifiers":[742555710],"seriesAndModality":[],"studyIdentifier":3407893}]},{"subjectId":"BreastDx-01-0005","project":"BREAST-DIAGNOSIS","id":3375108,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":9,"totalNumberOfSeries":30,"studyIdentifiers":[{"seriesIdentifiers":[742555702,742555705],"seriesAndModality":[],"studyIdentifier":3407884},{"seriesIdentifiers":[742555703,742555706],"seriesAndModality":[],"studyIdentifier":3407886},{"seriesIdentifiers":[742555704],"seriesAndModality":[],"studyIdentifier":3407887},{"seriesIdentifiers":[742555699,742555707],"seriesAndModality":[],"studyIdentifier":3407889},{"seriesIdentifiers":[742555708,742555709],"seriesAndModality":[],"studyIdentifier":3407890},{"seriesIdentifiers":[742555712],"seriesAndModality":[],"studyIdentifier":3407891},{"seriesIdentifiers":[742555715],"seriesAndModality":[],"studyIdentifier":3407892}]},{"subjectId":"BreastDx-01-0004","project":"BREAST-DIAGNOSIS","id":337
5107,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":2,"totalNumberOfSeries":7,"studyIdentifiers":[{"seriesIdentifiers":[742555698],"seriesAndModality":[],"studyIdentifier":3407882},{"seriesIdentifiers":[742555700],"seriesAndModality":[],"studyIdentifier":3407883}]},{"subjectId":"BreastDx-01-0003","project":"BREAST-DIAGNOSIS","id":3375106,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":4,"totalNumberOfSeries":20,"studyIdentifiers":[{"seriesIdentifiers":[742555695],"seriesAndModality":[],"studyIdentifier":3407878},{"seriesIdentifiers":[742555688],"seriesAndModality":[],"studyIdentifier":3407879},{"seriesIdentifiers":[742555691],"seriesAndModality":[],"studyIdentifier":3407880},{"seriesIdentifiers":[742555701],"seriesAndModality":[],"studyIdentifier":3407881}]},{"subjectId":"BreastDx-01-0002","project":"BREAST-DIAGNOSIS","id":3375105,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":5,"totalNumberOfSeries":16,"studyIdentifiers":[{"seriesIdentifiers":[742555689,742555690],"seriesAndModality":[],"studyIdentifier":3407873},{"seriesIdentifiers":[742555697],"seriesAndModality":[],"studyIdentifier":3407875},{"seriesIdentifiers":[742555692,742555696],"seriesAndModality":[],"studyIdentifier":3407876},{"seriesIdentifiers":[742555693],"seriesAndModality":[],"studyIdentifier":3407877}]},{"subjectId":"BreastDx-01-0001","project":"BREAST-DIAGNOSIS","id":3375104,"modalities":["SR"],"bodyParts":[""],"species":["337915000"],"totalNumberOfStudies":1,"totalNumberOfSeries":6,"studyIdentifiers":[{"seriesIdentifiers":[742555694],"seriesAndModality":[],"studyIdentifier":3407872}]}],"totalPatients":50,"bodyParts":[{"value":"","count":50}],"modalities":[{"value":"SR","count":50}],"collections":[{"value":"BREAST-DIAGNOSIS","count":50}],"species":[{"value":"337915000","count":50}],"sort":"subject-descending"}

Thanks, Scott

bcli4d commented 4 years ago

Scott,

No, but it will. I was trying to use other collection IDs. For some collections, there are several different IDs between what various TCIA and NBIA web pages and APIs expose. I will try to tabulate those in a separate issue.

BTW, the result of the NBIA /getCollectionValues API is not what I would expect:

$ curl https://imaging.nci.nih.gov/nbia-api/services/v1/getCollectionValues

[{"Collection":"Head-Neck Cetuximab-Demo"},{"Collection":"NSCLC Radiogenomics-Demo"},{"Collection":"QIN Breast DCE-MRI-Demo"},{"Collection":"QIN PET Phantom-Demo"},{"Collection":"TCGA-BRCA-Demo"}]

Should I report this somewhere?

Thanks, Bill

sgustaf commented 4 years ago

Hi Bill,

https://imaging.nci.nih.gov/ is no longer an active site. All the collections have been moved to TCIA. It is now just a demonstration site for the software.
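For anyone hitting the same surprise, here is a minimal sketch of a sanity check on the response quoted above: if every collection name ends in "-Demo", the base URL is probably pointing at a demonstration instance rather than at TCIA. The "-Demo" suffix test is a heuristic inferred from this one response, not a documented NBIA convention, and `looks_like_demo_instance` is a hypothetical helper name.

```python
import json

# Response body quoted earlier in this thread
# (getCollectionValues on imaging.nci.nih.gov).
SAMPLE_RESPONSE = (
    '[{"Collection":"Head-Neck Cetuximab-Demo"},'
    '{"Collection":"NSCLC Radiogenomics-Demo"},'
    '{"Collection":"QIN Breast DCE-MRI-Demo"},'
    '{"Collection":"QIN PET Phantom-Demo"},'
    '{"Collection":"TCGA-BRCA-Demo"}]'
)


def collection_names(body: str) -> list[str]:
    """Return the collection names from a getCollectionValues-style JSON body."""
    return [entry["Collection"] for entry in json.loads(body)]


def looks_like_demo_instance(names: list[str]) -> bool:
    """Heuristic (an assumption): a non-empty list whose names all end in '-Demo'."""
    return bool(names) and all(name.endswith("-Demo") for name in names)


names = collection_names(SAMPLE_RESPONSE)
print(names)
print(looks_like_demo_instance(names))
```

A check like this could run once at startup, so that a misconfigured base URL is caught before any tables are built from the wrong installation.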

Thanks, Scott

kirbyju commented 4 years ago

Hi Andrey,

In IDC, we need to be able to attribute every item in our repository to the specific collection it belongs to. This is important for complying with the terms of use of the TCIA data and for giving credit to the contributors of the data. I think the direct way to do this would be for TCIA or NBIA to provide an API that takes a SeriesInstanceUID and returns the DOI of the collection (either original or analysis results) it belongs to.

At the moment, NBIA has a database field that contains a URI (a DOI in our case) for any Analysis Result dataset. This is a new feature, so we don't currently have the DOIs of all the regular "Collections" populated in these fields. However, we're on board with adding this and will work to get it incorporated soon. Back-populating the database for existing data should be a pretty trivial task. We'll need to work with our curators to update our SOP for adding new collections so that this is included in future data submissions.

Thanks, Justin

Justin Kirby (contractor)
Technical Project Manager, Frederick National Laboratory for Cancer Research
Technical Director, Cancer Imaging Informatics Lab
ORCiD: https://orcid.org/0000-0003-3487-8922
240-276-6016
justin.kirby@nih.gov


bcli4d commented 4 years ago

@sgustaf, @kirbyju What are the implications of https://imaging.nci.nih.gov/ no longer being an active site? Will the NBIA API atrophy, or disappear sometime soon? For example, I use /getCollectionDescriptions. Should I be pushing for that functionality to be added to the NCIA API?

Bill

ulrikew commented 4 years ago

Hi Bill, to clarify: NBIA is the DICOM archive within TCIA (https://cancerimagingarchive.net/nbia-search/), but there are other installations of NBIA, e.g. the demo instance at NCI (https://imaging.nci.nih.gov). The API guide that we forwarded to you is written for accessing the demo instance at NCI, hence the base URL of https://imaging.nci.nih.gov in that guide. The TCIA data can be reached via the instructions that I had given to the IDC team earlier and that Scott reiterated; the base URL is https://public.cancerimagingarchive.net/ for TCIA. None of the APIs will go away, since they are used to power the GUIs at TCIA and the other NBIA installations.

Ulli
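A small sketch of how a client might keep the two installations apart. The base URLs and the two endpoint paths are the ones quoted in this thread; `INSTALLATIONS` and `endpoint_url` are hypothetical names, and whether a given path is actually served by a given installation is not confirmed here.

```python
from urllib.parse import urljoin

# Base URLs as given in this thread.
INSTALLATIONS = {
    "nci-demo": "https://imaging.nci.nih.gov/",
    "tcia": "https://public.cancerimagingarchive.net/",
}


def endpoint_url(installation: str, path: str) -> str:
    """Join an installation's base URL with an API path such as
    'nbia-api/services/v1/getCollectionValues'."""
    base = INSTALLATIONS[installation]
    # Stripping a leading '/' keeps urljoin from discarding any path
    # component that a base URL might carry.
    return urljoin(base, path.lstrip("/"))


print(endpoint_url("nci-demo", "nbia-api/services/v1/getCollectionValues"))
print(endpoint_url("tcia", "nbia-api/services/getStudyDrillDownWithSeriesIds"))
```

Keeping the installation name explicit in every call makes it harder to query the demo instance by accident when the intent is to read TCIA data.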

fedorov commented 4 years ago

@sgustaf you said the following:

There is a separate call to getStudyDrillDownWithSeriesIds which takes seriesUIDs that will give you the information for particular seriesUIDs:

When I do what you suggested for a series that I know is in the analysis results collection (for the original LIDC collection), I get the following. As you can see, descriptionURI is blank.

$ curl -H "Authorization:Bearer 47e869e4-7b6c-4da5-8515-741dddf58277" -k "https://public.cancerimagingarchive.net/nbia-api/services/getStudyDrillDownWithSeriesIds" -d "list=1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192"

[
  {
    "studyId": "1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178",
    "date": 946706400000,
    "description": null,
    "id": 2260993,
    "study_id": null,
    "seriesList": [
      {
        "seriesNumber": "3000566",
        "seriesUID": "1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192",
        "numberImages": 133,
        "modality": "CT",
        "manufacturer": "GE MEDICAL SYSTEMS",
        "annotationsFlag": true,
        "annotationsSize": 309829,
        "patientId": "LIDC-IDRI-0001",
        "patientPkId": "2228224",
        "studyId": "1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178",
        "studyPkId": 2260993,
        "totalSizeForAllImagesInSeries": 70018838,
        "project": "LIDC-IDRI",
        "description": null,
        "dataProvenanceSiteName": null,
        "manufacturerModelName": null,
        "softwareVersion": null,
        "maxFrameCount": "0",
        "studyDate": null,
        "studyDesc": null,
        "bodyPartExamined": "CHEST",
        "study_id": null,
        "thirdPartyAnalysis": null,
        "descriptionURI": "",
        "seriesId": "1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192",
        "studyDateString": "",
        "seriesPkId": 2326529,
        "exactSize": 70328667
      }
    ]
  }
]
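
For anyone scripting the same check, here is a minimal sketch that mirrors the curl call above and flattens a response of the shape shown into one record per series. The URL, the `list` form field, and the JSON field names come from this thread; `build_request` and `series_attribution` are hypothetical helper names, the token is a placeholder, and the request is only constructed, not sent. The curl `-k` flag (which skips TLS verification) is deliberately not reproduced.

```python
import json
import urllib.parse
import urllib.request

URL = ("https://public.cancerimagingarchive.net/"
       "nbia-api/services/getStudyDrillDownWithSeriesIds")


def build_request(series_uid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = urllib.parse.urlencode({"list": series_uid}).encode("ascii")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def series_attribution(studies: list[dict]) -> list[dict]:
    """Flatten the response into one record per series, keeping attribution fields."""
    records = []
    for study in studies:
        for series in study.get("seriesList", []):
            records.append({
                "seriesUID": series["seriesUID"],
                "project": series.get("project"),
                "modality": series.get("modality"),
                "thirdPartyAnalysis": series.get("thirdPartyAnalysis"),
                # Normalize "" and null to None so callers can test for a missing DOI.
                "descriptionURI": series.get("descriptionURI") or None,
            })
    return records


# Trimmed copy of the response shown above.
SAMPLE = json.loads("""
[{"studyId": "1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178",
  "seriesList": [{
    "seriesUID": "1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192",
    "modality": "CT",
    "project": "LIDC-IDRI",
    "thirdPartyAnalysis": null,
    "descriptionURI": ""}]}]
""")

records = series_attribution(SAMPLE)
print(records)
```

Normalizing the empty string to `None` makes the blank `descriptionURI` explicit in downstream tables instead of leaving it as a value that looks populated.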

fedorov commented 4 years ago

Interestingly, the TCIA portal does have the correct DOI assigned to the series I queried via the NBIA API above:

image

fedorov commented 4 years ago

TCIA ticket submitted: https://help.cancerimagingarchive.net/servicedesk/customer/portal/1/TH-46188

sgustaf commented 4 years ago

Hi Andrey,

The series ending in ...20603192 is not a third party analysis; it is an original CT. The rest of the series in that study are third party analyses (SEG, SR), and each has an associated DOI, matched at the series level. So any given collection can have both original scans and third party analyses. You can determine which is which via the interface or, as you have seen, via the API.

Thanks, Scott

sgustaf commented 4 years ago

BTW: We are working on getting the DOIs populated for all series, both third party analysis and original.

Thanks, Scott

kirbyju commented 4 years ago

Until Scott finishes this database update, you will likely find that any 3rd party analysis series has a DOI associated with it, while the DOI field is empty for any series that was part of a regular/original TCIA collection submission.

fedorov commented 4 years ago

The series ending in ...20603192 is not a third party analysis, it is an original CT.

No, it is a third party analysis result.

Full SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192

Based on the TCIA portal interface, 3rd party is listed as "Yes", and the series points to this DOI: https://doi.org/10.7937/TCIA.2018.h7umfurq.

image

Most likely this field is empty because, as Justin alluded, Scott's work to update the database is not complete.
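
Given the exchange above, an empty `descriptionURI` cannot yet be read as proof that a series is original data, and once DOIs are populated for all series, a non-empty one will no longer imply an analysis result either. A more cautious sketch is to group series by whatever DOI is present and keep a separate list of series that still need follow-up. `group_by_doi` is a hypothetical helper, and the two short series UIDs in the example are made-up placeholders; the long UID and the DOI are the ones quoted in this thread.

```python
from collections import defaultdict


def group_by_doi(records):
    """Return (by_doi, missing): series UIDs grouped by DOI, and UIDs with no DOI."""
    by_doi = defaultdict(list)
    missing = []
    for rec in records:
        # Treat null, "" and whitespace-only values alike: no DOI recorded.
        doi = (rec.get("descriptionURI") or "").strip()
        if doi:
            by_doi[doi].append(rec["seriesUID"])
        else:
            missing.append(rec["seriesUID"])
    return dict(by_doi), missing


example = [
    # Series queried in this thread; the API returned an empty descriptionURI.
    {"seriesUID": "1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192",
     "descriptionURI": ""},
    # Placeholder UIDs, for illustration only.
    {"seriesUID": "1.2.3.4",
     "descriptionURI": "https://doi.org/10.7937/TCIA.2018.h7umfurq"},
    {"seriesUID": "1.2.3.5", "descriptionURI": None},
]

by_doi, missing = group_by_doi(example)
print(by_doi)
print(missing)
```

The `missing` list is the set to re-query after the database update, or to resolve by hand through the portal in the meantime.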