Status: Closed (closed by wbwakeman 3 years ago)
INTRODUCTION
The Ivy Glioblastoma Atlas Project is a collection of data from glioblastoma brain tumors. The project is a collaboration between the Allen Institute for Brain Science and the Ben and Catherine Ivy Foundation. Glioblastoma is an aggressive brain cancer. Survival after diagnosis is just 12 to 15 months.
An interactive atlas is available at https://glioblastoma.alleninstitute.org/ and consists of:
• Images of in situ hybridization (ISH) experiments identifying where key genes are expressed in the tumor tissue (20x magnification)
• Matching histology images using Hematoxylin and Eosin (H&E) stain
• Gene expression masks for all ISH images
• Annotated mask images of tumor structures for all ISH and H&E images
The project also comprises additional data modalities that are not currently available from the AWS bucket. These are:
• RNA sequencing data for 270 samples from 44 tumors
• Companion clinical database hosted at https://ivygap.org (registration required)
• Accompanying MRI/CT scan data for the patients, available at The Cancer Imaging Archive
The project has been published in the May 11, 2018 edition of Science.
Create a Jupyter notebook with data access and display examples for the Ivy Glioblastoma Atlas data set on AWS.
Repo: https://github.com/AllenInstitute/open_dataset_tools
JPEG image and JSON metadata files are publicly available in this bucket:
s3://allen-ivy-glioblastoma-atlas/
https://console.aws.amazon.com/s3/buckets/allen-ivy-glioblastoma-atlas/
INTRODUCTION
The Ivy Glioblastoma Atlas Project is a collection of data from glioblastoma brain tumors. The project is a collaboration between the Allen Institute for Brain Science and the Ben and Catherine Ivy Foundation. Glioblastoma is an aggressive brain cancer. Survival after diagnosis is just 12 to 15 months.
An interactive atlas is available at https://glioblastoma.alleninstitute.org/ and consists of:
• Images of in situ hybridization (ISH) experiments identifying where key genes are expressed in the tumor tissue (20x magnification)
• Matching histology images using Hematoxylin and Eosin (H&E) stain
• Gene expression masks for all ISH images
• Annotated mask images of tumor structures for all ISH and H&E images
• RNA sequencing data for 270 samples from 44 tumors
• Companion clinical database hosted at https://ivygap.swedish.org (registration required)
• Accompanying MRI/CT scan data for the patients, available at the Cancer Imaging Archive
The project has been published in the May 11, 2018 edition of Science.
DATA SECTION
The image data for the project are being made available as an AWS public dataset to give computational scientists easy access to a rich, well-annotated data set for training and validating machine learning and classification models, and for developing computer vision applications such as image segmentation. The data are publicly available from the s3://allen-ivy-glioblastoma-atlas/ bucket. There is a directory for each donor patient. Each donor directory contains a directory for each tissue sample from that donor. Each directory also contains a JSON file with the relevant metadata.
Query what data is available. Result might be, for example, 30000 image files and 100 JSON metadata files.
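A minimal sketch of how this cell might look, assuming the bucket allows anonymous listing and that `boto3` is installed. The helper names `list_keys` and `count_by_extension` are hypothetical, and the example keys are made up to show the shape of the result, not taken from the bucket.

```python
from collections import Counter
from pathlib import PurePosixPath

BUCKET = "allen-ivy-glioblastoma-atlas"


def list_keys(bucket=BUCKET):
    """Yield every object key in a public bucket using unsigned (anonymous) requests."""
    # boto3 is imported lazily so the counting helper below works without it.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def count_by_extension(keys):
    """Count keys by lower-cased file extension, e.g. '.jpg' and '.json'."""
    return Counter(PurePosixPath(key).suffix.lower() for key in keys)


# Hypothetical keys, used only to illustrate the result without network access.
example_keys = [
    "donor_a/specimen_1/image_001.jpg",
    "donor_a/specimen_1/image_002.JPG",
    "donor_a/donor_a.json",
]
counts = count_by_extension(example_keys)
print(counts[".jpg"], "image files,", counts[".json"], "JSON metadata files")
```

Against the real bucket the call would be `count_by_extension(list_keys())`; listing every object may take a while for a bucket of this size.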
Query the organization of the dataset. Result might be, for example, 26 donors, each containing some number of specimens.
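One way to summarize the organization is to group object keys by their leading path components. The sketch below assumes keys follow the layout described in the data section (a donor directory containing specimen directories); the function name and the example keys are hypothetical.

```python
from collections import defaultdict


def group_specimens_by_donor(keys):
    """Map each top-level donor directory to the set of specimen directories in it."""
    donors = defaultdict(set)
    for key in keys:
        parts = key.split("/")
        if len(parts) < 2:
            continue  # skip anything stored at the bucket root
        donor = parts[0]
        donors[donor]  # register the donor even if only a metadata file is seen
        if len(parts) >= 3:
            donors[donor].add(parts[1])
    return dict(donors)


# Hypothetical keys; real directory names in the bucket will differ.
example_keys = [
    "donor_a/donor_a.json",
    "donor_a/specimen_1/image_001.jpg",
    "donor_a/specimen_2/image_001.jpg",
    "donor_b/specimen_3/image_001.jpg",
]
organization = group_specimens_by_donor(example_keys)
for donor, specimens in sorted(organization.items()):
    print(donor, "has", len(specimens), "specimens")
```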
DONOR SECTION -blurb-
Query for what information is available about the donors. Result might be the fields in the donor JSON metadata files.
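A small sketch of how the available fields could be discovered, assuming each donor's metadata has already been loaded into a Python dict (for example with `json.load`). The record layout and the `id` field are assumptions; only `initial_kps` and `mgmt_methylation` are named in this outline.

```python
def metadata_fields(records):
    """Return the sorted union of field names found across metadata records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


# Hypothetical donor records standing in for the parsed JSON metadata files.
example_donors = [
    {"id": "donor_a", "initial_kps": 100},
    {"id": "donor_b", "mgmt_methylation": "Yes"},
]
print(metadata_fields(example_donors))
```

Taking the union rather than the fields of the first record guards against records that omit optional fields.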
Query for a list of all donors. Result is a list of donors.
Query for all information for one specific example donor. Result will be all info for that donor.
Query for all donors that have some condition, e.g. initial_kps > 90. Result will be a list of donor ids.
Query for all donors that have a combination of conditions, e.g. initial_kps > 90 AND mgmt_methylation = 'Yes'. Result will be a list of donor ids.
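The single-condition and combined-condition queries could both be served by one helper that takes any number of predicates. This is a sketch under assumptions: donor records are dicts, `id` is a hypothetical identifier field, and `initial_kps` is stored as a number (it may need converting if the JSON stores it as text).

```python
def select_donor_ids(donors, *predicates):
    """Return the ids of donors for which every predicate is true."""
    return [d["id"] for d in donors if all(p(d) for p in predicates)]


def kps_above(threshold):
    """Build a predicate: initial_kps is present and greater than threshold."""
    def check(donor):
        kps = donor.get("initial_kps")
        return kps is not None and kps > threshold
    return check


def field_equals(name, value):
    """Build a predicate: the named field equals the given value."""
    return lambda donor: donor.get(name) == value


# Hypothetical donor records; values are made up for illustration.
example_donors = [
    {"id": "donor_a", "initial_kps": 100, "mgmt_methylation": "Yes"},
    {"id": "donor_b", "initial_kps": 80, "mgmt_methylation": "Yes"},
    {"id": "donor_c", "initial_kps": 100, "mgmt_methylation": "No"},
    {"id": "donor_d", "initial_kps": None, "mgmt_methylation": "Yes"},
]

high_kps = select_donor_ids(example_donors, kps_above(90))
high_kps_methylated = select_donor_ids(
    example_donors, kps_above(90), field_equals("mgmt_methylation", "Yes")
)
print(high_kps)
print(high_kps_methylated)
```

Donors with a missing `initial_kps` are excluded rather than raising an error, which seems the safer default for clinical metadata with gaps.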
SPECIMEN SECTION -blurb-
Query for what information is available about the specimens. Result might be the fields in the specimen JSON metadata files.
Query for a list of all specimens. Result is a list of specimens.
Query for all information for one specific example specimen. Result is all info for that specimen. Show that the link gives info in the web app for that specimen.
Query for all specimens that have some condition, e.g. study_name = 'Cancer Stem Cells ISH for Enriched Genes'. Result is a list of specimens.
Query for all specimens that have some condition AND whose donor has some condition, e.g. initial_kps > 90 AND study_name = 'Cancer Stem Cells ISH for Enriched Genes'. Result is a list of specimens.
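The combined specimen and donor query amounts to a join between the two sets of metadata. The sketch below assumes each specimen record carries a reference to its donor; the `donor_id` and `id` fields, and the second study name, are hypothetical placeholders.

```python
def select_specimens(specimens, donors_by_id, specimen_test, donor_test):
    """Return ids of specimens that pass specimen_test and whose donor passes donor_test."""
    selected = []
    for specimen in specimens:
        donor = donors_by_id.get(specimen.get("donor_id"))
        if donor is None:
            continue  # specimen without a known donor cannot satisfy a donor condition
        if specimen_test(specimen) and donor_test(donor):
            selected.append(specimen["id"])
    return selected


STUDY = "Cancer Stem Cells ISH for Enriched Genes"

# Hypothetical records standing in for the parsed JSON metadata.
donors_by_id = {
    "donor_a": {"id": "donor_a", "initial_kps": 100},
    "donor_b": {"id": "donor_b", "initial_kps": 80},
}
example_specimens = [
    {"id": "specimen_1", "donor_id": "donor_a", "study_name": STUDY},
    {"id": "specimen_2", "donor_id": "donor_a", "study_name": "Placeholder study"},
    {"id": "specimen_3", "donor_id": "donor_b", "study_name": STUDY},
]

matches = select_specimens(
    example_specimens,
    donors_by_id,
    specimen_test=lambda s: s.get("study_name") == STUDY,
    donor_test=lambda d: (d.get("initial_kps") or 0) > 90,
)
print(matches)
```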
SECTION DATA SETS -blurb-
IMAGES -blurb-
Get all the H&E and ISH images for a specimen, ordered by section_number
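A possible shape for this cell, assuming the image metadata for a specimen is available as a list of dicts. The `kind` and `file` fields are hypothetical; `section_number` is the field named above. It is converted to an integer before sorting so that section 10 does not sort ahead of section 2 if the values are stored as text.

```python
def images_in_section_order(images, kinds=("H&E", "ISH")):
    """Keep images of the requested kinds and sort them by numeric section_number."""
    wanted = [img for img in images if img["kind"] in kinds]
    return sorted(wanted, key=lambda img: int(img["section_number"]))


# Hypothetical image records for one specimen.
example_images = [
    {"kind": "ISH", "section_number": "10", "file": "c.jpg"},
    {"kind": "H&E", "section_number": "2", "file": "a.jpg"},
    {"kind": "tissue annotation", "section_number": "2", "file": "b.jpg"},
    {"kind": "ISH", "section_number": "3", "file": "d.jpg"},
]
ordered = images_in_section_order(example_images)
print([img["file"] for img in ordered])
```

Passing `kinds=("H&E",)` would give just the H&E set with the same ordering.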
Download all the images onto the local computer with useful file names
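What counts as a useful file name is a judgment call. One option, sketched here as an assumption rather than a project convention, is to combine donor, specimen, image type, and a zero-padded section number so that an alphabetical listing matches section order. The function name and argument values are hypothetical.

```python
import re


def local_file_name(donor, specimen, kind, section_number, extension=".jpg"):
    """Build a file-system-safe name such as donor_specimen_kind_0007.jpg."""
    def clean(text):
        # Replace each run of characters other than letters, digits, dots,
        # and hyphens with a single underscore.
        return re.sub(r"[^A-Za-z0-9.-]+", "_", str(text)).strip("_")

    number = int(section_number)
    return f"{clean(donor)}_{clean(specimen)}_{clean(kind)}_{number:04d}{extension}"


name = local_file_name("donor_a", "specimen 1", "H&E", 7)
print(name)
```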
Get just the set of H&E images
Get just the set of tissue annotation images
Get just the set of tissue boundary images
Get a single ISH image
Provide one or more suggestions for "How do I download all the images in the set?"
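Two suggestions that might be offered here. From a shell, the AWS CLI can mirror a public bucket without credentials: `aws s3 sync s3://allen-ivy-glioblastoma-atlas/ ./ivy_gap --no-sign-request` (the local directory name is arbitrary). From Python, a sketch along the following lines could work, assuming `boto3` is installed and the bucket allows anonymous reads; the function names are hypothetical.

```python
from pathlib import Path

BUCKET = "allen-ivy-glioblastoma-atlas"


def plan_downloads(keys, dest_root, suffixes=(".jpg", ".json")):
    """Pair each wanted key with a local path that mirrors the bucket layout."""
    dest_root = Path(dest_root)
    plan = []
    for key in keys:
        if key.endswith("/"):
            continue  # directory placeholder, nothing to download
        if not key.lower().endswith(suffixes):
            continue
        plan.append((key, dest_root.joinpath(*key.split("/"))))
    return plan


def download_all(plan, bucket=BUCKET):
    """Download every (key, local path) pair using unsigned (anonymous) requests."""
    # boto3 is imported lazily so plan_downloads can be used without it.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    for key, dest in plan:
        dest.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, key, str(dest))


# Hypothetical keys; no network access is needed to build the plan.
plan = plan_downloads(
    ["donor_a/specimen_1/image_001.jpg", "donor_a/", "donor_a/notes.txt"],
    "ivy_gap",
)
print(plan)
```

Separating the plan from the download makes it easy to inspect or filter what will be fetched before transferring a large number of images.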