ACED-IDP / data_model

Schema and synthetic data

Human cell atlas import #21

Open bwalsh opened 1 year ago

bwalsh commented 1 year ago

As an ACED stakeholder, in order to visualize how the IDP will work with single cell data, it would be useful to see representative public domain or synthetic data in the system.

Should we download and import an HCA project?

https://data.humancellatlas.org/explore/projects?filter=%5B%7B%22facetName%22:%22projectId%22,%22terms%22:%5B%22a62dae2e-cd69-4d5c-b5f8-4f7e8abdbafa%22%5D%7D,%7B%22facetName%22:%22genusSpecies%22,%22terms%22:%5B%22Homo%20sapiens%22%5D%7D,%7B%22facetName%22:%22donorDisease%22,%22terms%22:%5B%22breast%20cancer%22%5D%7D,%7B%22facetName%22:%22specimenDisease%22,%22terms%22:%5B%22breast%20cancer%22%5D%7D%5D
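For readers who do not want to decode the link by eye, here is a minimal sketch that unpacks its `filter` query parameter using only the Python standard library. The helper name `decode_filters` is made up for illustration; the URL is the one above, split across lines for readability.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The HCA explorer link from this issue, split across lines for readability.
HCA_URL = (
    "https://data.humancellatlas.org/explore/projects?filter="
    "%5B%7B%22facetName%22:%22projectId%22,%22terms%22:"
    "%5B%22a62dae2e-cd69-4d5c-b5f8-4f7e8abdbafa%22%5D%7D,"
    "%7B%22facetName%22:%22genusSpecies%22,%22terms%22:"
    "%5B%22Homo%20sapiens%22%5D%7D,"
    "%7B%22facetName%22:%22donorDisease%22,%22terms%22:"
    "%5B%22breast%20cancer%22%5D%7D,"
    "%7B%22facetName%22:%22specimenDisease%22,%22terms%22:"
    "%5B%22breast%20cancer%22%5D%7D%5D"
)


def decode_filters(url):
    """Return {facetName: terms} from the URL's percent-encoded JSON filter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    facets = json.loads(query["filter"][0])
    return {facet["facetName"]: facet["terms"] for facet in facets}


filters = decode_filters(HCA_URL)
for name, terms in filters.items():
    print(name, terms)
```

The decoded facets show that the link selects a single project, restricted to Homo sapiens donors and specimens with breast cancer.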

bwalsh commented 1 year ago

If we download the metadata manifest and look at its contents, we see that it lists a mixture of file types, including csv files.

https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp21&format=compact&filters=%7B%22projectId%22%3A+%7B%22is%22%3A+%5B%22a62dae2e-cd69-4d5c-b5f8-4f7e8abdbafa%22%5D%7D%7D&objectKey=manifests%2F77af5fb1-d47a-592f-9216-327fe645ee7f.83e2ac44-854a-54ee-9420-b4e2f6102404.tsv

image

Note there are both drs:// and https:// URLs.

image
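The two observations above (mixed file types, mixed URL schemes) could be checked with a short script once the manifest TSV is downloaded. This is only a sketch: the column names `file_format` and `file_url` and the sample rows are assumptions for illustration, not copied from the real manifest, so adjust them to match the actual header.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit

# Made-up stand-in for the downloaded manifest; the rows and the column
# names are assumptions, not taken from the real HCA file.
SAMPLE_MANIFEST = (
    "file_name\tfile_format\tfile_url\n"
    "example_counts.csv\tcsv\thttps://example.org/files/example_counts.csv\n"
    "example_reads.fastq.gz\tfastq.gz\tdrs://example.org/0000-example\n"
    "example_cells.csv\tcsv\tdrs://example.org/0001-example\n"
)


def summarize_manifest(handle, format_column="file_format", url_column="file_url"):
    """Count file formats and URL schemes in a tab-separated manifest."""
    formats = Counter()
    schemes = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        formats[row[format_column]] += 1
        schemes[urlsplit(row[url_column]).scheme] += 1
    return formats, schemes


formats, schemes = summarize_manifest(io.StringIO(SAMPLE_MANIFEST))
print(dict(formats))
print(dict(schemes))
```

For the real file, pass `open(path, newline="")` instead of the in-memory sample.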

Picking one of the csv files at random we see:

image

bwalsh commented 1 year ago

There is also a way to retrieve the data from Terra. Note the PFB link.

https://data.humancellatlas.org/explore/projects/a62dae2e-cd69-4d5c-b5f8-4f7e8abdbafa/export-to-terra

That data can be opened in Terra, where we see the full project.

image

Including files.

image

bwalsh commented 1 year ago

The entity types are the typical ones you would see in any study. It is encouraging to see workflow provenance (sequencing_*).

We could import this style of PFB and map it to our data model.
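A rough sketch of what that mapping step might look like, assuming the PFB (an Avro-based format) has already been read into plain dictionaries with `id`, `name`, and `object` keys. Everything named here is hypothetical: the entity names, the target type names in `ENTITY_MAP`, and the sample records are placeholders, not the actual HCA export or our schema.

```python
from collections import defaultdict

# Hypothetical mapping from source entity names to target data-model types.
ENTITY_MAP = {
    "donor": "subject",
    "specimen": "sample",
    "file": "document",
}

# Made-up records shaped like decoded PFB entities (assumed keys).
SAMPLE_RECORDS = [
    {"id": "d1", "name": "donor", "object": {"species": "Homo sapiens"}},
    {"id": "s1", "name": "specimen", "object": {"disease": "breast cancer"}},
    {"id": "f1", "name": "file", "object": {"format": "csv"}},
    {"id": "x1", "name": "sequencing_example", "object": {}},
]


def map_records(records, entity_map):
    """Group records under their target type; collect names with no mapping."""
    mapped = defaultdict(list)
    unmapped = set()
    for record in records:
        target = entity_map.get(record["name"])
        if target is None:
            unmapped.add(record["name"])
            continue
        # Flatten the entity's properties next to its id.
        mapped[target].append({"id": record["id"], **record["object"]})
    return dict(mapped), unmapped


mapped, unmapped = map_records(SAMPLE_RECORDS, ENTITY_MAP)
print(mapped)
print(unmapped)
```

Reporting unmapped entity names separately, rather than dropping them silently, makes it easier to see which source entity types still need a home in the target model.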

bwalsh commented 1 year ago

@kellrott per our conversation about HCA and h5 matrices yesterday