HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0
64 stars, 32 forks

Improve metadata section on the HCA DCP Data Browser #1134

Closed mshadbolt closed 3 years ago

mshadbolt commented 4 years ago

Description

As the metadata team, we want to ensure that the [metadata section](https://data.humancellatlas.org/metadata) of the Data Browser achieves two goals:

1. Enables contributors to understand the metadata we require when accepting data for submission.
2. Enables downstream users to understand the metadata we make available for each dataset in the DCP.

Currently, the metadata dictionary is not easy to understand, particularly for users who are not familiar with our metadata model. This ticket will track ideas and prototypes for improving the section.

**Acceptance Criteria**

ESapenaVentura commented 4 years ago

Should we create a doc in the Drafts folder and link it here so everyone can share their opinion?

diekhans commented 4 years ago

The stalled metadata de-modulation RFC addresses part of this issue by allowing type-specific, rather than module-specific, user-friendly names.

https://github.com/HumanCellAtlas/dcp-community/tree/rfc-metadata-schema-simplification/rfcs

matthewspeir commented 4 years ago

In a recent user interview, the user explained that they weren't familiar with the ontologies we use (e.g. EFO). It would be great if the metadata section explained which ontologies we use, and maybe why we chose them.

Direct quote:

The metadata dictionary could be more informative. For example, “An ontology term identifier in the form prefix:accession.e.g. "HsapDv:0000087" or "EFO:0002588".” This documentation doesn’t describe what the HsapDv or EFO ontologies are. I have never heard of these before; I’m familiar with Gene Ontology but not with these two.
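To make the identifier format in the quoted description concrete, here is a minimal Python sketch that splits a term identifier into its prefix and accession and attaches a human-readable ontology name, which is the context the user found missing. The `ONTOLOGY_NAMES` table and the `describe_term` function are hypothetical illustrations, not part of the HCA metadata schema, and the table covers only the two prefixes from the quote.

```python
import re

# Hypothetical lookup table: maps the ontology prefixes quoted above to
# their full names. A real dictionary page would need every prefix the
# schema uses; only the two from the example are listed here.
ONTOLOGY_NAMES = {
    "HsapDv": "Human Developmental Stages ontology",
    "EFO": "Experimental Factor Ontology",
}

# "prefix:accession": a prefix starting with a letter (then letters, digits
# or underscores), a colon, then a non-empty accession with no whitespace.
TERM_ID = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\S+)$")


def describe_term(term_id: str) -> str:
    """Split an ontology term identifier and name the ontology it comes from."""
    match = TERM_ID.match(term_id)
    if match is None:
        raise ValueError(f"not in prefix:accession form: {term_id!r}")
    prefix, accession = match.groups()
    # Fall back to a placeholder when the prefix is not in the table.
    ontology = ONTOLOGY_NAMES.get(prefix, "unknown ontology")
    return f"{term_id}: accession {accession} in the {ontology} ({prefix})"


if __name__ == "__main__":
    for term in ("HsapDv:0000087", "EFO:0002588"):
        print(describe_term(term))
```

The design choice is simply to show the expanded ontology name next to each example identifier; the lookup table would still need to cover every prefix the schema actually uses.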

mshadbolt commented 4 years ago

@matthewspeir That is great feedback, and it is something we hear a lot from researchers as well. The wider scientific community isn't very familiar with using ontologies to describe parts of their experiment; they mostly encounter them at the analysis stage, for example through the Gene Ontology. I think it is a great idea to incorporate more information about this into the website.

lauraclarke commented 4 years ago

I think making the use of ontologies clearer is a great plan.

I quite like how it is done here, although this documentation doesn't explain the why:

https://data.faang.org/ruleset/samples#standard

Having the documentation cover the why too is definitely valuable, but we need to be cautious about putting too much time into it, because most people don't read extensive documentation.

matthewspeir commented 4 years ago

Yeah, the 'why' isn't essential. I think that was more my personal interest rather than the user's interest.

lauraclarke commented 4 years ago

Sounds like a page that explains the ontologies we use, with a brief rationale, might be a great start. Representatives from other similar consortia frequently ask me that question, so having a page I can reference would be great.

zperova commented 4 years ago

@ESapenaVentura @mshadbolt @lauraclarke I have put the Doc with suggestions that we have discussed in the Metadata/Standards folder - also now linked in the description of this issue.

mshadbolt commented 4 years ago

Here's a quick prototype of a metadata browser I put together using R Shiny: https://mshadbolt-hca-ebi.shinyapps.io/metadata_browser/. I think having something interactive, searchable and filterable will let users see exactly which fields we collect, and the filtering keeps them from being overwhelmed by more than 500 fields. It is very much a beta, more a proof of concept that can be expanded. I have some placeholders for other things I have been thinking about, but no prototypes as yet.

The main use case I am aiming at fulfilling here is for users that want to submit data and want to know:

It is based on the top rows of a spreadsheet, and I think the descriptions can be improved a lot.

claymfischer commented 4 years ago

This is fantastic @mshadbolt! Very excited to see how much progress you've already made in this exploration.

zperova commented 4 years ago

Improvements to the metadata section were discussed at the Comprehensive Data Portal Content review at the DCP F2F in Boston. Slides from the session are here: https://docs.google.com/presentation/d/1SmqqIoGUjscWDKVNLAwGFN1LBVoTAQhuFyb7WAPwqr0/edit#slide=id.g7062982aed_0_0

@zperova and @matthewspeir will organize and make actionable items from the session on November 6 2019.

morrisonnorman commented 4 years ago

"@zperova and @matthewspeir will organize and make actionable items from the session on November 6 2019." - Was there any progress on this?

matthewspeir commented 4 years ago

@morrisonnorman Yes! Zina, Liz Kiernan, and I met yesterday and sifted through the notes from the session. We will be making tickets for improvements in the data-portal repo. I believe @zperova has a document of recommendations for the metadata section of the portal.

diekhans commented 4 years ago

@zperova would you be so kind as to add a link to your document in this ticket?

zperova commented 4 years ago

@diekhans this document summarises what we have discussed prior to the DCP F2F about the metadata section of the Portal and provides what we thought would be the structure of the section: https://docs.google.com/document/d/1SSscSY8OVPNOtQwcr1eHvpYnmwGV5kfzt4EMM7ZvRsc/edit

I have yet to link the specific tasks that resulted from the Content Review Session.

mshadbolt commented 4 years ago

There is now an epic in the Portal Repo where Dave is collecting issues for the metadata revamp https://app.zenhub.com/workspaces/orange-5d680d7e3eeb5f1bbdf5668f/issues/humancellatlas/data-portal/611

ESapenaVentura commented 3 years ago

Closing this ticket, as it feels done.