ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Backfilling data_use_restriction for open access datasets #1268

Closed idazucchi closed 1 week ago

idazucchi commented 4 months ago

Context

As part of the work to handle managed access datasets in the Data Portal, we’ve added a new required field to the project metadata, data_use_restriction, to describe the allowed uses of the dataset. All new datasets will use project v19+ and will have to include data_use_restriction, but existing projects lack this information.

Backfilling data_use_restriction for existing projects is important for the browser, which will display this information, and to ensure that we are not inadvertently treating MA datasets as open access because data_use_restriction was omitted.

We should include all projects in ingest for backfilling, regardless of status, because a stalled or eligible project might be worked on later. If we don’t backfill data_use_restriction now, we might forget and ultimately publish the project without this information.
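As a rough illustration of the backfill described above, here is a minimal Python sketch. It is not the actual script or the Ingest API: the document layout (`uuid`, `status`, `content`), the function name `backfill_data_use_restriction`, and the `managed_access_ids` parameter are all hypothetical. It also assumes that `"NRES"` is the value used for open access datasets, which should be checked against the project schema before use.

```python
from copy import deepcopy

# Assumption: "NRES" (no restrictions) is the value for open access datasets.
# Verify against the project schema before relying on it.
OPEN_ACCESS_VALUE = "NRES"


def backfill_data_use_restriction(projects, managed_access_ids=frozenset()):
    """Return (patched_projects, needs_review_uuids).

    Projects are processed regardless of status, so stalled or eligible
    projects are not skipped. Projects known to be managed access are
    never given the open access value; they are flagged for manual review.
    """
    patched_projects = []
    needs_review = []
    for project in projects:
        content = project.get("content", {})
        if "data_use_restriction" in content:
            continue  # already has the field, nothing to backfill
        if project["uuid"] in managed_access_ids:
            needs_review.append(project["uuid"])
            continue
        patched = deepcopy(project)  # leave the input documents untouched
        patched.setdefault("content", {})["data_use_restriction"] = OPEN_ACCESS_VALUE
        patched_projects.append(patched)
    return patched_projects, needs_review
```

Keeping managed access projects out of the automatic path is deliberate: the risk named above is treating an MA dataset as open access, so those are only listed for a wrangler to fill in by hand.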

Tasks

Acceptance criteria

idazucchi commented 3 months ago

I've prepared a script to export just the project metadata. I've tested it on some projects that needed a title update, and we are waiting for feedback after the import goes through. I need to save the script somewhere, but I'm not sure where.

arschat commented 3 months ago

Stalled until managed access pilot project is completed & wave 1 & 2 datasets are wrangled.

idazucchi commented 1 week ago

This ticket is replaced by https://github.com/ebi-ait/hca-ebi-wrangler-central/issues/1301