Convert HCA spreadsheets into MAGE-TAB for March release

HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA

Apache License 2.0


Convert HCA spreadsheets into MAGE-TAB for March release #1227

Closed zperova closed 4 years ago

zperova commented 4 years ago

Description

As an HCA wrangler who is responsible for delivering correct data and metadata for the March release of HCA data, I would like to convert HCA spreadsheets into MAGE-TAB to facilitate the flow of HCA data to SCEA in time for the March release deadline.

Deadline: 6 Mar 2020 - to be confirmed with Pablo

Acceptance Criteria

[x] 2 datasets have been manually curated to MAGE-TAB
[x] 3 datasets have been run through the converter and manually curated to MAGE-TAB
[x] determine which steps of manual curation can be automated @javfg
[x] all datasets have been run through validation scripts and uploaded into github repo for SCEA processing

lauraclarke commented 4 years ago

@javfg @ami-day Can you update this ticket to reflect progress and add remaining steps?

ami-day commented 4 years ago

> @javfg @ami-day Can you update this ticket to reflect progress and add remaining steps?

@lauraclarke Yes, I just ticked the above to reflect the conversion of 5 datasets. However, as Javier worked on idf files and I worked on sdrf files, we are in the process of integrating our work for the 5 datasets. At this point we can again review the extent to which automation is possible and accurate.

Today we each plan to get the validator running on an example dataset as a test. We will then meet on Monday to run the validator on the 5 datasets together, drawing on that test experience, so we can determine the cause of any errors and discuss how to resolve them.

ami-day commented 4 years ago

Also just as a comment: the converter wasn't actually used, and having looked at some output files from the converter, it doesn't make sense to me to use it. It seems it will be much more convenient for us to further develop Javier's script based on our experience. But we could discuss once done.

lauraclarke commented 4 years ago

We aren't obliged to use any tools in particular. Just make sure we use suitable tools to deliver on needs

zperova commented 4 years ago

@ami-day feel free to modify the body of the issue when needed. As we move on with the task, some initial solutions might not work. It is good to track the process in the comments and document any changes from the initial plan for reference.

zperova commented 4 years ago

@ami-day could you please also link the tickets with the datasets that you have worked on to this ticket? A mention in the comment is sufficient - this is useful for someone who has not been following to recreate relationships and dependencies between tasks. Thanks!

ami-day commented 4 years ago

@zperova yes, will do tomorrow morning

ami-day commented 4 years ago

4/5 datasets have passed validation; 1 more to go tomorrow morning hopefully

ami-day commented 4 years ago

@zperova I hadn't actually created tickets for converting each of the datasets for the r_release. That would have been useful; I will do so in future and link them.

javfg commented 4 years ago

All datasets pass validation now, but are missing the Bundle UUID and Version columns. We are working on filling the information in those now.

The helper script needs more work, but once finished, it would be able to automate most of the steps.
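For reference, a minimal sketch of how missing columns like these could be detected automatically, assuming the SDRF is a tab-delimited file with a single header row. The function name `missing_columns` and the exact column labels are assumptions for illustration only; they are not taken from the helper script or the validator, and the real SDRF headers may be spelled differently.

```python
import csv
import io

# Hypothetical column labels based on the comment above; the actual
# SDRF header names may differ.
REQUIRED_COLUMNS = ["Bundle UUID", "Version"]


def missing_columns(sdrf_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Made-up two-line example: a header row and one sample row.
example = (
    "Source Name\tCharacteristics[organism]\tVersion\n"
    "sample_1\tHomo sapiens\t1\n"
)
print(missing_columns(example))  # ['Bundle UUID']
```

A check of this kind only confirms that the columns exist; it does not verify that the values in them are filled in or correct.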

lauraclarke commented 4 years ago

Defining the steps which need to be automated will be helpful for this task https://github.com/HumanCellAtlas/metadata-schema/issues/1242

zperova commented 4 years ago

@ami-day but you have created the tickets to review the converted datasets, right? It will be useful to link these here. Thanks!

ami-day commented 4 years ago

@zperova I created the tickets to review GEO -> HCA metadata conversion, I didn't create tickets for each of the HCA -> MAGE-TAB conversions (but should have)