zperova closed this issue 4 years ago
@javfg @ami-day Can you update this ticket to reflect progress and add remaining steps?
@lauraclarke Yes, I have just ticked the items above to reflect the conversion of the 5 datasets. However, since Javier worked on the IDF files and I worked on the SDRF files, we are now integrating our work across the 5 datasets. Once that is done we can review again how much of the process can be automated accurately.
Today we each plan to get the validator running on an example dataset as a test, and on Monday we will meet to run it on all 5 datasets together, using what we learned from the test runs to diagnose any errors and discuss how to resolve them.
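A minimal sketch of how the Monday batch run could be wrapped, assuming the converted IDF/SDRF pairs live in a `converted/` directory; `validate_magetab.py` and the file layout are placeholders for illustration, not the actual SCEA validation script:

```python
# Hedged sketch: run an (assumed) validator over each converted dataset.
import subprocess
from pathlib import Path

converted = Path("converted")
for idf in sorted(converted.glob("*.idf.txt")):
    accession = idf.name[: -len(".idf.txt")]
    sdrf = converted / f"{accession}.sdrf.txt"
    # 'validate_magetab.py' is a hypothetical stand-in for the real validator.
    result = subprocess.run(
        ["python", "validate_magetab.py", "--idf", str(idf), "--sdrf", str(sdrf)],
        capture_output=True,
        text=True,
    )
    print(f"{accession}: {'OK' if result.returncode == 0 else 'FAILED'}")
    if result.returncode != 0:
        print(result.stderr)  # keep the errors for the Monday review
```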
Also, as a side note: the converter wasn't actually used, and having looked at some of its output files, I don't think it makes sense to use it. It will be much more convenient for us to further develop Javier's script based on our experience, but we can discuss this once we are done.
We aren't obliged to use any particular tools; just make sure the tools we use are suitable for delivering what is needed.
@ami-day feel free to modify the body of the issue when needed. As we move on with the task, some initial solutions might not work. It is good to track the process in the comments and document any changes from the initial plan for reference.
@ami-day could you please also link the tickets for the datasets you have worked on to this ticket? A mention in a comment is sufficient; this makes it easier for someone who has not been following along to reconstruct the relationships and dependencies between tasks. Thanks!
@zperova yes, will do tomorrow morning
4/5 datasets have passed validation; hopefully the last one will pass tomorrow morning.
@zperova I hadn't actually created tickets for converting each of the datasets for the r_release; that would have been useful, and I will do so in future and link them here.
All datasets now pass validation, but they are missing the Bundle UUID and Version columns. We are working on filling those in now.
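For illustration, a hedged pandas sketch of filling those two columns into an SDRF. The accession, file names, column headers, and the bundle-manifest join key are all assumptions; the real values come from the bundle manifests for each dataset:

```python
import pandas as pd

# Load the SDRF (tab-separated). Caveat: real SDRFs can repeat column
# headers, which pandas renames on read; a plain csv-module approach may
# be safer for round-tripping.
sdrf = pd.read_csv("E-HCAD-1.sdrf.txt", sep="\t", dtype=str)

# Hypothetical export of bundle UUIDs/versions keyed by file name.
manifest = pd.read_csv("bundle_manifest.tsv", sep="\t", dtype=str)
bundles = manifest.set_index("file_name")

# Map each data file to its bundle UUID and version (assumed column names).
sdrf["Comment[HCA Bundle UUID]"] = sdrf["Derived Array Data File"].map(bundles["bundle_uuid"])
sdrf["Comment[HCA Bundle Version]"] = sdrf["Derived Array Data File"].map(bundles["bundle_version"])

sdrf.to_csv("E-HCAD-1.sdrf.txt", sep="\t", index=False)
```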
The helper script needs more work, but once finished it should be able to automate most of the steps (rough outline below).
Defining the steps that need to be automated will help with this task: https://github.com/HumanCellAtlas/metadata-schema/issues/1242
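A rough sketch of the end-to-end flow the helper script could automate once the steps in that issue are pinned down. All function bodies here are placeholders for illustration, not the actual implementation:

```python
def generate_idf(spreadsheet: str, accession: str) -> str:
    """Build the IDF from the HCA spreadsheet (placeholder)."""
    return f"{accession}.idf.txt"

def generate_sdrf(spreadsheet: str, accession: str) -> str:
    """Build the SDRF from the HCA spreadsheet (placeholder)."""
    return f"{accession}.sdrf.txt"

def add_bundle_columns(sdrf: str) -> None:
    """Fill the Bundle UUID/Version columns (placeholder)."""

def validate(idf: str, sdrf: str) -> bool:
    """Run the MAGE-TAB validation checks (placeholder)."""
    return True

def convert_dataset(spreadsheet: str, accession: str) -> None:
    """Chain the conversion steps for one dataset."""
    idf = generate_idf(spreadsheet, accession)
    sdrf = generate_sdrf(spreadsheet, accession)
    add_bundle_columns(sdrf)
    if not validate(idf, sdrf):
        raise RuntimeError(f"{accession} failed validation")
```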
@ami-day but you have created the tickets to review the converted datasets, right? It would be useful to link those here. Thanks!
@zperova I created the tickets to review the GEO -> HCA metadata conversion, but I didn't create tickets for each of the HCA -> MAGE-TAB conversions (though I should have).
Description
As an HCA wrangler who is responsible for delivering correct data and metadata for the March release of HCA data, I would like to convert HCA spreadsheets to MAGE-TAB to facilitate the flow of HCA data to SCEA in time for the March release deadline.
Deadline: 6 Mar 2020 - to be confirmed with Pablo
Acceptance Criteria
[x] 2 datasets have been manually curated to MAGE-TAB
[x] 3 datasets have been run through the converter and manually curated to MAGE-TAB
[x] determine which steps of manual curation can be automated @javfg
[x] all datasets have been run through the validation scripts and uploaded to the GitHub repo for SCEA processing