ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

DCP to Tier 1: kidney experiment #1256

Open arschat opened 6 months ago

arschat commented 6 months ago

The Kidney bionetwork asked us to provide all the Tier 1 fields that we already have, to help contributors fill in the Tier 1 metadata.

We decided to demo that on Krishna et al (#996) to see how long it would take and how difficult it would be.

As a first step, we would populate this Tier 1 spreadsheet with our metadata and hand it to Peng, who would then pack it into an h5ad.

arschat commented 6 months ago

The repo with the two notebooks and an example is here: arschat/dcp_to_tier1

arschat commented 6 months ago

Tier 1 metadata spreadsheets have been produced here and shared with Peng. Waiting for feedback.

idazucchi commented 5 months ago

The Kidney network decided to send empty templates, but Peng is still interested in them. This is on hold until we have clarity.

idazucchi commented 5 months ago

Arsenios sent the spreadsheets for all the core datasets. This task is done, although Peng did not provide feedback.

arschat commented 5 months ago

The flat kidney core files are here. Peng's feedback was the following:

Thank you a lot for filling the tables.

They look incredibly good! The filled information sometimes looks too good to be true.

Is it because the authors previously had already submitted lots of information to your team?

And for places that could be potentially improved, I think library_id_repository can be inferred since library_sequencing_run may already contain the SRX IDs (at least for Liao’s data). It may or may not work for all datasets.

In addition intron_inclusion could also be inferred based on alignment_software though some users didn’t use the default settings. But a safer way is to let the authors review prefilled contents.

Otherwise, I think these are already good enough to be sent to the authors to fill.

Matthias Kretzler (Kidney Bio Network Coordinator) is going to send out emails to the authors of the core datasets asking for metadata. I’ll then send follow-up emails including these tables for them to fill in case they don’t like making .h5ad anndata files.
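Peng's two inference suggestions could be sketched roughly as below. This is a minimal illustration, not the code in the notebooks. The helper names and the `fields_to_review` column are hypothetical. The version rule for `intron_inclusion` assumes Cell Ranger defaults (intronic reads counted by default from version 7.0 onwards), which does not hold for authors who changed the settings. The "yes"/"no" values are also an assumption about what the Tier 1 template expects. Inferred fields are flagged so that authors can review them, as Peng recommends.

```python
import re

# SRA experiment accession: "SRX" followed by digits, not preceded by another alphanumeric.
SRX_PATTERN = re.compile(r"(?<![A-Za-z0-9])SRX\d+")
# Hypothetical pattern for values such as "cellranger_3.0.2" or "Cell Ranger v7.1.0".
CELLRANGER_PATTERN = re.compile(r"cell\s*ranger[\s_\-v]*(\d+)", re.IGNORECASE)


def infer_library_id_repository(library_sequencing_run):
    """Return the first SRX accession found in the value, or None."""
    match = SRX_PATTERN.search(library_sequencing_run or "")
    return match.group(0) if match else None


def infer_intron_inclusion(alignment_software):
    """Guess intron inclusion from the Cell Ranger major version.

    Assumption: default settings were used. Returns None for other software.
    """
    match = CELLRANGER_PATTERN.search(alignment_software or "")
    if match is None:
        return None
    return "yes" if int(match.group(1)) >= 7 else "no"


def prefill(row):
    """Fill empty fields with inferred values and record which ones were inferred."""
    filled = dict(row)
    candidates = {
        "library_id_repository": infer_library_id_repository(
            row.get("library_sequencing_run")
        ),
        "intron_inclusion": infer_intron_inclusion(row.get("alignment_software")),
    }
    inferred = []
    for field, value in candidates.items():
        # Never overwrite a value that is already present in the spreadsheet.
        if not filled.get(field) and value is not None:
            filled[field] = value
            inferred.append(field)
    # Hypothetical helper column listing the fields the authors should double-check.
    filled["fields_to_review"] = ";".join(inferred)
    return filled
```

Values that already exist in a row are left untouched, so the sketch only proposes candidates for empty cells.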

This task is complete and the ticket can be closed.

arschat commented 1 week ago

Lucia re-opened the task. I used the updated notebook for this and produced Tier 1 metadata for 11 of the 16 datasets in Lucia's tracking sheet (copy). Of the remaining 5, 2 are lattice, 2 are not wrangled and 1 is unpublished.

The files that were sent to Lucia (concatenated and separate) are in this folder.
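For reference, concatenating the per-dataset tables into a single file could look roughly like the sketch below. It is an illustration only and may differ from what the notebook does. The function names, the `source_dataset` column and the example field names are hypothetical. It assumes each table is a list of row dictionaries, and that columns missing from a dataset should be left empty.

```python
import csv
import io


def concatenate_tables(tables):
    """Combine per-dataset tables into one.

    tables: dict mapping a dataset label to a list of row dicts.
    Returns (columns, rows), with columns in first-seen order.
    """
    columns = ["source_dataset"]  # hypothetical column recording each row's origin
    for rows in tables.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    combined = []
    for label, rows in tables.items():
        for row in rows:
            merged = {name: "" for name in columns}  # empty cell where a field is absent
            merged.update(row)
            merged["source_dataset"] = label
            combined.append(merged)
    return columns, combined


def to_csv_text(columns, rows):
    """Serialise the combined table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping a column that records the source dataset makes it possible to split the concatenated file back into separate tables later.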