Open idazucchi opened 11 months ago
Possible additional projects:
- 9: stalled due to missing linking information
- 32: no data, no metadata, yes publication, no reply
- 33: no data, no metadata, yes publication, no reply
- 37: stalled due to missing metadata information
- 39: no data, no metadata, yes publication, no reply
- 42: no data, no metadata, yes publication, no reply
Priority: only after lung is done and after Peng's ArrayExpress submission

Action: communicate with Peng to confirm the list and a prototype of the metadata file @arschat
Arsenios to prepare a list of projects that can easily be flattened and communicate it to Peng
Based on the list of analysis files each project has, here is the availability for extracting the cell barcodes of each of the kidney datasets.
availability | number of projects |
---|---|
yes | 22 |
no | 1 |
no metadata available | 8 |
lattice | 2 |
For each project, I have downloaded the analysis file that contains the barcode information and sample ID, and verified that the provided sample_ID and the cell_suspension are either identical or can be mapped to each other.
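The check described above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the function name, the manual `mapping` dictionary and the example IDs are all hypothetical.

```python
def check_sample_mapping(sample_ids, cell_suspension_ids, mapping=None):
    """Return the sample IDs that cannot be resolved to a known cell suspension.

    sample_ids: sample identifiers found in the analysis file.
    cell_suspension_ids: cell_suspension IDs from the DCP metadata.
    mapping: optional dict for IDs that differ but can be mapped by hand.
    """
    mapping = mapping or {}
    known = set(cell_suspension_ids)
    unresolved = []
    for sample_id in sorted(set(sample_ids)):
        # Identical IDs pass directly; otherwise fall back to the manual mapping.
        resolved = sample_id if sample_id in known else mapping.get(sample_id)
        if resolved not in known:
            unresolved.append(sample_id)
    return unresolved


# Hypothetical IDs, for illustration only.
samples = ["CS_1", "donor2_kidney", "CS_3"]
suspensions = ["CS_1", "CS_2", "CS_3"]
print(check_sample_mapping(samples, suspensions, {"donor2_kidney": "CS_2"}))
print(check_sample_mapping(samples, suspensions))
```

An empty result means every sample ID is either identical to a cell suspension ID or covered by the manual mapping.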
Specifically, across all kidney projects:

Number | Project | In Core Kidney List | Ingest Status | Short Name | UUID | Barcode extraction |
---|---|---|---|---|---|---|
37 | Kuppe et al. | Included | key metadata missing | Kramann-Human-Smartseq2 | 128952c1-1906-4746-b4dd-6a10d1ff52d0 | no metadata available |
1 | Lake et al. | Included | Published in DCP | NichesHumanKidney | | lattice |
19 | Krebs et al. | Core data | Published in DCP | PathogenInducedResidentMemory | dc0b65b0-7713-46f0-a339-0b03ea786046 | yes |
9 | He et al. | Included | key metadata missing | Patrakka-Human-Smartseq2 | 662157e4-ba53-4766-975a-ac11920f153e | no metadata available |
14 | Arazi et al. | Included | Published in DCP | Hacohen-Human-CELseq2 | 2d559a6e-7cd9-432f-9f6e-0e4df03b0888 | yes |
16 | Der et al. | Included | In progress | TubularCellLupusNephritis | 97fca723-d9e9-4263-9f67-335416086f47 | no metadata available |
36 | Der et al. | Included | Published in DCP | Der-Human-LupusNephritis-Nextera-C1 | 4627f43e-a43f-44dd-8c4b-7efddb3f296d | yes |
43 | Yu et al. | Included | Published in DCP | Xiao-Human-RNAscope | 5f44a860-d96e-4a99-b67e-24e1b8ccfd26 | yes |
38 | *Menon et al. | Core data | Published in DCP | Menon-Human-FSG-10x3 | 29b54165-34ee-4da5-b257-b4c1f7343656 | yes |
47 | McEvoy et al. | Included | Published in DCP | KidneySexBasedTranscriptome | 77c13c40-a598-4036-807f-be09209ec2dd | yes |
33 | Cowman et al. | Included | No data/ no metadata | MacrophagePrognosticIndicator | 0aeaaab8-3e48-4877-a244-70d0dedc66cd | no metadata available |
17 | Zheng et al. | Included | Published in DCP | IgANephropathySTRT | 2caedc30-c816-4b99-a237-b9f3b458c8e5 | yes |
32 | Chen et al. | Included | No data/ metadata | SurveyHumanGlomerulonephritis | 0057c36c-06ce-4cdf-bff4-533ad13f090c | no metadata available |
39 | Meng et al. | Included | No data/ metadata | Ma-Human-10xtechnology | 1bef1065-6e7d-4235-8a8d-535717d8d1e1 | no metadata available |
42 | Huang et al. | Included | No data/ metadata | ? | eeff6c81-f29f-4e54-b33f-3c825b605d42 | no metadata available |
44 | Zhao et al. | Included | No data/ metadata | Wu-Human-10x3pv2 | fa9f9bf1-62d6-4db1-9d36-8cef8806d6bf | no metadata available |
45 | Abedini et al. | Included | Published in DCP | KidneyFibroticMicroenvironment | e925633f-abd9-486a-81c6-1a6a66891d23 | yes |
23 | Young et al. | Included | Published in DCP | Haniffa-Human-10x3pv2 | d8ae869c-39c2-4cdd-b3fc-2d0d8f60e7b8 | yes |
41 | Suryawanshi et al. | Core data | Published in DCP | SuryawanshiKidneyAllografts | 6e522b93-9b70-4f0c-9990-b9cff721251b | yes |
34 | *Malone et al. | Core data | Published in DCP | ChimerismKidneyTransplantReject | 4ef86852-aca0-4a91-8522-9968e0e54dbe | yes |
35 | Chu et al. | Included | Published in DCP | Cheng-Human-10x3pv3 | ee166275-f63a-4864-8155-4df86c9de679 | yes |
30 | Obradovich et al. | Included | Published in DCP | Califano-Human-10x3pv2 | 95d058bc-9cec-4c88-8d2c-05b4a45bf24f | yes |
31 | Krishna et al. | Core data | Published in DCP | ImmuneLandscapeccRCC | 12f32054-8f18-4dae-8959-bfce7e3108e7 | yes |
2 | Stewart et al. | Included | Published in DCP | KidneySingleCellAtlas | abe1a013-af7a-45ed-8c26-f3793c24a1f4 | yes |
3 | Liao et al. | Core data | Published in DCP | HumanAdultKidneyLiaoMo | 2ef3655a-973d-4d69-9b41-21fa4041eed7 | no |
6 | Wilson et al. | Core data | Published in DCP | Diabetic Nephropathy snRNA-seq | 577c946d-6de5-4b55-a854-cd3fde40bff2 | yes |
20 | Tabula Sapiens | Included | Published in DCP | tabulaSapiens | 10201832-7c73-4033-9b65-3ef13d81656a | yes |
22 | Muto et al. | Core data | Published in DCP | | | lattice |
24 | Wu et al. | Core data | Published in DCP | GSE118184KidneyOrganoid | 16ed4ad8-7319-46b2-8859-6fe1c1d73a82 | yes |
40 | Borcherding et al. | Included | Published in DCP | ImmuneRenalCarcinoma | 955dfc2c-a8c6-4d04-aa4d-907610545d11 | yes |
18 | Tang et al. | Included | Published in DCP | Tang-Human-FluidigmC1basedlibrarypreparation | c5b475f2-76b3-4a8e-8465-f3b69828fec3 | yes |
21 | Han et al. | Included | Published in DCP | HumanCellLandscape | 1fac187b-1c3f-41c4-b6b6-6a9a8c0489d1 | yes |
25 | Zhang et al. | Core data | Published in DCP | RenalTumorMicroenvironment | 7c599029-7a3c-4b5c-8e79-e72c9a9a65fe | yes |
Next action: draft an email showing Peng the list of projects available for extraction. Ask Peng whether a merged csv file, with one cell per row, the project name in one column and all desired metadata in the other columns, works for them. Also ask how they would like the cell_names to be built in cases where we only have the barcode (one analysis file per CS).
This would be especially important if they have generated all count matrices from the fastq files and did not extract any information from the contributor matrices.
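To make the proposed layout concrete, here is a minimal sketch of such a merged csv using only the Python standard library. The `cell_name` scheme (`<CS id>_<barcode>`), the column names and the example values are assumptions for illustration; the actual naming is exactly what still has to be agreed with Peng.

```python
import csv
import io


def flatten_cells(project, barcodes_by_cs, cs_metadata):
    """Build one row per cell: project, cell name, barcode, then CS-level metadata."""
    rows = []
    for cs_id, barcodes in barcodes_by_cs.items():
        for barcode in barcodes:
            row = {
                "project": project,
                # Placeholder naming scheme; the real one is still to be agreed.
                "cell_name": f"{cs_id}_{barcode}",
                "barcode": barcode,
                "cell_suspension": cs_id,
            }
            # Repeat the CS-level metadata on every cell of that CS.
            row.update(cs_metadata.get(cs_id, {}))
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialise rows to CSV text, using the union of all keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical values, for illustration only.
rows = flatten_cells(
    "ExampleProject",
    {"CS_1": ["AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC"], "CS_2": ["AAACGGGTCTTTACGT"]},
    {"CS_1": {"organ": "kidney"}, "CS_2": {"organ": "kidney"}},
)
print(to_csv(rows))
```

Prefixing the barcode with the CS identifier keeps cell names unique when the same barcode occurs in more than one cell suspension.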
At the meeting on 21 Dec 23, Peng asked whether we could provide h5ads with the raw counts and all DCP metadata in the obs. The flat csv file works for them too, but they prefer a ready-to-integrate h5ad with the metadata in the obs. Peng also updated us on the stage of the integration efforts: the rich metadata will only be needed in later stages, so we can treat this as low priority.
Action items that were decided were:
After investigating how many datasets have merged anndata/seurat analysis files, the following stats came up (spreadsheet).
Analysis files | Count |
---|---|
Unmerged | 9 |
Semi-merged | 5 |
Merged | 4 |
No Analysis Files | 5 |
Unmerged -> 1 CS per File
Semi-Merged -> multiple CS per File but not all CS per File
Merged -> all CS in 1 File
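These three categories, plus the "No Analysis Files" case, can be expressed as a small classifier. This is only a sketch of the definitions above; the function name and input shape are hypothetical. A project with a single CS in a single file satisfies both the Unmerged and the Merged definition, and the sketch assumes it counts as Merged.

```python
def classify_project(files_to_cs, all_cs):
    """Classify a project by how its cell suspensions (CS) are spread over files.

    files_to_cs: dict mapping each analysis file name to the CS it contains.
    all_cs: every CS in the project.
    """
    if not files_to_cs:
        return "No Analysis Files"
    all_cs = set(all_cs)
    per_file = [set(cs) for cs in files_to_cs.values()]
    # Merged: a single file holds every CS of the project.
    if any(cs == all_cs for cs in per_file):
        return "Merged"
    # Unmerged: every file holds exactly one CS.
    if all(len(cs) == 1 for cs in per_file):
        return "Unmerged"
    # Anything in between: several CS per file, but no file holds all of them.
    return "Semi-merged"


# Hypothetical projects, for illustration only.
print(classify_project({"a.h5ad": {"CS_1"}, "b.h5ad": {"CS_2"}}, {"CS_1", "CS_2"}))
print(classify_project({"a.h5ad": {"CS_1", "CS_2"}, "b.h5ad": {"CS_3"}}, {"CS_1", "CS_2", "CS_3"}))
print(classify_project({"all.h5ad": {"CS_1", "CS_2"}}, {"CS_1", "CS_2"}))
```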
Some datasets did not have analysis files, although we could provide the metadata at the CS level for all datasets. This includes HumanAdultKidneyLiaoMo, which was previously tagged as unavailable: it has been wrangled with pooled analysis files, although GEO and the paper mention a direct mapping of each sample to a file.
All 4 merged datasets now have csv files that combine the contributor metadata, all DCP metadata and the cell barcode. I have uploaded all of them to the drive folder mentioned before (some files are very big, so Google Sheets might take a while to open them). Xiao-Human-RNAscope has only 1 CS in the entire project; therefore, we will not share it as an example with the kidney integration team.
Haniffa-Human-10x3pv2, ImmuneLandscapeccRCC, Xiao-Human-RNAscope, IgANephropathySTRT
The first 3 flat files were sent to Peng. He asked whether we can merge the metadata with the raw cell counts and merge multiple analysis files. We need to clarify whether Peng is interested in unmerged flat files.
Peng replied that they are interested in the flat csv per analysis file, and that they would like to add a bare barcode column too. Peng said that he is leaving by the end of March, so a deadline in the middle of March might be reasonable. Need to discuss with Gabs.
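One way to derive the bare barcode column is sketched below. It assumes that the barcode is the longest run of at least 8 uppercase A/C/G/T characters inside the cell name, which holds for typical 10x-style names but is not guaranteed for every contributor file. The function name and the example names are hypothetical.

```python
import re

# Assumption: a barcode is a run of at least 8 uppercase nucleotide letters.
BARCODE_RE = re.compile(r"[ACGT]{8,}")


def bare_barcode(cell_name):
    """Return the longest nucleotide run in cell_name, or None if there is none."""
    matches = BARCODE_RE.findall(cell_name)
    if not matches:
        return None
    return max(matches, key=len)


# Hypothetical cell names, for illustration only.
for name in ["CS_1_AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCAT", "cell_42"]:
    print(name, "->", bare_barcode(name))
```

Names for which no barcode is found come back as `None`, so they can be reviewed by hand rather than silently mislabelled.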
Flat metadata at the sample level for all datasets that have analysis files & spreadsheet in DCP has been deposited in this folder.
Next steps:
Ticket deprioritized in favor of #1256
What is done:
However, Peng later asked, in #1256, for Tier 1 metadata at the CS level instead of flat files at the cell barcode level. That task is complete. The difference between the two tasks is the merge of the flat_CS metadata with the barcodes; although we discussed internally offering that as an option, we currently have no request for it.
This ticket can now be closed.
Description of the task: We agreed to provide flattened metadata sheets for the included project list of the Kidney bionetwork - see here
The eligible projects ultimately are:
- 1 Lake et al. https://www.biorxiv.org/content/10.1101/2021.07.28.454201v1.full (Lattice)
- 14 Arazi et al. https://www.nature.com/articles/s41590-019-0398-x
- 16 Der et al. https://www.nature.com/articles/s41590-019-0386-1
- 36 Der et al. https://insight.jci.org/articles/view/93009/pdf
- 43 Yu et al. https://www.frontiersin.org/articles/10.3389/fmed.2022.869284/full#h7
- 47 McEvoy et al. https://www.nature.com/articles/s41467-022-35297-z
- 17 Zheng et al. https://www.sciencedirect.com/science/article/pii/S221112472031514X?via%3Dihub
- 45 Abedini et al. https://doi.org/10.1101/2022.10.24.513598
- 23 Young et al. https://www.science.org/doi/10.1126/science.aat1699
- 35 Chu et al. https://www.frontiersin.org/articles/10.3389/fonc.2021.719564/full#h3
- 30 Obradovich et al. https://www.cell.com/action/showPdf?pii=S0092-8674%2821%2900573-0
- 2 Stewart et al. https://www.science.org/doi/10.1126/science.aat5031
- 20 Tabula Sapiens https://www.science.org/doi/10.1126/science.abl4896
- 40 Borcherding et al. https://www.nature.com/articles/s42003-020-01625-6
- 18 Tang et al. https://www.frontiersin.org/articles/10.3389/fimmu.2021.645988/full
- 21 Han et al. https://www.nature.com/articles/s41586-020-2157-4

A few more projects might be included based on the conversation with Peng.

To add:
Acceptance criteria for the task: