Closed ESapenaVentura closed 5 months ago
The only available data are cell-by-gene count matrices.
For managed access data from healthy donors, h5ad files are available [here](https://www.covid19cellatlas.org/index.healthy.html) (Vieira Braga):
Nasal + bronchial are consistent with what is reported in the paper and in the supplementary materials, provided that each specimen was derived from a different donor.
I curated the spreadsheet for the GEO set, will add the others
Source | Publication name | Source name | Tissue | #Donors | library prep |
---|---|---|---|---|---|
GEO | Lung resection | Lung resection | Lung lobe | 4 | drop-seq |
covid19cellatlas | Lung transplant | Parenchyma | Parenchyma | 6 | 10x |
covid19cellatlas | Bronchoscopy biopsy | Nasal | Upper airways | 2 | 10x |
covid19cellatlas | Bronchoscopy biopsy | Bronchi | Bronchioli | 6 | 10x |
covid19cellatlas | Bronchoscopy biopsy | Bronchi | Lung brush | 3 | 10x |
The IDs of the Bronchoscopy donors from the supplementary materials don't match those in the available data. They have been matched up using the cell count information available in the supplementary material, a brilliant idea from @Wkt8. Otherwise I might have tried to ask the authors, but they might have been unable to give me that information due to privacy concerns.
Missing donor metadata: donor ARMS052 has no metadata available, despite being included in the dataset and in the cell count information from the supplementary materials. I will contact the authors to determine whether the metadata is unavailable intentionally or whether the ID is incorrect.
Now that the picture of the donors and their specimens is clear, I can update the metadata spreadsheet with the new information.
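A minimal sketch of the cell-count matching described above, assuming each donor's total cell count is unique within both sources. The donor IDs and counts are hypothetical placeholders, not values from the paper or the data files, and `match_ids_by_cell_count` is an illustrative helper rather than the code that was actually used.

```python
def match_ids_by_cell_count(supplementary, dataset):
    """Map supplementary donor IDs to dataset IDs that share the same cell count.

    Both arguments are dicts of {donor_id: cell_count}. Counts that are
    missing or shared by several donors are reported as unmatched, not guessed.
    """
    # Invert the dataset mapping: cell_count -> list of dataset IDs
    by_count = {}
    for donor_id, count in dataset.items():
        by_count.setdefault(count, []).append(donor_id)

    matched, unmatched = {}, []
    for supp_id, count in supplementary.items():
        candidates = by_count.get(count, [])
        # The count must also be unique on the supplementary side
        same_count = [s for s, c in supplementary.items() if c == count]
        if len(candidates) == 1 and len(same_count) == 1:
            matched[supp_id] = candidates[0]
        else:
            unmatched.append(supp_id)
    return matched, unmatched


# Hypothetical IDs and counts, for illustration only
supplementary = {"donor_A": 1520, "donor_B": 2310, "donor_C": 980}
dataset = {"sample_1": 2310, "sample_2": 1520, "sample_3": 975}

matched, unmatched = match_ids_by_cell_count(supplementary, dataset)
print(matched)
print(unmatched)
```

Anything left in `unmatched` would still need a manual check or a question to the authors.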
A few details are missing from the paper:
I still have to fill in the analysis tab.
For the nasal and lung brush specimens no dissociation protocol is explicitly stated; I'm assuming that the bronchoscopy dissociation protocol covers these specimens as well.
Waiting for authors' reply
I fixed a number of errors in the metadata that showed up in ingest and reorganised the analysis protocol and files tabs.
To do:
The authors confirmed that the data published in the covid19cellatlas is exclusively 10x with corrections already applied. I updated the Data available table to summarise the information.
The authors have metadata for ARMS052; however, it is not published anywhere at this moment. I emailed them explaining that we cannot use the metadata if it is not publicly available. If they are able to update the publication's supplementary material then I will update the HCA project; otherwise I've filled out the minimum required fields.
@wei assigned as sec reviewer.
Looks good! Well done - I especially like how neat and informative the dissociation protocols are.
Just a couple of things:
Specimen_from_Organism: the specimen_from_organisms for the transplants have a typo, 'translpant' instead of 'transplant'. Similarly, the linking from CS to spec_from_organism will also need to be changed!
Cell_suspension: this isn't necessary, but if you wanted the estimated_cell_count to show on the project you'd need to put in the cell counts at the cell suspension level, which are technically available by summing the different cell types from the supplementary table S1.
Analysis_File: the file source in Analysis_File needs to be added. 'GEO' for the GEO ones. I would use 'Publication' for the covid19cellatlas ones.
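A rough sketch of the summing step suggested for estimated_cell_count, assuming table S1 can be read as rows of (cell suspension, cell type, number of cells). That layout, the suspension IDs, and the counts below are assumptions for illustration, not the real contents of S1.

```python
from collections import defaultdict


def estimated_cell_counts(rows):
    """Sum per-cell-type counts into one total per cell suspension."""
    totals = defaultdict(int)
    for suspension_id, _cell_type, count in rows:
        totals[suspension_id] += count
    return dict(totals)


# Hypothetical rows: (cell suspension ID, cell type, number of cells)
rows = [
    ("suspension_1", "ciliated", 400),
    ("suspension_1", "basal", 250),
    ("suspension_2", "ciliated", 120),
    ("suspension_2", "club", 80),
    ("suspension_2", "basal", 300),
]

totals = estimated_cell_counts(rows)
for suspension_id, total in totals.items():
    print(suspension_id, total)
```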
There's also the lingering question (depending on what the contributor says) about if you are going to keep or delete ARMS052!
Thanks for reviewing! I applied the changes you suggested and I am exporting the project.
Donor ARMS052: the authors told me that they are working on a new manuscript that will include part of the data I wrangled for this project and new data (some of it coming from donors from this wrangled publication). I'm opting to keep donor ARMS052 in the project with the minimal metadata available right now, since from what I understood we are not too concerned with donors being duplicated in the HCA. I hope I can update it with additional metadata when it becomes available through the new publication the authors spoke of.
Donor ARMS052 (748aab09-0dc1-4dd1-bda5-dbc29c86cafb) contains age units but no age information.
From what I understand reading the thread above, this information is not available.
AI:
@idazucchi to update the donor.
@ESapenaVentura created the ticket to update the graph validation to check unit and donor age.
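As an illustration of the kind of check that ticket asks for, here is a plain-Python sketch that flags a donor with an age unit but no age (or the reverse). It is not the graph validator's actual implementation; the flat dict layout and the field names `organism_age` and `organism_age_unit` are assumptions made for the example.

```python
def check_age_and_unit(donor):
    """Return a list of problems for one donor record (a plain dict)."""
    age = donor.get("organism_age")
    unit = donor.get("organism_age_unit")
    problems = []
    if unit and not age:
        problems.append("age unit is set but age is missing")
    if age and not unit:
        problems.append("age is set but age unit is missing")
    return problems


# Hypothetical donor records, for illustration only
donors = {
    "donor_with_unit_only": {"organism_age_unit": "year"},
    "donor_complete": {"organism_age": "55", "organism_age_unit": "year"},
    "donor_without_age": {},
}

report = {name: check_age_and_unit(record) for name, record in donors.items()}
for name, problems in report.items():
    print(name, problems)
```

A donor with neither field passes, which matches the case where the age is simply not available.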
The dataset was exported as part of release 13, but during indexing there were some errors and we found out that there was a metadata error with donor ARMS052 (see Enrique's comment above). I fixed the mistake, re-exported for release 14, and marked it for release 14 in ingest and zenhub.
Enrique spotted that some contributor had n/a as institution. I've fixed the project metadata and re-exported it (metadata only).
verified in the data browser!
verified in the data browser!
vieira19_Alveoli_and_parenchyma_anonymised.processed.h5ad is corrupted, needs to be swapped out
I've swapped the corrupted file.
Updating file metadata:
- I tried deleting the file metadata (checksum, cloudUrl, size) via the API: no error message, but the metadata was still there.
- I tried syncing the file from an hca-util area: this effectively swapped the file and updated the file metadata.

Export:
- First export: the new file was exported, but the file descriptor still had the old file's metadata and the export date was still 2023.
- I deleted the relevant file descriptor and re-exported.
gsutil ls -l gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/
578 2023-07-27T06:01:18Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/aaa051e9-6f3a-4461-a8cc-adb3d84e13f2_2022-01-11T15:17:55.983000Z.json
560 2023-07-27T06:01:22Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/abcff8d9-8624-4dbc-a963-58edc994f336_2022-01-11T15:17:55.919000Z.json
576 2023-07-27T06:01:22Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/c8669817-1d49-4f1c-850e-f921ac5d6db0_2022-01-11T15:17:56.010000Z.json
593 2024-03-20T15:26:49Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/ee5b3cdf-26b1-4b9b-8a2b-100e0a33ef08_2022-01-11T15:17:55.965000Z.json
564 2023-07-27T06:01:15Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/f2bbb21c-9df9-4607-92e0-98ae5caa9927_2022-01-11T15:17:55.885000Z.json
554 2023-07-27T06:01:17Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/ff7d5b00-aec6-47c8-b2c7-e2fa974ca46f_2022-01-11T15:17:55.947000Z.json
The descriptor content is updated and the export date is correct, but the filename still has the date of the first export, 2022-01-11.
I think this is a bug
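To make the mismatch easier to spot, the listing above can be parsed so that each object's update time sits next to the timestamp embedded in its filename. This is only a sketch: it assumes every line has the three whitespace-separated fields shown above (size, update time, URL) and that descriptor filenames follow the `<uuid>_<timestamp>.json` pattern seen in the listing. `parse_listing_line` and `updated_after` are illustrative helpers, and the cutoff date is an arbitrary choice.

```python
PREFIX = (
    "gs://broad-dsp-monster-hca-prod-ebi-storage/prod/"
    "c0518445-3b3b-49c6-b8fc-c41daa4eacba/descriptors/analysis_file/"
)

# Two lines copied from the listing above
listing = [
    "578 2023-07-27T06:01:18Z " + PREFIX
    + "aaa051e9-6f3a-4461-a8cc-adb3d84e13f2_2022-01-11T15:17:55.983000Z.json",
    "593 2024-03-20T15:26:49Z " + PREFIX
    + "ee5b3cdf-26b1-4b9b-8a2b-100e0a33ef08_2022-01-11T15:17:55.965000Z.json",
]


def parse_listing_line(line):
    """Split one listing line into size, update time, uuid and filename timestamp."""
    size, updated, url = line.split()
    filename = url.rsplit("/", 1)[-1]
    uuid, version = filename[: -len(".json")].split("_", 1)
    return {"size": int(size), "updated": updated, "uuid": uuid, "version": version}


def updated_after(lines, cutoff_date):
    """Return descriptors last written on or after cutoff_date (YYYY-MM-DD)."""
    records = [parse_listing_line(line) for line in lines]
    # ISO dates compare correctly as strings
    return [r for r in records if r["updated"][:10] >= cutoff_date]


rewritten = updated_after(listing, "2024-01-01")
for r in rewritten:
    print(r["uuid"], "updated", r["updated"][:10], "filename date", r["version"][:10])
```

For the re-exported descriptor this prints an update date in 2024 next to a filename date of 2022-01-11, which is the discrepancy described above.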
Filled import form
the file was not updated correctly - I'm investigating
File has been fixed following this SOP. This change has been verified in the browser.
Note that in the browser the matrices tab does not show any file, although we have analysis files. However, we can access the files either through the download tab or by filtering for the specific project in explore.
Project short name:
lungCellularCensus
Primary Wrangler:
Ida
Secondary Wrangler:
Associated files
Published study links
Paper: https://doi.org/10.1038/s41591-019-0468-5
Accessioned data:
Ingest https://contribute.data.humancellatlas.org/projects/detail?uuid=c0518445-3b3b-49c6-b8fc-c41daa4eacba
Key Events