Open idazucchi opened 2 years ago
I wrote to the authors (thread) to get an accession for the published data.
GSE138669 contains the 6 healthy samples from this publication, plus 4 additional healthy samples and 12 diffuse cutaneous systemic sclerosis samples --> all samples are part of a newer publication
It makes more sense to wrangle just the newer publication with the accessioned data since they come from the same lab --> I'm working on that
I will let Maria know about the additional 4 healthy skin samples, since they are most interested in healthy samples
Bam-to-fastq
The files are in BAM format, so I've converted them using the bamtofastq tool on the EC2 instance. Some things don't look right:
10x v1 files are split as follows:
10x v2 - three I1 files have no reads - I've run the conversion twice and got the same result, so I think that this is not due to the conversion going wrong
@E00440:237:HM3CLCCXY:7:2116:30858:65177 2:N:0:0
@E00440:237:HM3CLCCXY:7:1112:6258:17008 2:N:0:0
+
@E00440:237:HM3CLCCXY:7:1208:1763:16147 2:N:0:0
I'm not sure the issues with the fastq files can be solved in time for the release
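For anyone re-checking the conversion, here is a minimal sketch of how the empty I1 files could be found programmatically. It assumes gzipped FASTQ output with the usual 4 lines per record; the `*_I1_*.fastq.gz` glob pattern and both function names are illustrative assumptions, not something taken from the bamtofastq documentation.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming 4 lines per record."""
    # Pick the opener from the file extension (assumption: .gz means gzip).
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def find_empty_fastqs(directory, pattern="*_I1_*.fastq.gz"):
    """Return the sorted names of matching files that contain zero reads."""
    return sorted(
        p.name for p in Path(directory).glob(pattern) if count_fastq_reads(p) == 0
    )
```

Running `find_empty_fastqs` on the output directory after each conversion would show whether the same three files come back empty every time.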
@ami-day has some prior experience with this problem.
Analysis files
There is one h5 file per donor, but the contents of those 22 h5 files are identical, so I think it might be one integrated analysis file for all donors. There is not enough time to confirm this with the authors, and if I submit the files now it could prove difficult to delete the wrong files from the DCP.
I'll contact the authors again and hopefully get h5ad files with more metadata as well
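A quick way to back up the "identical contents" observation is to compare checksums of the downloaded files. This is only a sketch: the function names are made up for illustration, and it compares raw bytes, so matching digests show the files are identical, while differing digests would not by themselves prove that the matrices inside differ.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical_files(paths):
    """Group file names that share a digest; only groups of 2+ are returned."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(Path(path).name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

If all 22 files land in a single group, that would support the idea that the same file was uploaded once per sample.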
Graph valid!
I've omitted the 3 empty I1 files and the analysis files/protocol
This looks really good to me, nice work!
I can't see any issues, except that I am not sure why the Analysis Protocol and File tabs were not included, given that the files are available in GEO (e.g. project level: GSE138669_RAW.tar and sample level: GSM4115868_SC1raw_feature_bc_matrix.h5).
I'm exporting the dataset
This will need to be updated to add analysis protocol and files - I'll do it as soon as I can confirm that the h5 files are different for each donor. Both the size (737’280 cells and 33’538 genes) and the fact that all files have the same dimensions are suspicious. The total number of cells reported in the paper is 65'199 - even taking into account quality checks, it still looks like too many cells.
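One possible explanation worth ruling out, stated here as an assumption and not confirmed with the authors: the GEO file names contain raw_feature_bc_matrix, and an unfiltered 10x matrix lists every barcode whether or not it holds a cell. As far as I know, 737,280 is also the size of the 10x 737K barcode whitelist, so identical dimensions would not by themselves mean identical contents. A minimal sketch of a per-file check, assuming the matrix is stored column-wise (CSC) with one column per barcode and that its `indptr` array has already been loaded as a list of integers (the function name is illustrative):

```python
def count_nonempty_barcodes(indptr):
    """Count columns (barcodes) with at least one stored value in a CSC matrix.

    `indptr[i]:indptr[i + 1]` is the slice of stored values for column i,
    so a column is empty exactly when the two pointers are equal.
    """
    return sum(1 for start, end in zip(indptr, indptr[1:]) if end > start)


# Toy example: 4 barcodes, the first and third have no stored values.
example_indptr = [0, 0, 3, 3, 5]
print(count_nonempty_barcodes(example_indptr))
```

If the number of non-empty barcodes differs from donor to donor, the files are per-donor raw matrices and not one integrated object. Stored values could still be explicit zeros, so this count is an upper bound on barcodes with counts.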
Verified in the data browser!
There's a problem with spreadsheet generation for this dataset (it gets stuck indefinitely). @idazucchi will also create a bug ticket for Dev
I've added a new submission (spreadsheet here) with the analysis files. The linking to the existing entities was done using the uuids and checked manually because the spreadsheet cannot be generated.
I'm now exporting
There are 3 publications referencing the same data.
We started from 1, then authors pointed us to 2, and now bionetwork references 3 in their list (through the data portal tracker).
Publications 2 and 3 reference the same GEO accession GSE138669 and the same donor_ids in the figures. It's a different analysis of the same data.
For consistency among other components, the project title in the tracker is Myofibroblast transcriptome indicates SFRP2hi fibroblast progenitors in systemic sclerosis skin
3 has been added to the project, but the title has not been changed.
PS I've already added the data_use_restriction field & bumped the project version for this project.
@arschat did you also export the changes?
No, I've not exported the changes. I asked Dave about this but did not get a reply on that question specifically.
10.1038/s41467-021-24607-6 7f351a4c-d24c-4fcd-9040-f79071b097d0
Both publications point to the same GEO accession (GSE138669) and reference the same donor IDs. GEO also points back to the second publication (10.1002/art.41813), and that's why we decided to mention this publication in the project. The difference between the two studies is the analysis that was done along with non-sequencing experiments, but the authors did not share any integrated objects that might differ between the two analyses (only fastq & raw count matrices). We could add the first publication (10.1038/s41467-021-24607-6) as well, but let me know if you would like us to change the title too.
@arschat change title to be consistent with tracker. export project only
Myofibroblast transcriptome indicates SFRP2hi fibroblast progenitors in systemic sclerosis skin
"hca_bionetworks": [
  {"schema_version": "1.0.1",
   "name": "Skin",
   "atlas_project": false}
],
"data_use_restriction": "NRES"
Ready to export metadata only @idazucchi
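Before exporting, the three changes above (title, bionetwork entry, data use restriction) could be sanity-checked against the project JSON. This is a sketch under assumptions: it expects a plain dict with `project_core.project_title`, `hca_bionetworks` and `data_use_restriction` at the top level, which may not match how the ingest API wraps the document, and `check_project_update` is a hypothetical helper, not part of any existing tool.

```python
EXPECTED_TITLE = (
    "Myofibroblast transcriptome indicates SFRP2hi fibroblast "
    "progenitors in systemic sclerosis skin"
)


def check_project_update(project, expected_title=EXPECTED_TITLE):
    """Return a list of problems found in a project metadata dict."""
    problems = []
    # Assumed location of the title: project_core.project_title
    title = project.get("project_core", {}).get("project_title")
    if title != expected_title:
        problems.append(f"unexpected title: {title!r}")
    # At least one bionetwork entry should be named "Skin".
    networks = project.get("hca_bionetworks", [])
    if not any(entry.get("name") == "Skin" for entry in networks):
        problems.append("no Skin bionetwork entry")
    if project.get("data_use_restriction") != "NRES":
        problems.append("data_use_restriction is not NRES")
    return problems
```

An empty list means all three fields look as described in this thread.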
project metadata exported!
from the GCP bucket
4500 2023-07-27T05:56:35Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/7f351a4c-d24c-4fcd-9040-f79071b097d0/metadata/project/7f351a4c-d24c-4fcd-9040-f79071b097d0_2022-08-25T14:22:53.860000Z.json
4589 2024-09-27T16:04:11Z gs://broad-dsp-monster-hca-prod-ebi-storage/prod/7f351a4c-d24c-4fcd-9040-f79071b097d0/metadata/project/7f351a4c-d24c-4fcd-9040-f79071b097d0_2024-09-26T14:30:26.573000Z.json
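To confirm that the newest project file is the one just exported, the listing can be parsed and sorted by the version timestamp embedded in each file name. This sketch assumes each line has three whitespace-separated columns (size, upload time, URL) and that file names follow the `<uuid>_<version>.json` pattern seen above; `latest_project_version` is an illustrative name.

```python
from datetime import datetime


def latest_project_version(listing_lines):
    """Return (version, size_in_bytes, url) for the newest project file, or None."""
    versions = []
    for line in listing_lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].endswith(".json"):
            continue  # skip totals or anything that is not a file row
        size, _uploaded, url = parts
        filename = url.rsplit("/", 1)[-1]
        if "_" not in filename:
            continue  # does not follow the <uuid>_<version>.json pattern
        # Everything after the first underscore, minus ".json", is the version.
        version_text = filename[: -len(".json")].split("_", 1)[1]
        version = datetime.strptime(version_text, "%Y-%m-%dT%H:%M:%S.%fZ")
        versions.append((version, int(size), url))
    return max(versions) if versions else None
```

For the two lines above, the newest entry should be the 4589-byte file with version 2024-09-26T14:30:26.573000Z.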
import form filled out
Project short name:
SkinSystemicSclerosis
Primary Wrangler:
Ida
Secondary Wrangler:
Ami
Associated files
Published study links
Initial Paper: SFRP2/DPP4 and FMO1/LSP1 Define Major Fibroblast Populations in Human Skin
Paper with accessioned data: Expansion of Fcγ Receptor IIIa–Positive Macrophages, Ficolin 1–Positive Monocyte-Derived Dendritic Cells, and Plasmacytoid Dendritic Cells Associated With Severe Skin Disease in Systemic Sclerosis
Accessioned data: GSE138669
Ingest: project
Key Events