ami-day closed this issue 3 years ago
New ontology terms have now been added:
EFO_0010728 | curettage
EFO_0010727 | vacuum aspiration
Updated metadata sheet with new ontology terms: https://docs.google.com/spreadsheets/d/1iC_mH4zxDOvowWSVwmjb-IaWQNzWVGq5/edit#gid=982975591
Donor organism:
familial_relationship child and parent fields.
Collection protocol:
Specimen from organism:
Placenta_23 villi were processed on both Drop-seq and 10x platforms as the sample yielded enough cells, so it looks like the sample came from the same specimen but yielded 2 different cell suspensions.
Cell suspension:
ES-20201708: https://drive.google.com/drive/folders/1ovvufKQicYRbDQPS4Gz1RosZ993Ic-l5
UPDATE:
The new spreadsheet is under the folder with a timestamp for today
Still waiting for 'chorionic villus' ontology term to be added to HCAO.
@ESapenaVentura Thank you for your review, it looks good.
I made some small changes: I changed the donor organism, specimen and cell suspension ids, names and descriptions to be more human interpretable compared to the ids used in the paper supplements but the overall linking/design has not changed.
The requested HCAO ontology term still hasn't been added, but other than that the metadata validates in ingest.
Will upload to ingest production when the new term is available.
Latest spreadsheet here: https://docs.google.com/spreadsheets/d/1ONhuXHUu6NxAiEgsVcCSpU0CVpbRN02s5yzhteeK_eY/edit#gid=1328329976
The new ontology term has now been added and the project metadata is uploaded and valid in ingest (yay!). BUT there is currently an issue with syncing fastq files to ingest prod (not specific to individual projects). I have messaged the devs about this in Slack.
Exported.
Preparing for SCEA with ID E-HCAD-23
This dataset has been exported to the Terra staging area.
Converted to SCEA: E-HCAD-23 & E-HCAD-24
There was an error in the file format for all the fastq files: the sequencing_file.file_core.file_format field was exported with a leading "." character. This meant none of the files were validated.
Upon removing the leading ".", 2 files were discovered to be invalid:
P3D_DS_Placenta_21_S1_R1_001.fastq.gz
P1D_DS_Placenta_20_S1_R2_001.fastq.gz
I have downloaded these files to an hca-util upload area and will sync once it is confirmed these are the only invalid files. There were still a few files that got stuck in 'Validating'.
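For reference, the leading-dot problem described above could be caught before export with a small check like the one below. This is only a sketch under assumptions, not the ingest code: the function names `normalise_file_format` and `find_bad_formats` are hypothetical, and the example value ".fastq.gz" is an assumed illustration of what was exported. Only the field name sequencing_file.file_core.file_format and the file name come from this thread.

```python
def normalise_file_format(file_format: str) -> str:
    """Strip surrounding whitespace and any leading dots from a file_format value."""
    return file_format.strip().lstrip(".")


def find_bad_formats(formats_by_file: dict) -> dict:
    """Return {file name: corrected value} for every entry that would change."""
    return {
        name: normalise_file_format(value)
        for name, value in formats_by_file.items()
        if normalise_file_format(value) != value
    }


# Assumed example value; the real exported values are not shown in the thread.
example = {"P3D_DS_Placenta_21_S1_R1_001.fastq.gz": ".fastq.gz"}
print(find_bad_formats(example))
```

Running a check like this over the sequencing_file rows before submission would list every file whose format value needs correcting, rather than finding out when validation is skipped.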
We are aiming to re-export the updated submission for release 5 (April 26th cut off) once we have updates working.
I have now synced the files to the ingest upload area. Can @yusra-haider comment on whether I am now able to re-export the project?
To confirm, this is the project in question: https://contribute.data.humancellatlas.org/projects/detail?uuid=1cd1f41f-f81a-486b-a05b-66ec60f81dcf right?
@aaclan-ebi seems like the data files for this project also follow the old naming scheme:
gs://broad-dsp-monster-hca-prod-ebi-storage/prod/1cd1f41f-f81a-486b-a05b-66ec60f81dcf/data/ee78b561-90e6-4c25-9830-c30eafd7e3e4_2020-12-08T09:38:13.224000Z_P5D_DS_Placenta_22_S1_R2_001.fastq.gz
Should we delete the exported files in Terra and then re-export, to avoid duplicate data files in the Terra staging area for this project?
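A quick way to spot objects that follow the old naming scheme is sketched below. It assumes that "old naming scheme" means the `<uuid>_<timestamp>_<original file name>` pattern visible in the path above; the thread does not describe the new scheme, so this only flags the old one. The names `OLD_NAME` and `parse_old_name` are hypothetical.

```python
import re
from typing import Optional

# Assumed old pattern: <uuid>_<ISO timestamp ending in Z>_<original file name>
OLD_NAME = re.compile(
    r"^(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"_(?P<version>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)"
    r"_(?P<filename>.+)$"
)


def parse_old_name(path: str) -> Optional[dict]:
    """Return the uuid, version and filename parts if the object name
    matches the assumed old scheme, otherwise None."""
    basename = path.rsplit("/", 1)[-1]
    match = OLD_NAME.match(basename)
    return match.groupdict() if match else None


path = (
    "gs://broad-dsp-monster-hca-prod-ebi-storage/prod/"
    "1cd1f41f-f81a-486b-a05b-66ec60f81dcf/data/"
    "ee78b561-90e6-4c25-9830-c30eafd7e3e4_2020-12-08T09:38:13.224000Z_"
    "P5D_DS_Placenta_22_S1_R2_001.fastq.gz"
)
print(parse_old_name(path))
```

Applied to a listing of the project's /data directory, any path for which `parse_old_name` returns a result would be a candidate for deletion before re-export.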
yep that's the right project.
@yusra-haider's proposal sounds sensible. @aaclan-ebi are you able to confirm?
Sorry, I overlooked this in my email. Yes, that sounds good!
@yusra-haider, we should delete the /data & /descriptor directories before re-exporting/resubmitting.
OK, let me know when I am able to click 'submit' for this project, @yusra-haider.
Deleted the project in the Terra staging area using this command:
gsutil rm -r gs://broad-dsp-monster-hca-prod-ebi-storage/prod/1cd1f41f-f81a-486b-a05b-66ec60f81dcf
@mshadbolt, you can go ahead and submit now.
I have hit submit. Because of the known bug where updates go directly to 'exported', @MightyAx, are you able to check in a few hours whether the metadata and data were successfully exported to the directory above that was previously deleted?
I'll then submit the import request form once I have confirmation of export.
I checked the bucket and all files seemed to export correctly so I have submitted the request for import form for the updated project.
Very old description and comments about this dataset progress can be found here: https://github.com/HumanCellAtlas/hca-data-wrangling/issues/247
BUT this study has now been published, here: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/30402542/. I used the publication metadata to create the metadata spreadsheet.
Status: The metadata sheet has been completed.
This project still requires:
[x] Secondary review (@ESapenaVentura)
[x] Validation with graph validator
[x] Upload of metadata sheet to ingest prod. for validation
[x] Upload of raw data to ingest prod. (transfer of fastq from public archives)
Here is the metadata spreadsheet: https://docs.google.com/spreadsheets/d/1ONhuXHUu6NxAiEgsVcCSpU0CVpbRN02s5yzhteeK_eY/edit#gid=1328329976