ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

GSE132065 - GSE132065_BloodTimeHuman #495

Open ipediez opened 3 years ago

ipediez commented 3 years ago

Project short name:

GSE132065_BloodTimeHuman

Primary Wrangler: @ipediez

Enrique

Secondary Wrangler:

Associated files

Published study links

Ingest

Key Events

ipediez commented 3 years ago

Some files are not downloading from ENA through move_data_from_insdc.py, and I'm having trouble downloading them via wget. I have tried:

None of the methods above seems to work; even the cloud delivery yields an error. I have sent an email to SRA asking about the error.

ipediez commented 3 years ago

I'm waiting for SRA to solve their problem with cloud delivery, as it does not seem to be working for anyone. Meanwhile, I'm trying to download the missing files via wget.
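A minimal sketch of how the missing files could be listed before retrying with wget. It assumes a tab-separated file report with a `fastq_ftp` column holding semicolon-separated paths without a scheme prefix (the shape of an ENA file report, as far as I know); the function name `missing_fastqs` and the `ftp://` prefix are illustrative assumptions, not part of the wrangler scripts.

```python
import csv
import io
from pathlib import Path


def missing_fastqs(filereport_tsv: str, download_dir: str) -> list[str]:
    """Return URLs of FASTQ files listed in the report but absent from download_dir.

    Assumption: the report has a 'fastq_ftp' column with ';'-separated paths.
    """
    directory = Path(download_dir)
    # Names of the files that have already been downloaded.
    present = {p.name for p in directory.iterdir()} if directory.is_dir() else set()

    missing = []
    reader = csv.DictReader(io.StringIO(filereport_tsv), delimiter="\t")
    for row in reader:
        # Empty or absent cells yield no paths.
        for path in filter(None, (row.get("fastq_ftp") or "").split(";")):
            file_name = path.rsplit("/", 1)[-1]
            if file_name not in present:
                # Hypothetical choice: prefix the path with ftp:// for wget.
                missing.append(f"ftp://{path}")
    return missing
```

The returned URLs can be written to a text file and passed to `wget -c -i`, which resumes partial downloads instead of starting over.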

ipediez commented 3 years ago

Ready for secondary review by @ESapenaVentura

gabsie commented 3 years ago

Will be reviewed today by @ESapenaVentura

ipediez commented 3 years ago

@ESapenaVentura to work on this today

jacobwindsor commented 3 years ago

@ESapenaVentura reviewed and @ipediez to make changes

ESapenaVentura commented 3 years ago

I have reviewed the dataset! A couple of notes:

Donor organism

Collection protocol

Specimen

Enrichment protocol

For the cell lines that had T-Cell activation, maybe it’s worth adding the activation as an enrichment protocol?

For T cell activation, cells were thawed in MACS buffer (1X PBS, 4% FBS, 2 mM EDTA), centrifuged during 5 min at 700×g and RT, and resuspended in pre-warmed culture media (RPMI, 1% Pyruvate, 20% FBS, Pen/Strep, DNase 100 U/ml). A TC20™ automated cell counter was used to assess cell number and viability. The number of only viable cells was used to calculate volumes for cell seeding. For each condition, 200,000 live cells were seeded into two wells of a 96-well round bottom plate (Sigma Aldrich) for a total of 400,000 cells per condition (time point). Dynabeads Human T-Activator CD3/CD28 (Thermo Fisher Scientific) were transferred to a 1.5-ml tube (5 μl/well), washed twice with 1 ml of cell culture media, and resuspended with 10 volumes of cell culture media. Fifty microliters of resuspended beads was added to each well for T cell activation and expansion. Cells were incubated during 24 h at 37 °C with 5% CO2 and 5% humidity. The remaining cells (~ 350,000 cells per condition) were used as a control (day 0) for T cell activation. Cells subjected to T cell activation protocol were collected in a 1.5-ml tube and stained with DAPI (Thermo Fisher Scientific) at 1 μM final concentration. DAPI-negative live individual cells were sorted with a BD FACSAria™ Fusion Flow cytometer (BD Biosciences) in 1X PBS supplemented with 0.05% BSA.

Not super sure where the T-Cell activation would fall though, but I think it can be considered an enrichment protocol?

Cell line

Cell suspension:

Library preparation protocol

Analysis protocol

ipediez commented 3 years ago

New term requested: T cell activation

ipediez commented 3 years ago

The new term is EFO:0030037 (T cell activation assay). It will be in HCAO following the release scheduled for Tue 16th.

ESapenaVentura commented 3 years ago

I have reviewed the dataset again, this time to the very end :) First of all, congrats, this is a very big dataset and I think you've done an outstanding job of describing the experimental design!

I have just a couple of comments and corrections:

Enrichment protocol

Cell line

Cell suspension

Sequence file

gabsie commented 3 years ago

@ipediez still needs to upload the files.

ipediez commented 3 years ago

Graph validator status: pending since 12:00

ipediez commented 3 years ago

Graph validator yields three errors that shouldn't be there:

The last two errors are related to: https://github.com/ebi-ait/ingest-graph-validator/issues/50

ESapenaVentura commented 3 years ago

About the errors, I have discovered that they will be solved once the changes in the error messages are deployed into production, except contains_umi_barcode_info.adoc, which will need to be addressed

ESapenaVentura commented 3 years ago

contains_umi_barcode_info.adoc fix: https://github.com/ebi-ait/ingest-graph-validator/pull/51

gabsie commented 2 years ago

Hey @ipediez and @ESapenaVentura, in your absence the data browser team has requested that somebody sense-check this project in the browser: https://data.humancellatlas.org/explore/projects/5b328561-4a97-40ac-b7ad-6a90fc59d374?catalog=dcp12

They are surprised by the 3-level subgraph structure. Can you confirm that everything looks good?

ESapenaVentura commented 2 years ago

There was a discussion around this dataset - Moving to needs update and needs some further discussion

Wkt8 commented 2 years ago

The update hinged on whether specimens can be derived from other specimens. The data is not technically incorrect, but it does not conform to how UCSC and we want to wrangle this. Fixing it manually is not advised. There are two options: either a new submission via bulk updates with deletion of entities (it would be useful to check whether linkings can be changed via bulk updates), or a soft delete of the project followed by a re-upload using the same project uuid.
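To gauge how many links would need changing, the specimen-to-specimen case can be sketched as a simple filter. This assumes the project graph is available as a mapping from entity id to entity type plus a list of (source, target) links; that data shape and the function name are hypothetical, not the Ingest or graph validator API.

```python
SPECIMEN = "specimen_from_organism"


def specimen_to_specimen_links(
    entity_types: dict[str, str],
    links: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return every link whose source and target are both specimens.

    Assumption: entity_types maps an entity id to its type, and each
    link is a (source_id, target_id) pair. Unknown ids are ignored.
    """
    return [
        (source, target)
        for source, target in links
        if entity_types.get(source) == SPECIMEN
        and entity_types.get(target) == SPECIMEN
    ]
```

An empty result would mean that no specimen is derived from another specimen in the graph that was passed in.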

Wkt8 commented 2 years ago

Check with Alegria, as she has done this before. Communicate on the dcp-ops channel about deleting biomaterials, and work with a dev to delete entities from the staging area.

MightyAx commented 2 years ago

Previously submitted entities can't be deleted but they can be ignored / orphaned.

Wkt8 commented 2 years ago

Waiting for @ESapenaVentura to return before we discuss this further.

ofanobilbao commented 2 years ago

@ESapenaVentura @Wkt8 did we reach any conclusions on this?

ESapenaVentura commented 2 years ago

Waiting on implementation on delta staging areas because deletion of entities is necessary

ofanobilbao commented 1 year ago

https://app.zenhub.com/workspaces/operations-5fa2d8f2df78bb000f7fb2b5/issues/ebi-ait/hca-ebi-wrangler-central/912 is blocking any updates to this project in Ingest, including adding the information from this ticket so that it can be closed while we wait for this functionality

ofanobilbao commented 1 year ago

I have managed to add the required notes (including reference to this ticket in Ingest back office notes) so I am closing this until further notice

Wkt8 commented 1 year ago

Hannes has commented that there are fastqs listed as analysis_file entities which are breaking processes in Azul ticket

idazucchi commented 1 year ago

Checking whether this project can be updated to correct the fastq file issue, or whether it would require a soft deletion. This could be a chance to correct the other modeling issues with this project (specimen-to-specimen linking, fastq files as input for analysis files).

ESapenaVentura commented 1 year ago

There is an issue (Defined here) regarding the UMI files being described as analysis files.

The main issue here is that the UMI fastq files were defined as analysis files, which is breaking the browser.

I am investigating what our options are
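As a rough sketch of how the affected files could be listed, the check below flags any file whose name looks like a fastq but whose entity type is `analysis_file`. The record shape (dicts with `file_name` and `entity_type` keys) and the function name are assumptions for illustration only, not the Ingest data model.

```python
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def fastqs_described_as_analysis_files(files: list[dict]) -> list[str]:
    """Return the names of fastq files recorded as analysis_file entities.

    Assumption: each record is a dict with 'file_name' and 'entity_type'.
    """
    return [
        record["file_name"]
        for record in files
        if record["entity_type"] == "analysis_file"
        # The suffix match is case-insensitive.
        and record["file_name"].lower().endswith(FASTQ_SUFFIXES)
    ]
```

Matching on the file name is a heuristic; a fastq with an unusual extension would not be caught.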

ESapenaVentura commented 1 year ago

Regarding the issue in the previous comment:

Once we decide on the best solution, we can start working on the problem