phs001457.v1.p1 - The immune cell landscape in kidneys of patients with lupus nephritis

ofanobilbao commented 1 year ago

Project short name:

Hacohen-Human-CELseq2

Primary Wrangler:

Arsenios

Secondary Wrangler:

Anu

Key Events

[x] Convert published metadata to HCA spreadsheet
[x] Manually curate dataset to meet HCA metadata standard
[x] Collect any matrix and cell-type annotation files
[x] Upload sheet to validate metadata
[x] Transfer raw files to ingest to validate data files
[x] Check linking using ingest graph validator
[x] Ask the Secondary Wrangler for an end-to-end review of the project. Ask the Expertise Wrangler to review specific tabs if needed
[x] Submit dataset to Production
[ ] Complete the Export SOP
[ ] Convert project data to SCEA format following the SCEA conversion SOP if appropriate

arschat commented 1 year ago

Living donors, with managed access to both raw data (available upon request) and processed data (an account must be created to access it).

amnonkhen commented 1 year ago

@idazucchi to assist

Wkt8 commented 1 year ago

There were too many cell suspensions to fit in a single Excel cell, so they were added through the API instead.

gabsie commented 1 year ago

Error to be discussed with @amnonkhen today. Arsenios to paste it here:

arschat commented 1 year ago

Error in all six analysis files:

* should NOT have additional properties at root of document

Wkt8 commented 1 year ago

Arsenios and Wei to discuss the number of processes linking biomaterials to analysis files, and then add them via the API.

arschat commented 1 year ago

The "* should NOT have additional properties at root of document" error was caused by an outdated analysis_file schema version in the schema tab. It is fixed by deleting the schema tab (or by making sure the schema tab lists the correct schema versions).
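A minimal sketch of how such stale entries could be spotted before upload, assuming the schema tab holds describedBy-style URLs like the ones in the graph validator output in this thread. The helper names (parse_schema_url, outdated_schemas) and the `expected` version map are hypothetical; the sketch does not look up the currently published schema versions itself.

```python
import re

# describedBy URLs look like .../type/<domain path>/<major.minor.patch>/<entity>,
# e.g. https://schema.humancellatlas.org/type/file/7.0.0/analysis_file
SCHEMA_URL = re.compile(
    r"/type/(?P<domain>.+)/(?P<version>\d+\.\d+\.\d+)/(?P<entity>[^/]+)$"
)


def parse_schema_url(url: str) -> tuple[str, tuple[int, ...]]:
    """Return (entity, version tuple) parsed from a describedBy URL."""
    match = SCHEMA_URL.search(url)
    if match is None:
        raise ValueError(f"Unrecognised schema URL: {url}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("entity"), version


def outdated_schemas(schema_tab: list[str], expected: dict[str, str]) -> list[str]:
    """Report schema-tab entries whose version differs from `expected`.

    `expected` (entity -> "major.minor.patch") is supplied by the caller; this
    sketch does not fetch the currently published versions.
    """
    stale = []
    for url in schema_tab:
        entity, version = parse_schema_url(url)
        wanted = expected.get(entity)
        if wanted is None:
            continue  # entity not being checked
        if version != tuple(int(part) for part in wanted.split(".")):
            found = ".".join(str(part) for part in version)
            stale.append(f"{entity}: {found} (expected {wanted})")
    return stale


# Hypothetical example: a made-up older analysis_file version left in the schema tab.
tab = ["https://schema.humancellatlas.org/type/file/6.5.0/analysis_file"]
print(outdated_schemas(tab, {"analysis_file": "7.0.0"}))
# ['analysis_file: 6.5.0 (expected 7.0.0)']
```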

Problem: a cell suspension (CS) needs to be added as input to multiple processes, but a PUT request of processid/ to csid/inputToProcesses replaces the existing process link, so multiple processes cannot be added this way.

Solution: use a PATCH request instead.

Attached: the script used for the API fix, update_cs_to_process.py.zip
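The PUT/PATCH difference can be sketched as below. This assumes the ingest API follows Spring Data REST association semantics (PUT replaces the whole inputToProcesses collection, PATCH adds to it) and accepts a text/uri-list body. The API root, the endpoint layout, build_link_request and apply_link are hypothetical illustrations, not the contents of update_cs_to_process.py.

```python
import urllib.request

# Hypothetical API root; the real ingest URL and auth scheme may differ.
API_ROOT = "https://api.ingest.example.org"


def build_link_request(biomaterial_id: str, process_ids: list[str], token: str,
                       replace: bool = False) -> urllib.request.Request:
    """Build (without sending) a request that links a biomaterial as input to processes.

    Assumed semantics: PUT replaces the whole inputToProcesses association,
    PATCH appends the listed processes to it.
    """
    # text/uri-list body: one process URI per line
    body = "\n".join(f"{API_ROOT}/processes/{pid}" for pid in process_ids)
    return urllib.request.Request(
        url=f"{API_ROOT}/biomaterials/{biomaterial_id}/inputToProcesses",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/uri-list", "Authorization": f"Bearer {token}"},
        method="PUT" if replace else "PATCH",
    )


def apply_link(current: list[str], new: list[str], method: str) -> list[str]:
    """In-memory model of the assumed semantics, for illustration only."""
    if method == "PUT":
        return list(new)  # existing links are discarded
    return current + [uri for uri in new if uri not in current]


# Linking one process at a time: PUT keeps only the last one, PATCH keeps all.
put_links: list[str] = []
patch_links: list[str] = []
for pid in ["process_a", "process_b"]:
    put_links = apply_link(put_links, [pid], "PUT")
    patch_links = apply_link(patch_links, [pid], "PATCH")
print(put_links)    # ['process_b']
print(patch_links)  # ['process_a', 'process_b']
```

Building the request separately from sending it makes it easy to inspect the method and body before touching a live submission.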

arschat commented 1 year ago

stuck in graph validation for ~24h

ESapenaVentura commented 1 year ago

Waiting for dev capacity to investigate.

arschat commented 1 year ago

Graph valid, ready for secondary review.

anu-shiva commented 1 year ago

Donor information: Development stage: I have been using HsapDv identifiers (HsapDv:0000087).

Collection protocol: Should we fill in a blood collection protocol if we have information to link it to other entities?

Specimen protocol: In column C (specimen_from_organism description), the values are those of the donor with the same clinical ID. (Could drag-and-drop have copied these cells, so that they need correcting?)

Dissociation protocol (kidney_enzymatic_dissociation): I think this description also includes mechanical dissociation and enrichment steps and could be modified, e.g.:
* mechanical: (Specimens were cut into 2–3 pieces and placed into a 1.5-ml centrifuge tube containing 445 μl Advanced DMEM/F-12 (ThermoFisher Scientific, catalog no. 12634-028) and 5 μl DNase I (Roche, catalog no. 04536282001, 100 U ml−1 final concentration). Then, 50 μl Liberase TL (Roche, catalog no. 05401020001, 250 μg ml−1 final concentration) was added, and the tube was placed on an orbital shaker (300–500 r.p.m.) at 37 °C for 12 min. At 6 min into the digestion, the mixture was gently pipetted up and down several times using a cut 1-ml pipette tip.)
* enrichment: (The cells were washed with RPMI/10% FBS, centrifuged at 300g at 4 °C for 10 min and resuspended in cold PBS for downstream analyses.)

Enrichment protocol: We could add to the description which sample types the enrichment was carried out for (kidney/urine/blood?). Markers: does CD31- belong here?

Cell suspension: The dissociation protocol may need modification if you choose to add the mechanical_dissociation to the dissociation protocol sheet.

Library preparation: Could the library construction kit retail name/manufacturer be filled out? (You could add CEL-seq2 to the Stanford cheatsheet.)

Analysis protocol: Is cel_seq2_analysis raw matrix generation or processed matrix generation, given that the description mentions normalised and log-transformed counts? I understand that ru10_analysis is only a filtering step. Some reference numbers from the paper are left in the descriptions,

e.g. "McCarroll laboratory54" in cel_seq2_analysis (there might be more cases too).
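A rough way to find more such leftovers, assuming they always appear as a number glued to the end of a lowercase word (like "laboratory54"). This is a heuristic, not an HCA check; find_fused_citations is a hypothetical helper, and every match still needs a manual look.

```python
import re

# Heuristic: a reference number left over from the paper, glued to the end of a
# lowercase word ("laboratory54", "method10"). Requiring 4+ lowercase letters skips
# most identifiers such as "CD31", "Hg19" or "Read1".
FUSED_CITATION = re.compile(r"\b[a-z]{4,}\d{1,3}\b")


def find_fused_citations(description: str) -> list[str]:
    """Return words that look like they end in a stray citation number."""
    return FUSED_CITATION.findall(description)


text = ("we used a modified version of the Drop-seq pipeline developed by the "
        "McCarroll laboratory54 to perform all steps")
print(find_fused_citations(text))  # ['laboratory54']
```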

I am guessing we don't have BioSamples/INSDC accessions, as they are not filled out anywhere, and the sequence files are not available either?

arschat commented 1 year ago

Thank you for the review, @anu-shiva! I included most of the comments in the spreadsheet, and here is some feedback too.

Collection Protocol:

Although blood samples are mentioned in the paper, the ImmPort dataset does not include blood sequence or analysis files. The dataset includes the CEL-seq2 datasets and the two 10x datasets from frozen needle core biopsies. It also includes the SDY997_EXP15077_dbGAP files from another paper, which have been wrangled in another DCP project. That is why I did not include the blood sample preparation in the experimental design.

Dissociation protocol:

The second part you mentioned is not an enrichment protocol based on the following list: it only involves washing and centrifugation, not density gradient centrifugation.

Enrichment protocol:

Since blood samples are not included in the database, and the same enrichment protocols are used for all other sample types (urine and kidney), no sample type specification is needed.
The 3 markers are derived from the PDF file named AMP SLE Phase1 Final Sorting Plan.PTL10046.pdf in the protocols folder of the database, but I had a typo in the CD31 marker, so thanks for pointing that out!

Sequence files:

Sequence files are available upon request from dbGaP, but since they contain non-public data from living donors, we can't access them.

gabsie commented 1 year ago

Deleting the submission (problem and action): Enrique to help Arsenios delete the processes and then delete the submission through the API.
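The order matters here: processes first, the submission last. A minimal sketch of that ordering; the API root, the processes/<id> and submissionEnvelopes/<id> paths, the bearer token header and the helper names are assumptions to be checked against the ingest API, not a confirmed procedure.

```python
import urllib.request

# Hypothetical API root and endpoint layout; check against the ingest API first.
API_ROOT = "https://api.ingest.example.org"


def deletion_requests(submission_id: str, process_ids: list[str],
                      token: str) -> list[urllib.request.Request]:
    """Build DELETE requests: all processes first, the submission envelope last."""
    headers = {"Authorization": f"Bearer {token}"}
    ordered = [
        urllib.request.Request(f"{API_ROOT}/processes/{pid}", headers=headers, method="DELETE")
        for pid in process_ids
    ]
    ordered.append(
        urllib.request.Request(f"{API_ROOT}/submissionEnvelopes/{submission_id}",
                               headers=headers, method="DELETE")
    )
    return ordered


def send_in_order(requests_to_send: list[urllib.request.Request]) -> None:
    """Send the requests one by one.

    urlopen raises HTTPError on a 4xx/5xx response, so a failed process deletion
    stops the loop before the submission itself is deleted.
    """
    for req in requests_to_send:
        with urllib.request.urlopen(req) as response:
            print(req.get_method(), req.full_url, response.status)
```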

gabsie commented 1 year ago

When @amnonkhen is back, he can help with deleting this submission.

gabsie commented 1 year ago

@ESapenaVentura is going to help with a workaround

amnonkhen commented 1 year ago

I ran graph validation. There is an error, but it does not appear in the API. While I investigate why the error is missing from the API, I will record it here:

"23-04-27 11:41:49 [ingest_graph_validator.actions.test_action] - ERROR: test [no_orphans.adoc] failed: non-empty result.\n"
"23-04-27 11:41:49 [ingest_graph_validator.actions.test_action] - ERROR: result: [{'n': Node('analysis_file', 'file', describedBy='https://schema.humancellatlas.org/type/file/7.0.0/analysis_file', genome_assembly_version='GRCh37', id='SDY997_EXP15176_celseq_matrix_ru10_molecules.tsv', matrix_cell_count=8296, schema_type='file', self_link='http://172.20.59.73/files/641457c66efdc3220c3bb1af', uuid='380d306b-92b8-47d2-a459-29217beafae5', **{'file_core.checksum': '99e7ad41c1b2e2f664ff2c1f88d6d23a', 'file_core.content_description': \"[{'text': 'Count matrix', 'ontology': 'data:3917', 'ontology_label': 'Count matrix'}]\", 'file_core.file_name': 'SDY997_EXP15176_celseq_matrix_ru10_molecules.tsv', 'file_core.file_source': 'Publication', 'file_core.format': 'tsq'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['file', 'analysis_file']}, {'n': Node('analysis_file', 'file', describedBy='https://schema.humancellatlas.org/type/file/7.0.0/analysis_file', genome_assembly_version='GRCh37', id='gene_by_cell_exp_mat.txt', matrix_cell_count=2838, schema_type='file', self_link='http://172.20.59.73/files/641457c66efdc3220c3bb1b1', uuid='91d6b167-aa7f-46e7-823d-98ed205828db', **{'file_core.checksum': 'ee3df4ef4b7fc146d9ad813e88bec46b', 'file_core.content_description': \"[{'text': 'Count matrix', 'ontology': 'data:3917', 'ontology_label': 'Count matrix'}]\", 'file_core.file_name': 'gene_by_cell_exp_mat.txt', 'file_core.file_source': 'Publication', 'file_core.format': 'txt'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['file', 'analysis_file']}, {'n': Node('analysis_file', 'file', describedBy='https://schema.humancellatlas.org/type/file/7.0.0/analysis_file', genome_assembly_version='GRCh37', id='cluster_per_cell.txt', matrix_cell_count=2838, schema_type='file', self_link='http://172.20.59.73/files/641457c66efdc3220c3bb1b2', 
uuid='d63ffb1f-c986-4983-94c6-64644ae5d72e', **{'file_core.checksum': '6fa32ed4f3f39181ee067039cf846b25', 'file_core.content_description': \"[{'text': 'Clustered expression profiles', 'ontology': 'data:3768', 'ontology_label': 'Clustered expression profiles'}]\", 'file_core.file_name': 'cluster_per_cell.txt', 'file_core.file_source': 'Publication', 'file_core.format': 'txt'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['file', 'analysis_file']}, {'n': Node('analysis_file', 'file', describedBy='https://schema.humancellatlas.org/type/file/7.0.0/analysis_file', genome_assembly_version='GRCh37', id='raw_gene_bc_matrices_donor1.zip', matrix_cell_count=737280, schema_type='file', self_link='http://172.20.59.73/files/641457c66efdc3220c3bb1b3', uuid='3fed375a-815a-42f4-ba64-092b4e97e02c', **{'file_core.checksum': 'db2f47ab550b2f539bcfdffa68fec587', 'file_core.content_description': \"[{'text': 'Count matrix', 'ontology': 'data:3917', 'ontology_label': 'Count matrix'}, {'text': 'cell barcode', 'ontology': 'EFO:0010198', 'ontology_label': 'cell barcode'}, {'text': 'gene identifier', 'ontology': 'data:1025', 'ontology_label': 'Gene identifier'}]\", 'file_core.file_name': 'raw_gene_bc_matrices_donor1.zip', 'file_core.file_source': 'Publication', 'file_core.format': 'zip'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['file', 'analysis_file']}, {'n': Node('analysis_file', 'file', describedBy='https://schema.humancellatlas.org/type/file/7.0.0/analysis_file', genome_assembly_version='GRCh37', id='raw_gene_bc_matrices_donor2.zip', matrix_cell_count=737280, schema_type='file', self_link='http://172.20.59.73/files/641457c66efdc3220c3bb1b4', uuid='052e7489-9ace-4f33-981f-2b0f977b0cfc', **{'file_core.checksum': '8ac8e368de7748e481685d5fbecc927d', 'file_core.content_description': \"[{'text': 'Count matrix', 'ontology': 
'data:3917', 'ontology_label': 'Count matrix'}, {'text': 'cell barcode', 'ontology': 'EFO:0010198', 'ontology_label': 'cell barcode'}, {'text': 'gene identifier', 'ontology': 'data:1025', 'ontology_label': 'Gene identifier'}]\", 'file_core.file_name': 'raw_gene_bc_matrices_donor2.zip', 'file_core.file_source': 'Publication', 'file_core.format': 'zip'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['file', 'analysis_file']}, {'n': Node('collection_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/biomaterial_collection/9.2.0/collection_protocol', id='kidney_collection_protocol', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c36efdc3220c3bb1a0', uuid='9bf9e05f-31c4-4ed6-bdc5-afe6bb52b058', **{'method.ontology': 'EFO:0009120', 'method.ontology_label': 'biopsy', 'method.text': 'biopsy', 'protocol_core.protocol_description': 'Research biopsy cores were collected from consented subjects either as an additional biopsy pass obtained specifically for research during a clinically indicated biopsy procedure (nine sites), or as a portion of a biopsy specimen acquired for diagnostic pathology during a clinically indicated biopsy procedure (one site). 
Control kidney samples were obtained at a single site by biopsy of a living donor kidney after removal from the donor and before implantation in the recipient.', 'protocol_core.protocol_id': 'kidney_collection_protocol', 'protocol_core.protocol_name': 'collection protocol for the kidney samples'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['collection_protocol', 'protocol']}, {'n': Node('collection_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/biomaterial_collection/9.2.0/collection_protocol', id='urine_collection_protocol', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c36efdc3220c3bb1a1', uuid='8a374b38-368a-48bd-9867-144a618f8481', **{'method.ontology': 'EFO:0009123', 'method.ontology_label': 'urine collection', 'method.text': 'urine collection', 'protocol_core.protocol_description': 'Midstream urine samples were collected from patients with LN before kidney biopsy. The total urine volume (15–90\\u2009ml) was split into 2 50-ml Falcon tubes. Urine cells were pelleted by centrifugation at 200g for 10\\u2009min, and then resuspended in 1\\u2009ml cold X-VIVO10 medium (Lonza BE04-743Q). Cells were transferred to a microcentrifuge tube, washed once in 1\\u2009ml X-VIVO10 medium and then resuspended in 0.5\\u2009ml cold CryoStor CS10. Cells were transferred into a 1.8-ml cryovial, placed in a Mr Frosty freezing container, stored in at −80\\u2009°C overnight and then transferred to liquid nitrogen. 
For downstream analyses, cryopreserved urine cells were rapidly thawed by vigorous shaking in a 37\\u2009°C water bath, transferred into warm RPMI/10% FBS, centrifuged at 300g for 10\\u2009min and resuspended in cold HBSS/1% BSA.', 'protocol_core.protocol_id': 'urine_collection_protocol', 'protocol_core.protocol_name': 'collection protocol for the urine samples'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['collection_protocol', 'protocol']}, {'n': Node('dissociation_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/biomaterial_collection/6.2.0/dissociation_protocol', id='kidney_enzymatic_dissociation', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a2', uuid='1452fed3-bb3d-4660-8ab3-206ec1e4232b', **{'method.ontology': 'EFO:0009128', 'method.ontology_label': 'enzymatic dissociation', 'method.text': 'enzymatic dissociation', 'protocol_core.protocol_description': 'Kidney samples were thawed and processed in batches of four samples, with most batches containing both LN and control kidney samples. The cryovial containing the kidney tissue was rapidly warmed in a 37\\u2009°C water bath until almost thawed. The sample was then poured into a well of a 24-well dish and rinsed in a second well containing warmed RPMI/10% FBS. The tissue was incubated for 10\\u2009min at room temperature. Specimens were cut into 2–3 pieces and placed into a 1.5-ml centrifuge tube containing 445\\u2009μl Advanced DMEM/F-12 (ThermoFisher Scientific, catalog no. 12634-028) and 5\\u2009μl DNase I (Roche, catalog no. 04536282001, 100\\u2009U\\u2009ml−1 final concentration). Then, 50\\u2009μl Liberase TL (Roche, catalog no. 05401020001, 250\\u2009μg\\u2009ml−1 final concentration) was added, and the tube was placed on an orbital shaker (300–500\\u2009r.p.m.) at 37\\u2009°C for 12\\u2009min. 
At 6\\u2009min into the digestion, the mixture was gently pipetted up and down several times using a cut 1-ml pipette tip. After 12\\u2009min, 500\\u2009μl RPMI/10% FBS was added to stop the digestion. The cells were washed with RPMI/10% FBS, centrifuged at 300g at 4\\u2009°C for 10\\u2009min and resuspended in cold PBS for downstream analyses.', 'protocol_core.protocol_id': 'kidney_enzymatic_dissociation', 'protocol_core.protocol_name': 'kidney enzymatic dissociation'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'dissociation_protocol']}, {'n': Node('enrichment_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/biomaterial_collection/3.1.0/enrichment_protocol', id='size_enrichment', maximum_size=70.0, schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a3', uuid='73bf0a85-8cd9-470e-aac2-75feaa8d73cb', **{'method.ontology': 'EFO:0009337', 'method.ontology_label': 'cell size selection', 'method.text': 'cell size selection', 'protocol_core.protocol_description': 'Cells were washed once in HBSS/1% BSA, centrifuged and passed through a 70-μm filter.', 'protocol_core.protocol_id': 'size_enrichment'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'enrichment_protocol']}, {'n': Node('enrichment_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/biomaterial_collection/3.1.0/enrichment_protocol', id='facs_leukocytes_enrichment', markers='CD10- CD31- CD45+', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a4', uuid='499c8350-328a-4dc4-8403-e16375906be9', **{'method.ontology': 'EFO:0009108', 'method.ontology_label': 'fluorescence-activated cell sorting', 'method.text': 'fluorescence-activated cell sorting', 'protocol_core.protocol_description': 'For 
each sample, 10% of the sample was allocated to sort CD10+CD45− epithelial cells as single cells, and the remaining 90% of the sample was used to sort CD45+ leukocytes as single cells.', 'protocol_core.protocol_id': 'facs_leukocytes_enrichment'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'enrichment_protocol']}, {'n': Node('enrichment_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/biomaterial_collection/3.1.0/enrichment_protocol', id='facs_epithelial_enrichment', markers='CD10+ CD31+ CD45-', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a5', uuid='0d9985aa-95f2-46d8-8406-30ed4de15014', **{'method.ontology': 'EFO:0009108', 'method.ontology_label': 'fluorescence-activated cell sorting', 'method.text': 'fluorescence-activated cell sorting', 'protocol_core.protocol_description': 'For each sample, 10% of the sample was allocated to sort CD10+CD45− epithelial cells as single cells, and the remaining 90% of the sample was used to sort CD45+ leukocytes as single cells.', 'protocol_core.protocol_id': 'facs_epithelial_enrichment'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'enrichment_protocol']}, {'n': Node('library_preparation_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/sequencing/6.3.1/library_preparation_protocol', end_bias='3 prime tag', id='10x_lib', nucleic_acid_source='single cell', primer='poly-dT', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a6', strand='first', uuid='ee37e8c8-7845-4564-86ed-a4dd232801af', **{'cdna_library_amplification_method.ontology': 'OBI:0000415', 'cdna_library_amplification_method.ontology_label': 'polymerase chain reaction', 'cdna_library_amplification_method.text': 'polymerase chain 
reaction', 'cell_barcode.barcode_length': 16, 'cell_barcode.barcode_offset': 0, 'cell_barcode.barcode_read': 'Read 1', 'input_nucleic_acid_molecule.ontology': 'OBI:0000869', 'input_nucleic_acid_molecule.ontology_label': 'polyA RNA', 'input_nucleic_acid_molecule.text': 'polyA RNA', 'library_construction_kit.retail_name': \"10x Chromium 3' v2 Sequencing Kit\", 'library_construction_method.ontology': 'EFO:0009899', 'library_construction_method.ontology_label': \"10x 3' v2\", 'library_construction_method.text': \"10x 3' v2\", 'library_preamplification_method.ontology': 'OBI:0000415', 'library_preamplification_method.ontology_label': 'polymerase chain reaction', 'library_preamplification_method.text': 'polymerase chain reaction', 'protocol_core.protocol_description': 'Unsorted cells in 0.04% BSA (Sigma) were used to generate single-cell libraries with the Chromium Single Cell Gene Expression system using 3′ Library & Gel Bead Kit v2 (10X Genomics) and paired-end sequencing was performed on a HiSeq X.', 'protocol_core.protocol_id': '10x_lib', 'protocol_core.protocol_name': '10X Genomics Single Cell 3’ Reagent Kit v2', 'umi_barcode.barcode_length': 10, 'umi_barcode.barcode_offset': 16, 'umi_barcode.barcode_read': 'Read 1'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'library_preparation_protocol']}, {'n': Node('library_preparation_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/sequencing/6.3.1/library_preparation_protocol', end_bias='3 prime tag', id='cel_seq2_lib', nucleic_acid_source='single cell', primer='poly-dT', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a7', strand='first', uuid='0e9bd554-9198-4890-87c5-289355434ae5', **{'cdna_library_amplification_method.ontology': 'EFO:0009013', 'cdna_library_amplification_method.ontology_label': 'in vitro transcription', 
'cdna_library_amplification_method.text': 'in vitro transcription', 'cell_barcode.barcode_length': 6, 'cell_barcode.barcode_offset': 6, 'cell_barcode.barcode_read': 'Read 1', 'input_nucleic_acid_molecule.ontology': 'OBI:0000869', 'input_nucleic_acid_molecule.ontology_label': 'polyA RNA', 'input_nucleic_acid_molecule.text': 'polyA RNA', 'library_construction_method.ontology': 'EFO:0010010', 'library_construction_method.ontology_label': 'CEL-seq2', 'library_construction_method.text': 'Cel-Seq2', 'library_preamplification_method.ontology': 'OBI:0000415', 'library_preamplification_method.ontology_label': 'polymerase chain reaction', 'library_preamplification_method.text': 'polymerase chain reaction', 'protocol_core.protocol_description': 'scRNA-seq was performed using the CEL-Seq2 method10\\xa0with the following modifications. Single cells were sorted into 384-well plates containing 0.6\\u2009µl 1% NP-40 buffer in each well. Then, 0.6\\u2009µl dNTPs (10\\u2009mM each; NEB) and 5\\u2009nl barcoded reverse transcription primer (1\\u2009µg\\u2009µl−1) were added to each well along with 20\\u2009nl ERCC spike-in (diluted 1:800,000). Reactions were incubated at 65\\u2009°C for 5\\u2009min, and then moved immediately to ice. Reverse transcription reaction and second-strand complementary DNA (cDNA) synthesis were carried out as previously described10, and double-stranded c-DNA was purified using 0.8× volumes of AMPure XP beads (Beckman Coulter). In vitro transcription reactions were performed as described followed by treatment with ExoSAP-IT PCR Product Cleanup Reagent (ThermoFisher Scientific, catalog no. 78201.1.ML). Amplified R"
"NA was fragmented at 80\\u2009°C for 3\\u2009min and purified using RNAClean XP beads (Beckman Coulter). The purified amplified RNA was converted to cDNA using an anchored random primer and Illumina adaptor sequences were added by PCR. The final cDNA library was purified using AMPure XP beads (Beckman Coulter). Paired-end sequencing of ~1 million paired-end reads per cell was performed on the HiSeq 2500 in Rapid Run Mode with a 5% PhiX spike-in using 15 bases for Read1, 6 bases for the Illumina index and 36 bases for Read2.', 'protocol_core.protocol_id': 'cel_seq2_lib', 'protocol_core.protocol_name': 'CEL-Seq2', 'umi_barcode.barcode_length': 6, 'umi_barcode.barcode_offset': 0, 'umi_barcode.barcode_read': 'Read 1'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'library_preparation_protocol']}, {'n': Node('protocol', 'sequencing_protocol', describedBy='https://schema.humancellatlas.org/type/protocol/sequencing/10.1.0/sequencing_protocol', id='10x_seq', paired_end=True, schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a8', uuid='fd33880c-aba9-4adb-974c-8b06a336618a', **{'instrument_manufacturer_model.ontology': 'EFO:0008567', 'instrument_manufacturer_model.ontology_label': 'Illumina HiSeq X', 'instrument_manufacturer_model.text': 'Illumina HiSeq X', 'method.ontology': 'EFO:0008440', 'method.ontology_label': 'tag based single cell RNA sequencing', 'method.text': 'tag based single cell RNA sequencing', 'protocol_core.protocol_description': 'Unsorted cells in 0.04% BSA (Sigma) were used to generate single-cell libraries with the Chromium Single Cell Gene Expression system using 3′ Library & Gel Bead Kit v2 (10X Genomics) and paired-end sequencing was performed on a HiSeq X.', 'protocol_core.protocol_id': '10x_seq', 'protocol_core.protocol_name': \"10x 3' v2\"}), '\"Entity does not have linkings with any other entity\"': 'Entity does not 
have linkings with any other entity', 'labels(n)': ['protocol', 'sequencing_protocol']}, {'n': Node('protocol', 'sequencing_protocol', describedBy='https://schema.humancellatlas.org/type/protocol/sequencing/10.1.0/sequencing_protocol', id='cel_seq2_seq', paired_end=True, schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1a9', uuid='929e571a-9499-463b-9e1f-81caca7d9ae7', **{'instrument_manufacturer_model.ontology': 'EFO:0008565', 'instrument_manufacturer_model.ontology_label': 'Illumina HiSeq 2500', 'instrument_manufacturer_model.text': 'Illumina HiSeq 2500', 'method.ontology': 'EFO:0008441', 'method.ontology_label': 'full length single cell RNA sequencing', 'method.text': 'full length single cell RNA sequencing', 'protocol_core.protocol_description': 'Paired-end sequencing of ~1 million paired-end reads per cell was performed on the HiSeq 2500 in Rapid Run Mode with a 5% PhiX spike-in using 15 bases for Read1, 6 bases for the Illumina index and 36 bases for Read2.', 'protocol_core.protocol_id': 'cel_seq2_seq', 'protocol_core.protocol_name': 'CEL-Seq2'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'sequencing_protocol']}, {'n': Node('analysis_protocol', 'protocol', computational_method='10x', describedBy='https://schema.humancellatlas.org/type/protocol/analysis/10.0.0/analysis_protocol', id='10x_analysis', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1aa', uuid='aec60e7d-d4ab-4f50-b962-6c159992755f', **{'protocol_core.protocol_description': 'For cells processed using 10X, sequencing output was aligned using the 10X standard pipeline.', 'protocol_core.protocol_id': '10x_analysis', 'protocol_core.protocol_name': '10x standard pipeline', 'type.ontology': 'EFO:0030022', 'type.ontology_label': 'raw matrix generation', 'type.text': 'raw matrix generation'}), '\"Entity does not have linkings with any 
other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'analysis_protocol']}, {'n': Node('analysis_protocol', 'protocol', computational_method='Drop-seq;STAR', describedBy='https://schema.humancellatlas.org/type/protocol/analysis/10.0.0/analysis_protocol', id='cel_seq2_analysis', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1ab', uuid='a5281f4a-65e3-492a-b160-444bad38ecd8', **{'protocol_core.protocol_description': 'For the cells processed using CEL-Seq2, we used a modified version of the Drop-seq pipeline developed by the McCarroll laboratory54\\xa0to perform all steps necessary to produce gene by cell expression matrices of reads as well as unique molecular identifiers (UMIs). These steps include demultiplexing, quality filtering, polyA and adapter trimming, aligning and collapsing reads with unique combinations of cell\\u2009+\\u2009gene\\u2009+\\u2009UMI. We used STAR-2.5.1b to align reads to the Hg19 human genome reference. Only uniquely mapped reads were counted. UMIs with fewer than ten reads were filtered out before creating the final expression matrices, to minimize read cross-contamination across cells. 
For each cell, the computed gene expression counts were then normalized for read depth and log-transformed.', 'protocol_core.protocol_id': 'cel_seq2_analysis', 'protocol_core.protocol_name': 'CEL-Seq2 analysis', 'type.ontology': 'EFO:0030022', 'type.ontology_label': 'raw matrix generation', 'type.text': 'raw matrix generation'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'analysis_protocol']}, {'n': Node('analysis_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/analysis/10.0.0/analysis_protocol', id='ru10_analysis', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1ac', uuid='079dab30-b5e9-4054-840e-d67e75cce39f', **{'protocol_core.protocol_description': 'The _ru10_ files are filtered with Reads:UMI ≥ 10.', 'protocol_core.protocol_id': 'ru10_analysis', 'protocol_core.protocol_name': 'Filtering of the CEL-Seq2 raw data', 'type.ontology': 'EFO:0030023', 'type.ontology_label': 'processed matrix generation', 'type.text': 'processed matrix generation'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'analysis_protocol']}, {'n': Node('analysis_protocol', 'protocol', describedBy='https://schema.humancellatlas.org/type/protocol/analysis/10.0.0/analysis_protocol', id='qc_analysis', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1ad', uuid='fa40d311-38fc-4ba5-ab09-d13e1bf48bc8', **{'protocol_core.protocol_description': 'For kidney cells processed using CEL-Seq2, high-quality cells were defined as having at least 1,000 detected genes (that is, with positive count values); for urine cells, which tended to have fewer detectable genes, this threshold was set to 500 genes; for cells processed using 10X, the threshold used was 250 genes. 
We further required the percentage of reads mapped to mitochondrial genes per cell to be lower than 25% (8% for blood cells processed using 10X). To remove wells that were suspected to contain messenger RNA from multiple cells, we required the number of genes per cell to be smaller than 5,000 for the kidney cells processed using CEL-Seq2; 4,000 for urine cells; 1,700 for blood cells processed using 10X; and 3,500 for the kidney cells processed using 10X (all thresholds were set based on empirical distributions).', 'protocol_core.protocol_id': 'qc_analysis', 'protocol_core.protocol_name': 'QC of the CEL-Seq2 raw data', 'type.ontology': 'EFO:0030023', 'type.ontology_label': 'processed matrix generation', 'type.text': 'processed matrix generation'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'analysis_protocol']}, {'n': Node('analysis_protocol', 'protocol', computational_method='Seurat', describedBy='https://schema.humancellatlas.org/type/protocol/analysis/10.0.0/analysis_protocol', id='cluster_analysis', schema_type='protocol', self_link='http://172.20.59.73/protocols/641457c46efdc3220c3bb1ae', uuid='98f94650-7e2e-4e44-b7f3-4c45832c565d', **{'protocol_core.protocol_description': 'Clustering of kidney cells was done using Seurat (v.1.4.0.8), in a stepwise manner. We initially performed low-resolution clustering, analyzing all cells together, then labeled each of the resulting clusters as myeloid cells, T/NK cells, B cells, dividing cells or epithelial cells. The cells of each such general class were then analyzed separately, to identify finer clusters. In some cases, as described in the main text, the resulting clusters were further split into subclusters. 
In each case, clustering was done following principal component analysis, based on context-specific variable genes that were identified independently for each set of analyzed cells.\\n\\nSensitivity analysis was performed in each clustering step, with a particular focus on the low-resolution clustering stage. Briefly, all parameters in the clustering process, including the number of variable genes and principal components considered, were varied, and the robustness of the results was determined. To assess this robustness, we estimated in each case the Rand index: looking at a large number (1,000) of random pairs of cells, we counted how many pairs were either included in the same cluster in both of the compared clustering runs, or not included in the same cluster, and referred to these as consistent pairs; we then calculated the fraction of consistent pairs of all random cell pairs considered. We repeated this procedure 100 times, to calculate the mean of the Rand index estimate.', 'protocol_core.protocol_id': 'cluster_analysis', 'protocol_core.protocol_name': 'cluster analysis', 'type.ontology': 'EFO:0030024', 'type.ontology_label': 'analysis of matrices', 'type.text': 'analysis of matrices'}), '\"Entity does not have linkings with any other entity\"': 'Entity does not have linkings with any other entity', 'labels(n)': ['protocol', 'analysis_protocol']}]\n"
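To read this dump more easily, the orphaned entities can be pulled out and grouped by type. A minimal sketch, assuming every orphan appears as Node('<label>', '<label>', ..., id='<id>', ...) as in the output above; orphans_by_type and the list of generic labels are hypothetical.

```python
import re
from collections import defaultdict

# Matches Node('<label>', '<label>', ..., id='<id>' in the validator output;
# \bid= avoids matching the "id=" inside "uuid=".
NODE = re.compile(r"Node\('(?P<a>[^']+)', '(?P<b>[^']+)'.*?\bid='(?P<id>[^']+)'", re.DOTALL)
GENERIC_LABELS = {"file", "protocol", "biomaterial", "process", "project"}


def orphans_by_type(log_text: str) -> dict[str, list[str]]:
    """Group orphaned entity ids by their specific type (e.g. analysis_file)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for match in NODE.finditer(log_text):
        a, b = match.group("a"), match.group("b")
        specific = b if a in GENERIC_LABELS else a  # label order varies per node
        grouped[specific].append(match.group("id"))
    return dict(grouped)


excerpt = (
    "{'n': Node('analysis_file', 'file', describedBy='https://schema.humancellatlas.org/type/file/7.0.0/analysis_file', "
    "genome_assembly_version='GRCh37', id='cluster_per_cell.txt', matrix_cell_count=2838, "
    "uuid='d63ffb1f-c986-4983-94c6-64644ae5d72e'}), "
    "{'n': Node('protocol', 'sequencing_protocol', describedBy='https://schema.humancellatlas.org/type/protocol/sequencing/10.1.0/sequencing_protocol', "
    "id='10x_seq', paired_end=True, uuid='fd33880c-aba9-4adb-974c-8b06a336618a'})"
)
print(orphans_by_type(excerpt))
# {'analysis_file': ['cluster_per_cell.txt'], 'sequencing_protocol': ['10x_seq']}
```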

arschat commented 1 year ago

The previous submission could not be deleted because of errors during deletion, so the agreed workaround was to create a second submission. The graph validation errors were caused by this previous submission, so all entities of the first submission that caused graph validation errors were deleted.

After that, the submission was graph valid; I submitted the project and it has now been exported.

idazucchi commented 1 year ago

import form filled out!

ofanobilbao commented 1 year ago

Has been dropped from Release 28 due to validation issues on import

Wkt8 commented 1 year ago

Supplementary file metadata needs to be deleted from the staging area, and then an import form needs to be sent stating that this project was dropped from the previous release.

[x] File metadata deleted
[x] import form sent

ofanobilbao commented 1 year ago

Apparently, when importing, they found the same issues as last time, so by the looks of it we might not have fixed them. @ESapenaVentura and @arschat to look into it.

ofanobilbao commented 1 year ago

Passed validation on the import side after Enrique edited it on Friday.

ebi-ait / hca-ebi-wrangler-central