ESapenaVentura opened this issue 1 year ago
Waiting on contributors to give us access to the files, or to upload them via hca-util.
We finally got a spreadsheet with minimal information.
There is contradictory metadata in the spreadsheet that was sent to us; I will flag it and communicate with the contributors.
Contacted Bo about this data. Once he answers, I will proceed with moving the data from Terra to S3.
Egress costs are paid by the bucket owner, and they are OK with this --> Enrique started the data transfer this morning.
The old submission is in Metadata valid. I am going to assume the changes were made but the submission could not be exported, because exporting DCP1 datasets was not well understood at the time.
I am going to export it and then proceed to create the new one.
Exported the DCP1 update. It will need to be deleted.
The new submission gave some linking errors; @ESapenaVentura will look into it today. Bo asked for a timeline. If it's ready for secondary review today or tomorrow, we can give them the next release as the date.
There are 48 missing fastq files in the bucket; I am attaching the list.
Sent an email to the contributor
Otherwise, the dataset is ready for secondary review!
@ESapenaVentura to message @Wkt8 when ready for review
Enrique to re-trigger file validation; once that has completed, it will be ready for review.
Re-triggered file validation; waiting for it to complete.
This AUDR is amazing. Only two small things!
- Project tab: add yourself as a contributor?
- Donor_organism: CB9 doesn't have any metadata - is this correct?
Apart from that, still waiting on the dataset to reach Graph valid.
Hi Wei! Thanks for the review!

> Add yourself as a contributor?

Good catch! I'll add myself :)

> CB9 doesn't have any metadata - is this correct?

I did not receive any metadata for this donor, but I can ask the contributor. A lot of the CB donors (even from previous submissions) lack metadata, so maybe they just don't have it.
@ESapenaVentura The file validation for this has finished. Submission 28ff3c1c-08e9-4e27-833f-04a521e24487
I've queued it for graph validation.
@MightyAx to review; the project is stuck in Exporting.
There is a problem exporting this submission as a spreadsheet. Investigating:
```
2022-12-05 11:55:44,419 - TerraSpreadsheetExporter - INFO - submission_uuid:28ff3c1c-08e9-4e27-833f-04a521e24487 - export_job_id:638a2f3731a4c47b19a7c103 - project_uuid:cc95ff89-2e68-4a08-a234-480eca21ce79 - Message received
2022-12-05 11:55:44,436 - TerraSpreadsheetExporter - INFO - submission_uuid:28ff3c1c-08e9-4e27-833f-04a521e24487 - export_job_id:638a2f3731a4c47b19a7c103 - project_uuid:cc95ff89-2e68-4a08-a234-480eca21ce79 - Received spreadsheet export message, informing ingest
2022-12-05 11:55:44,472 - TerraSpreadsheetExporter - INFO - submission_uuid:28ff3c1c-08e9-4e27-833f-04a521e24487 - export_job_id:638a2f3731a4c47b19a7c103 - project_uuid:cc95ff89-2e68-4a08-a234-480eca21ce79 - Generating Spreadsheet
2022-12-05 11:56:38,198 - TerraSpreadsheetExporter - ERROR - submission_uuid:28ff3c1c-08e9-4e27-833f-04a521e24487 - export_job_id:638a2f3731a4c47b19a7c103 - project_uuid:cc95ff89-2e68-4a08-a234-480eca21ce79 - Rejecting message: {"exportJobId":"638a2f3731a4c47b19a7c103","submissionUuid":"28ff3c1c-08e9-4e27-833f-04a521e24487","projectUuid":"cc95ff89-2e68-4a08-a234-480eca21ce79","callbackLink":"/exportJobs/638a2f3731a4c47b19a7c103","context":{}} due to error: '5d1c67c988fa640008aff7d0'
2022-12-05 11:56:38,198 - TerraSpreadsheetExporter - ERROR - submission_uuid:28ff3c1c-08e9-4e27-833f-04a521e24487 - export_job_id:638a2f3731a4c47b19a7c103 - project_uuid:cc95ff89-2e68-4a08-a234-480eca21ce79 - '5d1c67c988fa640008aff7d0'
Traceback (most recent call last):
  File "/app/exporter/queue/listener.py", line 39, in try_handle_or_reject
    self.handler.handle_message(json_body, msg)
  File "/app/exporter/terra/spreadsheet/handler.py", line 36, in handle_message
    self.exporter.export_spreadsheet(message.project_uuid, message.submission_uuid)
  File "/app/exporter/terra/spreadsheet/exporter.py", line 26, in export_spreadsheet
    workbook = self.downloader.get_workbook_from_submission(submission_uuid)
  File "/usr/local/lib/python3.10/site-packages/hca_ingest/downloader/workbook.py", line 18, in get_workbook_from_submission
    entity_dict = self.collector.collect_data_by_submission_uuid(submission_uuid)
  File "/usr/local/lib/python3.10/site-packages/hca_ingest/downloader/data_collector.py", line 13, in collect_data_by_submission_uuid
    entity_dict = self.__build_entity_dict(submission)
  File "/usr/local/lib/python3.10/site-packages/hca_ingest/downloader/data_collector.py", line 23, in __build_entity_dict
    self.__set_inputs(entity_dict, linking_map)
  File "/usr/local/lib/python3.10/site-packages/hca_ingest/downloader/data_collector.py", line 75, in __set_inputs
    input_biomaterials = [entity_dict[id] for id in input_biomaterial_ids]
  File "/usr/local/lib/python3.10/site-packages/hca_ingest/downloader/data_collector.py", line 75, in <listcomp>
    input_biomaterials = [entity_dict[id] for id in input_biomaterial_ids]
KeyError: '5d1c67c988fa640008aff7d0'
```
This submission is failing because it includes new 'specimen from organism' entities that are derived from a donor exported in a previous submission.
While the export-to-Terra process now supports this, the spreadsheet generator code does not.
I recommend skipping spreadsheet export for this submission, and for any submission that belongs to a "multi-submission project".
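For context, here is a minimal sketch of the failure mode (hypothetical names, not the actual `hca_ingest` code): the spreadsheet generator indexes only the current submission's entities, so a link pointing at an entity exported in a previous submission raises the `KeyError` above. A tolerant variant would collect the missing ids instead, e.g. so they could be fetched from ingest afterwards:

```python
def build_entity_dict(submission_entities):
    """Index a submission's own entities by id (sketch of __build_entity_dict)."""
    return {e["id"]: e for e in submission_entities}

def set_inputs(entity_dict, linking_map):
    """Resolve input links against the submission's own entities only.

    Raises KeyError when a link points at an entity from a *previous*
    submission (e.g. a donor), which is the crash in the traceback.
    """
    for entity_id, input_ids in linking_map.items():
        entity_dict[entity_id]["inputs"] = [entity_dict[i] for i in input_ids]

def set_inputs_tolerant(entity_dict, linking_map):
    """Same resolution, but collect ids outside this submission instead of crashing."""
    missing = set()
    for entity_id, input_ids in linking_map.items():
        inputs = []
        for i in input_ids:
            if i in entity_dict:
                inputs.append(entity_dict[i])
            else:
                missing.add(i)  # candidate for a follow-up fetch from ingest
        entity_dict[entity_id]["inputs"] = inputs
    return missing
```

Skipping spreadsheet export for multi-submission projects avoids the crash without needing either change.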
Exported!
Thanks @MightyAx ! I have filled out the import form :)
Looks good, but I'd want Enrique to double-check, as this is a contributor dataset without any publication info I can check against.
The analysis files are not showing up in the matrices tab because I forgot to include the `file_source`.
This needs a quick update. It's not super important, so we can go on with the release as usual.
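For reference, roughly where `file_source` sits in an analysis file's metadata. This is a hedged sketch: the exact layout and the `"DCP/2 Analysis"` value are assumptions to verify against the HCA metadata schema, not a quote of this dataset's documents.

```python
# Hypothetical analysis_file metadata document; only `file_source` is the point here.
analysis_file = {
    "describedBy": "https://schema.humancellatlas.org/type/file/analysis_file",
    "file_core": {
        "file_name": "matrix.loom",
        "format": "loom",
        # Omitting this field is what kept the files out of the matrices tab.
        "file_source": "DCP/2 Analysis",
    },
}
```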
Pending the dev task: export of a project with two submissions.
Waiting for ebi-ait/dcp-ingest-central#928 to be in production before exporting again.
Previous AUDR ticket, to track things that need attention with this dataset
Previous AUDR ticket
**Dataset/group this task is for:**

project full name: Census of Immune Cells
project short name: 1M Immune Cells
project uuid: cc95ff89-2e68-4a08-a234-480eca21ce79
submission date: 2019-07-03T08:31:02.873Z
submission uuid: 85e72912-9f91-4489-8169-3b43cc65a16a
update date: 2019-07-03T09:13:08.660Z
involved wranglers: Mallory Ann Freeberg; Danielle Welter
Analysis state: COMPLETE
Project state: COMPLETE

**Wrangler responsible for this dataset/lab:** Mallory

**Description of the task:**

- [x] review design: cord blood donor age 0 years
- [x] update `postpartum` EFO term to appropriate HsapDv term
- [x] update project short name to not include spaces
- [ ] dissociation protocol states 10x v2??
- [x] review organ and organ_part ontologies in relation to all the data

Not simple:

- [x] Update the old, less specific 10x v2 sequencing ontology (EFO:0009310) to the newer, more specific 10x 3'/5' v2 sequencing ontologies (EFO:0009899/EFO:0009900). This is currently dependent on when pipelines change their subscription queries: https://github.com/HumanCellAtlas/secondary-analysis/issues/800
- [ ] Update the file_format field from "fastq.gz" to "fastq". This is a file metadata update and is NOT a simple update.

**Acceptance criteria for the task:**

- [ ] spreadsheet updated in Google Drive
- [ ] dataset AUDRed in prod

Project short name:
Primary Wrangler:
@ESapenaVentura
Secondary Wrangler:
Associated files:
69357760-b367-4085-b5c3-44d3548b0ce6/
8ed6cfe7-1ff9-4d2f-9398-deb81ec15e7c/
Key Events
Please track the below as well as the key events: