Closed by gabsie 2 years ago
Have a separate design discussion to identify the tasks under this user value ticket.
@gabsie @ESapenaVentura anything else?
Hi, as per our meeting today I have edited the description of this ticket above. An important update: we're not considering the wrangler use case (I want to update a project with this spreadsheet), just the data consumer case.
@aaclan-ebi and @gabsie to have a meeting about this today
Notes from meeting with @aaclan-ebi and @ESapenaVentura:
Make the downloadable spreadsheet contain the linking between the biomaterials and the processes, so that requirement 2 above can be fulfilled (include the columns from the default metadata template that show the connection)
We can implement this requirement, and remove the extraneous process tab with the following implementation:
It may not be mentioned as a requirement above but changing links via upload is explicitly out of scope for this user story.
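To illustrate the linking columns discussed above, here is a minimal Python sketch that folds process links into the rows of a biomaterial tab. The record shape (`inputs`, `outputs`, `protocols`), the column names `derived_from` and `protocols`, and the `||` multi-value separator are all assumptions made for this example, not the actual ingest data model.

```python
from collections import defaultdict


def collect_links(processes):
    """Map each output entity ID to the inputs and protocols that produced it."""
    links = defaultdict(lambda: {"inputs": [], "protocols": []})
    for process in processes:
        for output_id in process["outputs"]:
            links[output_id]["inputs"].extend(process["inputs"])
            links[output_id]["protocols"].extend(process["protocols"])
    return dict(links)


def add_linking_cells(row, links, separator="||"):
    """Return a copy of a spreadsheet row with hypothetical linking columns added."""
    found = links.get(row["id"], {"inputs": [], "protocols": []})
    return {
        **row,
        "derived_from": separator.join(found["inputs"]),
        "protocols": separator.join(found["protocols"]),
    }


# Hypothetical data: one process turning a specimen into a cell suspension.
processes = [
    {"inputs": ["specimen_1"], "outputs": ["cell_suspension_1"],
     "protocols": ["facs", "size_selection"]},
]
links = collect_links(processes)
linked_row = add_linking_cells({"id": "cell_suspension_1"}, links)
```

With the links written onto the output entity's own row like this, a reader could follow the chain without consulting a separate process tab.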
Make the format of this spreadsheet contain rows 1-5 from the original metadata template, which hold the user-friendly name, the description, examples, system name and separator line
These details can be pulled from the metadata and inserted with the correct font and formatting but should be a separate ticket.
Try to keep the ordering of columns as per the original spreadsheet
Try to keep the ordering of spreadsheet tabs as per the original spreadsheet
There might be a misunderstanding of how our Excel files work. There is no single format that works (@ESapenaVentura mentioned he routinely moves the columns around so they make more sense to him). Columns and tabs can be added, moved and removed to some extent without any issues. Storing the column/tab ordering against every import and reusing this ordering upon download is a much bigger task than it may seem and is certainly out of scope for this ticket. What might be more achievable is to define a common ordering that will better serve our users and use this same ordering when generating blank files as well as re-generating files from existing projects.
Either way, these changes should be tracked on another ticket.
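The "common ordering" idea above could be sketched roughly as follows in Python. The function name `order_columns` and the column names in the example are hypothetical; this only shows one way to apply a shared preferred order while leaving unrecognised columns at the end in the order they arrived.

```python
def order_columns(columns, preferred_order):
    """Sort columns by a shared preferred order; unknown columns go last."""
    rank = {name: position for position, name in enumerate(preferred_order)}
    # sorted() is stable, so columns missing from preferred_order
    # keep their incoming relative order after the known ones.
    return sorted(columns, key=lambda name: rank.get(name, len(rank)))


# Hypothetical column names, for illustration only.
preferred = ["biomaterial_id", "biomaterial_name", "organ"]
incoming = ["organ", "custom_note", "biomaterial_id", "lab_code", "biomaterial_name"]
ordered = order_columns(incoming, preferred)
```

The same function could serve both blank-template generation and re-generation from an existing project, which is the point of defining the ordering once.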
Make the downloadable metadata available as an end point
We're a little uncertain what is required here as we believe this is already available.
Make the downloadable metadata available for download in the catalogue?
@aaclan-ebi is worried that this might get us in trouble with the DCP.
From conversation with @gabsie
For the data consumer, the focus is to provide only the columns that are populated, not blank columns.
This will not be taken in this sprint. It will probably be high priority next sprint
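The populated-columns-only rule mentioned above could look something like this minimal Python sketch. It assumes a tab is held as a header list plus equally long row lists, and that `None` and the empty string both count as blank; both are assumptions for illustration, not how the exporter actually stores data.

```python
def drop_blank_columns(header, rows):
    """Keep only the columns that have a value in at least one row."""
    keep = [
        index for index in range(len(header))
        if any(row[index] not in (None, "") for row in rows)
    ]
    kept_header = [header[index] for index in keep]
    kept_rows = [[row[index] for index in keep] for row in rows]
    return kept_header, kept_rows


# Hypothetical tab with one column that is never filled in.
header = ["biomaterial_id", "description", "organ"]
rows = [["specimen_1", "", "skin"], ["specimen_2", None, "skin"]]
kept_header, kept_rows = drop_blank_columns(header, rows)
```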
From conversation with @amnonkhen, regarding the Downloading Links functionality detailed above.
The spreadsheet is currently generated using granular API calls per entity, link, etc. This takes time and is much more difficult to implement, because we join the entities ourselves, with all the overhead that entails (dev time, quality, execution time). Instead, the work should be done in ingest-core (a new endpoint is a no-brainer), maybe even using more advanced MongoDB queries to extract the links.
FAO @aaclan-ebi:
@MightyAx and I will pair today to start working on this.
@MightyAx and I brainstormed on the possible options how to implement this: https://miro.com/app/board/o9J_li2IEts=/
The current plan is:
@MightyAx @aaclan-ebi hopefully to work today on this.
We've successfully prototyped a submission "census", which is just the IDs and relationship mappings of all objects: https://github.com/ebi-ait/ingest-core/pull/97
WIP changes in the importer: https://github.com/ebi-ait/ingest-client/pull/32
Please note ebi-ait/dcp-ingest-central#491 may block testing of this feature.
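For readers unfamiliar with the "census" idea mentioned above, here is a rough Python sketch of what a structure holding only IDs and relationship mappings might look like. The field names (`type`, `id`, `input`, `output`) and the resulting shape are invented for illustration and are not taken from the linked PR.

```python
def build_census(entities, links):
    """Reduce full documents to IDs grouped by type plus relationship mappings."""
    census = {"ids": {}, "derived_from": {}}
    for entity in entities:
        # Only the ID is kept; the entity content is deliberately dropped.
        census["ids"].setdefault(entity["type"], []).append(entity["id"])
    for link in links:
        census["derived_from"].setdefault(link["output"], []).append(link["input"])
    return census


# Hypothetical submission content.
entities = [
    {"type": "biomaterial", "id": "specimen_1", "content": {"organ": "skin"}},
    {"type": "biomaterial", "id": "cell_suspension_1", "content": {}},
    {"type": "file", "id": "file_1", "content": {}},
]
links = [
    {"input": "specimen_1", "output": "cell_suspension_1"},
    {"input": "cell_suspension_1", "output": "file_1"},
]
census = build_census(entities, links)
```

Because such a structure carries no entity content, it stays small and could be returned in a single response instead of being assembled from many per-entity calls.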
In PR review.
We might need to improve the retrieval of linking information from core in order to download large spreadsheets faster.
Hi @ami-day, the changes should already be in staging. Please verify.
It would be nice to upload a real spreadsheet from a dataset in prod to staging, do some updates in linking via ingest UI (by expanding a process row and making some changes) and download the spreadsheet to see if the spreadsheet shows the correct linking.
Ticket on wrangling test. To be tested by @ami-day
Hi @aaclan-ebi, I was able to download a real spreadsheet from ingest prod and it looks how it did initially. However, I am getting an error when trying to re-upload it to staging: https://api.ingest.staging.archive.data.humancellatlas.org/submissionEnvelopes/61a0d0f54fe10b74b9ae5a27/submissionErrors
Maybe we could discuss on Monday when you're back.
@aaclan-ebi to look into this today.
PR with the fixes is ready
@ami-day to review this today
I will test this today
Hi @aaclan-ebi, sorry for the delay in testing. But it works :) Here is the submission: https://staging.contribute.data.humancellatlas.org/submissions/detail?uuid=01c167fa-0cb3-44bc-a392-1e7fa8d156ca All of the specimens were enriched by FACS and size selection. As a test, I deleted the size selection enrichment protocol for the specimen with ID SKN8090540 and output cell suspension ERX5053663. I then downloaded the spreadsheet, and I can see the size selection protocol is missing from that cell suspension only. Let me know if you need any more testing for this ticket.
Thanks, @ami-day. Alegria, @aaclan-ebi - can we now put this on prod, and be able to demo this tomorrow at DCP demo? Thank you!
@aaclan-ebi is monitoring deployment to production
Deployed to prod today.
https://gitlab.ebi.ac.uk/hca/ingest-broker/-/pipelines/218497
Note: this ticket has been changed to just include the task about the linking between biomaterials. The original description and epic is here.
As a data consumer, I want to be able to access and download the metadata spreadsheet for a project with as complete metadata as possible, with the ability to trace which file corresponds to which cell suspension, specimen and donor.
Note: for the data consumer version we are not including the empty columns from the default metadata spreadsheet, but only the ones which have values associated with them.
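The traceability requirement in the user story above (file to cell suspension, specimen and donor) amounts to walking the links upstream. A minimal Python sketch, assuming the links are available as a hypothetical `derived_from` mapping from each entity ID to the IDs it was derived from:

```python
from collections import deque


def trace_lineage(entity_id, derived_from):
    """Return every upstream entity of entity_id, nearest ancestors first."""
    ancestors = []
    seen = {entity_id}
    queue = deque([entity_id])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:  # guards against revisiting shared ancestors
                seen.add(parent)
                ancestors.append(parent)
                queue.append(parent)
    return ancestors


# Hypothetical IDs, for illustration only.
derived_from = {
    "file_1": ["cell_suspension_1"],
    "cell_suspension_1": ["specimen_1"],
    "specimen_1": ["donor_1"],
}
lineage = trace_lineage("file_1", derived_from)
```

A breadth-first walk is used here so that the result lists the closest ancestors first, which matches the order in which a data consumer would usually read the chain.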
Acceptance Criteria / Definition of Done