HumanCellAtlas / ingest-central

Ingest Central is the hub repository for the ingest service
Apache License 2.0

Apply the suggested style for Metadata spreadsheet #211

Open malloryfreeberg opened 6 years ago

malloryfreeberg commented 6 years ago

What problem does the suggested enhancement solve? Please describe.

The current version of the template metadata spreadsheet is not user-friendly.

What type of enhancement is this?

Performance, usability

How will this enhancement benefit wranglers and/or end-users?

Data contributors will have better direction when filling in a metadata spreadsheet.

How complex to implement do you estimate this enhancement will be? (High, Medium, Low)

Low

How much benefit do you estimate this enhancement will provide? (High, Medium, Low)

Medium

How urgent is this and is there a specific date it needs to be done by? (High, Medium, Low)

Medium (not needed for the beta, but needed by end of calendar year)

Describe your preferred solution

Update the template spreadsheet generator script to include the features and formats supplied by @gabsie. An example of the new format is attached in HCA_metadata_template.xlsx.

Describe alternatives you've considered

None

Additional context

Example new formatting: HCA_metadata_template.xlsx

justincc commented 6 years ago

Please clarify.

malloryfreeberg commented 6 years ago

@justincc issue filled out. @gabsie can provide more specific information on formatting when needed.

justincc commented 6 years ago

Where does this fit into GA tasks? (@hewgreen, @morrisonnorman)

gabsie commented 6 years ago

I think we discussed making this part of the second data recruitment call. Shall we discuss this tomorrow at the meeting? @morrisonnorman @justincc

justincc commented 6 years ago

Sure can do. First data recruitment is the one that got us 14 datasets and 2nd is going to start imminently?

malloryfreeberg commented 6 years ago

> First data recruitment is the one that got us 14 datasets and 2nd is going to start imminently?

Yes

justincc commented 5 years ago

From discussion with Tony today, this may not actually be critical for GA. Let's discuss further if you disagree.

lauraclarke commented 5 years ago

Again, data recruitment is starting in the new year. Being able to send out improved spreadsheets has a significant impact on our contributors' and wranglers' ability to contribute and wrangle data.

A slick process to generate the improved styled spreadsheets isn't needed

A process to generate the improved styled spreadsheets is

gabsie commented 5 years ago

I think this was already implemented by Dani and Simon, and it was really quick to do. It might be in another ticket, but I think it was done, and now I need to review it when they send me the result.

malloryfreeberg commented 5 years ago

I just ran the spreadsheet template generator from the master branch in ingest-client, and @gabsie's suggested changes are not incorporated. I don't see an open PR or ~branch~ (maybe it's local_spreadsheet_builder?) for them, either.

Although I agree that updating this is not a hard requirement for GA, I am strongly advocating for getting this done as soon as possible, given that we already de-prioritized it for the beta. If we are going to continue to improve the data contribution process, we need feedback on the new spreadsheet style, as the previous style was found to be confusing and in need of much improvement.

As of today, there are 2 groups that need spreadsheets ASAP, with more on the way as we hear back from contributors after the holiday break.

I believe it will not take a lot of developer time to incorporate Gab's changes. We need things like changing the order of spreadsheet rows, adding guidelines (similarly to how examples are already added), improved coloring, etc. I'm happy to take a stab at the updates, given Dani is out until Monday and I don't know when Simon is back.
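
To make the row-order and guidelines part of that request concrete, here is a minimal sketch. It is not the ingest-client generator: `Column`, `HEADER_ROW_ORDER`, and `build_header_rows` are hypothetical names, the row order is an assumption (the attached template defines the real one), and the "(Required)" and "For example:" labels are placeholders. Coloring is left to whatever library actually writes the .xlsx file.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """One spreadsheet column (hypothetical structure, for illustration only)."""
    programmatic_name: str
    user_friendly: str
    description: str
    guidelines: str = ""
    example: str = ""
    required: bool = False


# Assumed order of the header rows, top to bottom. Reordering rows in the
# template then only means reordering this tuple.
HEADER_ROW_ORDER = (
    "user_friendly",
    "description",
    "guidelines",
    "example",
    "programmatic_name",
)


def build_header_rows(columns, row_order=HEADER_ROW_ORDER):
    """Return one list of cell values per header row, in `row_order`."""
    rows = []
    for attr in row_order:
        row = []
        for col in columns:
            value = getattr(col, attr)
            if attr == "user_friendly" and col.required:
                # Placeholder marker for required fields.
                value = f"{value} (Required)"
            elif attr == "example" and value:
                # Placeholder prefix for example values.
                value = f"For example: {value}"
            row.append(value)
        rows.append(row)
    return rows
```

Calling `build_header_rows([Column("a.b", "Name", "Some description", guidelines="Some guidance", example="x", required=True)])` yields five single-cell rows, starting with `["Name (Required)"]` and ending with `["a.b"]`. Guidelines travel through the same path as examples, which is the "similarly to how examples are already added" idea.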

justincc commented 5 years ago

Talking with @malloryfreeberg, it sounds like this could be chiefly or even solely a metadata team task rather than one for the rest of ingest dev. Mallory is going to bring this up with @simonjupp and @daniwelter in sprint planning next Tues 8th Jan.

malloryfreeberg commented 5 years ago

Remaining important issues:

  1. Ontologized fields don't use the field description, but rather the description of the .text or .ontology or .ontology_label fields in the ontology schema. I think we want the actual field values here.
  2. Ontologized fields don't use the field user-friendly terms, but rather use the programmatic name.
  3. Ontologized fields always appear required because the .text field is required if the field is used. Required field annotation should depend on whether the field itself is required, not based on the .text field being required.
  4. "Biomaterial ID" fields in each Biomaterial tab should have the "Biomaterial" text replaced with the submittable type (e.g. "Cell suspension ID")
  5. Same as above but with protocols

Seems most of the major features still needed are ontology handling-related...
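
A rough sketch of the behaviour the five points above ask for, assuming the generator can see a flat mapping from field path to that field's own properties. Everything here is hypothetical (`parent_field`, `header_for`, `id_label`, and the shape of `schema_fields`); it does not reflect how ingest-client or the metadata schema actually store this information.

```python
# Sub-field suffixes of an ontologized field, as listed in point 1.
ONTOLOGY_SUFFIXES = (".text", ".ontology", ".ontology_label")


def parent_field(column_key):
    """Strip an ontology sub-field suffix so the column maps to the field itself."""
    for suffix in ONTOLOGY_SUFFIXES:
        if column_key.endswith(suffix):
            return column_key[: -len(suffix)]
    return column_key


def header_for(column_key, schema_fields):
    """Build header values from the field itself, not its ontology sub-field.

    `schema_fields` is an assumed flat dict: field path -> dict with the keys
    "user_friendly", "description" and "required".
    """
    field = schema_fields[parent_field(column_key)]
    return {
        "user_friendly": field["user_friendly"],  # point 2
        "description": field["description"],      # point 1
        "required": field["required"],            # point 3
    }


def id_label(generic_label, generic_type, submittable_type):
    """Points 4 and 5: swap the generic type name for the submittable type."""
    return generic_label.replace(generic_type, submittable_type, 1)
```

For example, `id_label("Biomaterial ID", "Biomaterial", "Cell suspension")` returns `"Cell suspension ID"`, and a column keyed `cell_suspension.genus_species.text` (an illustrative path) would take its name, description, and required flag from the `cell_suspension.genus_species` entry rather than from `.text`.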

daniwelter commented 5 years ago

Points 1-3 are now done. Points 4 + 5 require patch schema changes.

simonjupp commented 5 years ago

I think points 4 and 5 are going to be needed for the metadata TSV too, to unblock the beta-2 phase. DataBiosphere/azul#784

Currently being tracked by HumanCellAtlas/metadata-schema#899

malloryfreeberg commented 5 years ago

@simonjupp If you can tell me what the change needs to be, we can probably get those patch updates in dev quickly.