UCSCLibrary / dams_project_mgmt

The purpose of DAMS is to provide access to digitized and born-digital UCSC Special Collections content. This repository is used for project planning. It holds the task tickets and roadmap for the different projects under DAMS.

[5] DateCreated field accepts multiple formats #498

Closed rschwab closed 2 years ago

rschwab commented 2 years ago

Summary

As an admin user, I want to be able to ingest works with different formats for dateCreatedIngest (formerly DateCreated in BulkOps). Currently it only accepts YYYY-MM-DD, but we also have some works with only YYYY. This format should be preserved on exports as well, so that updates to the metadata don't lose the correct dates.

Acceptance Criteria

Tech details / discussion

Formats are:

dateCreatedIngest: YYYY or YYYY-MM-DD
dateCreatedDisplay: ex: approximately YYYY, or YYYY-YYYY
dateCreated: YYYY-MM-DD (ex: 1950-12-31)

Values for dates are imported using dateCreatedIngest and dateCreatedDisplay. Bulkrax converts the value in dateCreatedIngest to a YYYY-MM-DD format to store in dateCreated. It converts partial dates so that when sorted they go at the end of the list (YYYY-12-31 or equivalent). dateCreatedIngest and dateCreatedDisplay are exportable, and are never overwritten by the value in the dateCreated field.
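The conversion described above could be sketched roughly as follows. This is an illustration only, not Bulkrax's actual code: the function name `to_sortable_date` is hypothetical, and it assumes the only accepted ingest formats are YYYY and YYYY-MM-DD, as listed in this ticket.

```python
import re
from datetime import date


def to_sortable_date(ingest_value: str) -> str:
    """Turn a dateCreatedIngest value into a full YYYY-MM-DD string.

    Hypothetical helper: a year-only value is padded to December 31 so
    that it sorts after every fully specified date in the same year.
    """
    value = ingest_value.strip()
    if re.fullmatch(r"\d{4}", value):
        # Partial date: push it to the end of the year for sorting.
        return f"{value}-12-31"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        # Raises ValueError for impossible dates such as 1950-13-40.
        date.fromisoformat(value)
        return value
    raise ValueError(f"unsupported dateCreatedIngest format: {ingest_value!r}")
```

Under these assumptions, "1950" becomes "1950-12-31", "1950-06-01" passes through unchanged, and anything else (for example "circa 1950") is rejected rather than guessed at.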

bkiahstroud commented 2 years ago

Is it possible that users will be entering values like "circa YYYY" as opposed to just "YYYY"? If so, we could add a third solr field to store the raw value that gets imported. From there, we match on "YYYY", put that value in dateDisplay, then put "YYYY-12-31" in dateCreated
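For illustration, the matching step proposed in this comment might look something like the sketch below. The function name `split_raw_date` and the dictionary keys are hypothetical, and the sketch assumes the first standalone four-digit number in the raw text is the year.

```python
import re
from typing import Optional

# A four-digit run that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def split_raw_date(raw: str) -> Optional[dict]:
    """Keep the raw value, pull out the year, and build a sortable date."""
    match = YEAR_PATTERN.search(raw)
    if match is None:
        return None  # no year found, nothing to derive
    year = match.group(1)
    return {
        "raw": raw,                     # value exactly as imported
        "year": year,                   # matched YYYY
        "sortable": f"{year}-12-31",    # end-of-year date for sorting
    }
```

Here "circa 1950" would yield the year "1950" and the sortable value "1950-12-31", while a value with no four-digit year returns None.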

rschwab commented 2 years ago

@rmjaffe See above - do we need to accept words in the date field, i.e. "circa YYYY"?

rmjaffe commented 2 years ago

dateCreatedDisplay was created, and is currently used, to accommodate and display textual or non-conforming date values such as "circa 2009", YYYY-YYYY date ranges, or "~2019". Being able to display values like "approximately 2019-2021" is imperative to our users and stakeholders.

Those same values are recorded in dateCreated in a way that conforms to the YYYY or YYYY-MM-DD formatting rule: "2009", "YYYY|YYYY|YYYY", "2019".

It does not serve to repurpose or redefine these properties, especially not dateCreatedDisplay. If we want to introduce a third parallel property, it should be for the YYYY-12-31 values: dateCreatedSomethingElse. Or, if YYYY-12-31 absolutely must go in dateCreated, add a third property for the YYYY or YYYY-MM-DD ingest date: dateCreatedIngest.

rschwab commented 2 years ago

Sorry, I didn't realize we were using the date display field in this way. Let's create a third field to store the ingest/export value, since dateCreated does need to be formatted in a precise way for sorting purposes.

bkiahstroud commented 2 years ago

@rmjaffe @rschwab Should the column header we're using still be "dateCreated", with the value parsing done behind the scenes? Or do we want to import these values into the "dateCreatedDisplay" column or the third property's column?

rmjaffe commented 2 years ago

I don't know if this answers your question -- I know it doesn't answer it directly, but I think the data should look like:

dateCreatedIngest (or whatever we want to call it): YYYY
dateCreatedDisplay: approximately YYYY
dateCreated: YYYY-12-31

Is that right, @rschwab? My question is: would I need to supply the dateCreated value, i.e. YYYY-12-31 for our YYYY-only dates, on the ingest spreadsheets, or is this handled fully behind the scenes?

rschwab commented 2 years ago

My take on it is:

Spreadsheets should have two values: one that is just a date in a limited set of formats (for now either YYYY, or YYYY-MM-DD), and one that is open-ended for displaying to end users ("approx YYYY", etc).

Behind the scenes (meaning public users never see it, staff working with spreadsheets never see it) a date field is created based on the date-only field in the spreadsheet.

I don't have opinions on the labels for these. Whatever works best for Rachel and team is what we should go with.

Rachel: Do you have preferences on the labelling/column header names?

bkiahstroud commented 2 years ago

It's worth keeping exporting / round-tripping in mind. For example, we probably shouldn't label the column header meant to take "approx YYYY" as "dateCreated", because on export, the "dateCreated" header will have the "YYYY-MM-DD" value
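A small sketch of the round-tripping concern, under stated assumptions: the functions `import_row` and `export_row` are hypothetical, rows are plain dictionaries keyed by column header, and the sortable dateCreated is always recomputed from dateCreatedIngest on import rather than read from the spreadsheet.

```python
def import_row(row: dict) -> dict:
    """Build a record from a spreadsheet row (hypothetical importer)."""
    ingest = row["dateCreatedIngest"].strip()
    # Assumes ingest is either YYYY or YYYY-MM-DD; year-only values
    # are padded to the end of the year.
    sortable = ingest if len(ingest) == 10 else f"{ingest}-12-31"
    return {
        "dateCreatedIngest": ingest,
        "dateCreatedDisplay": row.get("dateCreatedDisplay", "").strip(),
        # Any dateCreated column in the incoming row is ignored.
        "dateCreated": sortable,
    }


def export_row(record: dict) -> dict:
    """Write each property back under its own column header."""
    return dict(record)
```

Because each value goes out under the same header it came in under, re-importing an exported row reproduces the original record, and the free-text display value never collides with the YYYY-MM-DD value in the dateCreated column.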

rmjaffe commented 2 years ago

If we're introducing a new, third property that will be used in place of dateCreated (which is what we use currently) on the ingest spreadsheets, I'm OK with that property/column header being dateCreatedIngest so that it's clear what that property is for and what values go in it. That gives us dateCreatedIngest, dateCreatedDisplay and dateCreated. The only one of those that should display to end users is dateCreatedDisplay (or, if there is no dateCreatedDisplay, dateCreatedIngest), and it will likely be labeled 'Date Created'. Jess is finalizing her decision making on the data dictionary this afternoon.
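The display rule described here (show dateCreatedDisplay, falling back to dateCreatedIngest when it is absent) could be expressed as below. The function name `public_date` is hypothetical and the record is assumed to be a plain dictionary.

```python
def public_date(record: dict) -> str:
    """Pick the value shown to end users under the 'Date Created' label."""
    display = (record.get("dateCreatedDisplay") or "").strip()
    if display:
        return display
    # No display value: fall back to the ingest value.
    # The system-only dateCreated field is never shown.
    return (record.get("dateCreatedIngest") or "").strip()
```

With this rule, a record carrying "approximately 1950" shows that text, while a record with only dateCreatedIngest "1950" shows "1950" rather than the internal "1950-12-31".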

bkiahstroud commented 2 years ago

@rmjaffe do you want me to add dateCreatedIngest to the data dictionary so Jess can add to it today? Or should we hold off until the work for this ticket is in review?

rmjaffe commented 2 years ago

Thanks for asking -- let's update it after Jess is done. Whatever public-facing label she gives for dateCreatedDisplay will be the same for dateCreatedIngest. Given that dateCreated is being made a system-use only property, does it require a predicate or can I reassign that dcterms predicate to dateCreatedIngest?

bkiahstroud commented 2 years ago

It still requires a predicate, but if you would like to use a different one so that you can reassign the current one to dateCreatedIngest, that should be fine.

rschwab commented 2 years ago

I updated the description and acceptance criteria based on this discussion. Please let me know if I missed anything or got something wrong.

rmjaffe commented 2 years ago

@bkiahstroud @rschwab Just added dateCreatedIngest to the data dictionary and updated information on all three of the dateCreated properties.