UCSCLibrary / dams_project_mgmt

DAMS purpose is to provide access to digitized and born-digital UCSC Special Collections content. This repository is used for project planning. It holds the task tickets and roadmap for the different projects under DAMS.

Exported column headers match imported column headers #539

Closed bkiahstroud closed 1 year ago

bkiahstroud commented 2 years ago

Edit: We decided to go with snake_cased_headers for export format.

Summary

A couple examples (not exhaustive) -- imported headers (left) vs. exported headers (right):

Important note

These differences do not affect the success of round trips; i.e., Bulkrax knows that workType and model refer to the same thing. The changes necessary to fulfill this ticket's requirements are purely cosmetic.

bkiahstroud commented 2 years ago

@rschwab upon researching, the differences in capitalization (e.g. titleAlternative vs. titlealternative) actually come from Ruby's CSV library, specifically a setting called HeaderConverters.

When Bulkrax reads the CSV file, it sets the header converters setting to :symbol, which does the following:

:symbol Leading/trailing spaces are dropped, string is downcased, remaining spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called. - Source

The only other option is :downcase, which would lead to the same outcome. Since the :symbol header converters setting gives us a lot of useful sanitization, I don't see a good way to export titleAlternative instead of titlealternative.
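To illustrate the behavior described above, here is a minimal sketch using only Ruby's standard CSV library (no Bulkrax involved). The sample data and the "Date Created" column are made up for the demonstration.

```ruby
require 'csv'

# Made-up sample data; only the header row matters here.
data = "titleAlternative,Date Created\nAn Alternate Title,2020\n"

# :symbol downcases, turns spaces into underscores, and calls to_sym.
symbolized = CSV.parse(data, headers: true, header_converters: :symbol).headers
p symbolized # => [:titlealternative, :date_created]

# :downcase only downcases, so the camelCase capitalization is lost either way.
downcased = CSV.parse(data, headers: true, header_converters: :downcase).headers
p downcased # => ["titlealternative", "date created"]
```

In both cases the capital "A" in titleAlternative is gone once the file has been read, so the original casing cannot be recovered from the parsed header alone.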

An alternative (no pun intended) option would be to use snake_cased headers instead of camelCased headers. This would make sure the headers stay easily readable without needing to rely on specific capitalization. Example:

# field_mapping
'titleAlternative' => { from: ['title_alternative', 'titlealternative'] }
#       ^                              ^                     ^
#       |                              |                     |
#       |                              |    Initial import header (downcased version of "titleAlternative")
# Property name                   Exported header

Explanation: Regarding round-tripping, Bulkrax understands everything in the :from array of a field mapping. The first element in the :from array is used as the export header.
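The lookup described in this explanation can be sketched in plain Ruby. This is a simplified model for illustration only, not Bulkrax's actual implementation: FIELD_MAPPING, property_for, and export_header_for are hypothetical names.

```ruby
# Hypothetical, simplified model of the field-mapping lookup (not Bulkrax code).
FIELD_MAPPING = {
  'titleAlternative' => { from: ['title_alternative', 'titlealternative'] }
}.freeze

# Import side: any header listed in :from resolves to the property name.
def property_for(header, mapping = FIELD_MAPPING)
  key = header.to_s
  match = mapping.find { |_property, opts| opts[:from].include?(key) }
  match&.first
end

# Export side: the first element of :from is used as the column header.
def export_header_for(property, mapping = FIELD_MAPPING)
  mapping.dig(property, :from)&.first
end

p property_for(:titlealternative)          # => "titleAlternative"
p property_for(:title_alternative)         # => "titleAlternative"
p export_header_for('titleAlternative')    # => "title_alternative"
```

Under this model, a spreadsheet that still uses the titleAlternative header (read as titlealternative) and a re-imported export that uses title_alternative both resolve to the same property, which is why the round trip keeps working.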

rschwab commented 2 years ago

@rmjaffe see comment above.

Summary: It appears that exporting with the original column header casing isn't possible due to a limitation in the underlying CSV implementation: headers will always get downcased. One workaround would be to use underscores between words, so titleAlternative would become title_alternative.

rmjaffe commented 2 years ago

+1 for using underscores. Where would that change need to be reflected apart from external documentation like the data dictionary and bulkrax documentation (namely the examples)? Would the yaml file need to be updated?

bkiahstroud commented 2 years ago

I think the only place that needs to be touched code-wise would be the Bulkrax initializer (the yaml file does not need to be updated).

Is that what you were asking @rmjaffe?

rmjaffe commented 2 years ago

Yes, thanks for clarifying.

rschwab commented 2 years ago

Right, this change would involve changes to the code as Kiah identified, plus updates to the documentation and all of our import spreadsheets.

bkiahstroud commented 2 years ago

> and all of our import spreadsheets

@rschwab that wouldn't be strictly necessary; Bulkrax will still be able to import camelCased columns (e.g. titleAlternative) just fine. If you want to update them all so they're 100% consistent for round-tripping, that's totally fine, but it isn't required.