Closed bkiahstroud closed 1 year ago
@rschwab upon researching, the differences in capitalization (e.g. titleAlternative vs. titlealternative) actually come from Ruby's CSV library. Specifically, a setting called HeaderConverters.
When Bulkrax reads the CSV file, it sets the header converters setting to :symbol, which does the following:
:symbol Leading/trailing spaces are dropped, string is downcased, remaining spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called. - Source
The only other option is :downcase, which would lead to the same outcome. Since the :symbol header converters setting gives us a lot of useful sanitization, I don't see a good way to export titleAlternative instead of titlealternative.
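To illustrate the behavior described above, here is a minimal sketch using only Ruby's standard CSV library. The sample data is made up for illustration, and this is not Bulkrax's own parsing code. It only shows that both converters discard the original header casing:

```ruby
require 'csv'

# Sample data with a camelCased header and a header containing a space
# (made up for illustration).
data = "titleAlternative,Work Type\nAn Alternate Title,Image\n"

# :symbol downcases, replaces spaces with underscores, and calls to_sym.
symbol_headers = CSV.parse(data, headers: true, header_converters: :symbol).headers

# :downcase only downcases the header string.
downcase_headers = CSV.parse(data, headers: true, header_converters: :downcase).headers

p symbol_headers   # => [:titlealternative, :work_type]
p downcase_headers # => ["titlealternative", "work type"]
```

In both cases the capital "A" in titleAlternative is gone by the time the headers reach the caller, which is why the exported header cannot be matched back to the original casing.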
An alternative (no pun intended) option would be to use snake_cased headers instead of camelCased headers. This would make sure the headers stay easily readable without needing to rely on specific capitalization. Example:
# field_mapping
'titleAlternative' => { from: ['title_alternative', 'titlealternative'] }
# ^                             ^                    ^
# |                             |                    |
# |                             |                    Initial import header (downcased version of "titleAlternative")
# Property name                 Exported header
Explanation: Regarding round-tripping, Bulkrax understands everything in the :from array from the field mappings. The first element in the :from array is used as the export header.
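The round-trip rule described here can be sketched in plain Ruby. The lambdas export_header and property_for below are hypothetical helpers written for this illustration, not part of Bulkrax's API. They only mirror the two stated rules: the first :from element is the export header, and any :from element is accepted on import.

```ruby
# Same shape as the field mapping example above.
field_mapping = {
  'titleAlternative' => { from: ['title_alternative', 'titlealternative'] }
}

# Hypothetical helper: the first element of :from becomes the export header.
export_header = ->(property) { field_mapping.fetch(property)[:from].first }

# Hypothetical helper: any element of :from maps an import header back to
# its property. Headers may arrive as symbols after the :symbol converter.
property_for = lambda do |header|
  match = field_mapping.find { |_property, config| config[:from].include?(header.to_s) }
  match&.first
end

exported = export_header.call('titleAlternative')  # "title_alternative"
from_old = property_for.call(:titlealternative)    # "titleAlternative"
from_new = property_for.call(:title_alternative)   # "titleAlternative"
```

Under these assumptions, a spreadsheet imported with a titleAlternative column and later exported with a title_alternative column would resolve to the same property on re-import.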
@rmjaffe see comment above.
Summary: It appears that exporting with the original column header casing isn't possible due to a limitation in the underlying CSV implementation; the headers will always be downcased. One workaround would be to use underscores between words, so titleAlternative would become title_alternative.
+1 for using underscores. Where would that change need to be reflected, apart from external documentation like the data dictionary and the Bulkrax documentation (namely the examples)? Would the YAML file need to be updated?
I think the only place that needs to be touched code-wise would be the Bulkrax initializer (the yaml file does not need to be updated).
Is that what you were asking @rmjaffe?
Yes, thanks for clarifying.
Right, this change would involve changes to the code as Kiah identified, plus updates to the documentation and all of our import spreadsheets.
"and all of our import spreadsheets"
@rschwab that wouldn't be strictly necessary; Bulkrax will still be able to import camelCased columns (e.g. titleAlternative) just fine. If you want to update them all so they're 100% consistent for round-tripping, that's totally fine, but it isn't necessary.
Edit: We decided to go with snake_cased_headers for export format.
Summary
A couple examples (not exhaustive) -- imported headers (left) vs. exported headers (right):
workType --> model
titleAlternative --> titlealternative
Important note
These differences do not affect the success of round trips, i.e. Bulkrax knows that workType and model refer to the same thing. The changes necessary to fulfill this ticket's requirements are purely cosmetic.