OregonDigital / OD2

Next generation of Oregon Digital ( https://oregondigital.org ) digital collections platform, built on Samvera Hyrax ( https://github.com/samvera/hyrax/ )

Export CSVs have some fields always in header even if no data in export #2888

Closed: wickr closed this issue 11 months ago

wickr commented 11 months ago

Descriptive summary

@sarahfish07 noticed that our CSV exports always contain the following fields/columns, even when the export has no data for them:

first_line
first_line_chorus
full_text (may need to be added to bulkrax reserved fields)
gps_latitude
gps_longitude
identification_verification_status
military_highest_rank
object_orientation
original_filename
photograph_orientation
resolution
specimen_type

Fields marked as multiple: false in crosswalk.yml don't get a split value in the field mapping, and the CSV processor then includes those fields whether or not they contain data. Field mapping sample from the scarc-test exporter (id: 14):

first_line {"from"=>["http://opaquenamespace.org/ns/sheetmusic_firstLine"]}
first_line_chorus {"from"=>["http://opaquenamespace.org/ns/sheetmusic_firstLineChorus"]}
folder_name {"from"=>["http://opaquenamespace.org/ns/folderName"], "split"=>"\\|"}
folder_number {"from"=>["http://opaquenamespace.org/ns/folderNumber"], "split"=>"\\|"}
form_of_work {"from"=>["http://rdaregistry.info/Elements/w/P10004"], "split"=>"\\|"}
former_owner {"from"=>["http://id.loc.gov/vocabulary/relators/fmo"], "split"=>"\\|"}
full_size_download_allowed {"from"=>["http://opaquenamespace.org/ns/fullSizeDownloadAllowed"]}
full_text {"from"=>["http://opaquenamespace.org/ns/fullText"]}
genus {"from"=>["http://rs.tdwg.org/dwc/terms/genus"], "split"=>"\\|"}
gps_latitude {"from"=>["http://www.w3.org/2003/12/exif/ns#gpsLatitude"]}
gps_longitude {"from"=>["http://www.w3.org/2003/12/exif/ns#gpsLongitude"]}
has_finding_aid {"from"=>["http://lod.xdams.org/reload/oad/has_findingAid"], "split"=>"\\|"}
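
A minimal Ruby sketch of the pattern described above, assuming the mapping is available as a plain hash. FIELD_MAPPING and fields_without_split are hypothetical names, and only a subset of the sample is copied. In this subset, the entries with no "split" key are exactly the ones that appear in the always-present list. This only illustrates the observation; it is not how Bulkrax or OD2 actually build CSV headers.

```ruby
# Hypothetical: a subset of the scarc-test field mapping above, as a plain Ruby hash.
FIELD_MAPPING = {
  "first_line" => { "from" => ["http://opaquenamespace.org/ns/sheetmusic_firstLine"] },
  "first_line_chorus" => { "from" => ["http://opaquenamespace.org/ns/sheetmusic_firstLineChorus"] },
  "folder_name" => { "from" => ["http://opaquenamespace.org/ns/folderName"], "split" => "\\|" },
  "full_text" => { "from" => ["http://opaquenamespace.org/ns/fullText"] },
  "genus" => { "from" => ["http://rs.tdwg.org/dwc/terms/genus"], "split" => "\\|" },
  "gps_latitude" => { "from" => ["http://www.w3.org/2003/12/exif/ns#gpsLatitude"] },
  "gps_longitude" => { "from" => ["http://www.w3.org/2003/12/exif/ns#gpsLongitude"] }
}.freeze

# Hypothetical helper: return the field names whose mapping has no "split" key,
# i.e. the fields marked multiple: false in crosswalk.yml per the description above.
def fields_without_split(mapping)
  mapping.reject { |_field, config| config.key?("split") }.keys
end

puts fields_without_split(FIELD_MAPPING)
```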

Expected behavior

Exported CSVs contain a header and column for a field only when the export has data for that field.
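
The expected behavior amounts to deriving headers from the exported data instead of from the mapping alone. A minimal sketch, assuming each exported record is a hash of field name to an array of values. headers_with_data and export_csv are hypothetical helpers, not Bulkrax methods, and joining multi-valued fields with "|" is an assumption based on the split pattern in the mapping.

```ruby
require "csv"

# Hypothetical helper: keep only the fields that have at least one non-blank value
# in any record. `fields` supplies the candidate column order.
def headers_with_data(fields, records)
  fields.select do |field|
    records.any? do |record|
      Array(record[field]).any? { |value| !value.to_s.strip.empty? }
    end
  end
end

# Hypothetical helper: write a CSV using only the columns that contain data.
# Multi-valued fields are joined with "|" (an assumption, see lead-in).
def export_csv(fields, records)
  headers = headers_with_data(fields, records)
  CSV.generate do |csv|
    csv << headers
    records.each do |record|
      csv << headers.map { |field| Array(record[field]).join("|") }
    end
  end
end

fields = %w[title first_line gps_latitude folder_number]
records = [
  { "title" => ["Sheet music A"], "first_line" => [], "folder_number" => ["3"] },
  { "title" => ["Photo B"], "gps_latitude" => [""] }
]

print export_csv(fields, records)
```

Here first_line and gps_latitude are dropped because no record has a non-blank value for them, while folder_number is kept because one record does.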

wickr commented 11 months ago

QA Steps:

  1. Create a new exporter using the CSV format.
  2. Select existing content (limiting it to 3 to 5 works is fine).
  3. Inspect the resulting CSV file and confirm that the fields/columns listed above are no longer included.
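
For step 3, a small check using only Ruby's standard csv library can list any columns that are blank in every row of a downloaded export. blank_columns is a hypothetical helper and export.csv is a placeholder path. An empty result means no all-blank columns were exported.

```ruby
require "csv"

# Hypothetical helper: return the header names whose column is blank in every row.
def blank_columns(csv_text)
  table = CSV.parse(csv_text, headers: true)
  table.headers.select do |header|
    # CSV::Table#[] with a String returns that column's values; empty cells parse as nil.
    table[header].all? { |value| value.to_s.strip.empty? }
  end
end

# Usage against a real export (placeholder path):
#   p blank_columns(File.read("export.csv"))
sample = <<~TEXT
  id,title,gps_latitude,full_text
  abc123,Sheet music A,,
  def456,Photo B,,
TEXT

p blank_columns(sample) # => ["gps_latitude", "full_text"]
```
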

sarahfish07 commented 11 months ago

Exported cw-furlong collection (28 works). Blank fields are still included in the export.

wickr commented 11 months ago

QA pass. (The initial QA was done on the prod site only, not on staging.)

Created 2 exporters on staging:
https://staging.oregondigital.org/exporters/14?locale=en
https://staging.oregondigital.org/exporters/15?locale=en

Both CSVs had only columns that contained data. folder_number appeared correctly in one exported CSV but not the other.