CDRH / datura

Datura is a Ruby gem that manages data (TEI-XML, CSVs, VRA-XML, etc.) and populates Solr / Elasticsearch instances. Datura also generates HTML for the formats to allow serving the contents via the web.

Better handling of empty TEI or CSV entries #192

Open karindalziel opened 2 years ago

karindalziel commented 2 years ago

Sometimes a TEI element is blank, and sometimes there's an inadvertent space in a spreadsheet. When that happens, we get a "No Label" entry on the Orchid side.

A better default would probably be to check for empty items (once all newlines, spaces, and tabs are stripped out) before posting.

We could use Active Support and its .blank? method, but I have a function to do it as well:

    # Returns the field unchanged, or nil when it is blank
    # once surrounding whitespace is stripped.
    def empty_post(field)
      unless field.to_s.strip.empty?
        field
      end
    end
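
As a rough illustration of how a helper like this could be applied before posting, here is a minimal sketch in plain Ruby. The strip_empty_fields method and the sample document are hypothetical and are not part of Datura; the sketch only assumes that a document is a hash of field names to values.

```ruby
# Helper from the comment above: returns nil for blank values.
def empty_post(field)
  field unless field.to_s.strip.empty?
end

# Hypothetical helper (not part of Datura): drop every field whose
# value is blank once whitespace is stripped.
def strip_empty_fields(doc)
  doc.each_with_object({}) do |(key, value), cleaned|
    kept = empty_post(value)
    cleaned[key] = kept unless kept.nil?
  end
end

# Made-up sample document with several kinds of "empty" values.
doc = {
  "title"   => "Letter to a friend",
  "creator" => " \t\n", # whitespace only, e.g. a stray space in a spreadsheet
  "date"    => "",      # empty string, e.g. a blank TEI element
  "source"  => nil
}

cleaned = strip_empty_fields(doc)
```

Only the "title" field survives here. Note that the check relies on to_s, so an empty array (whose to_s is "[]") would not be treated as empty; multivalued fields would need their own handling.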

wkdewey commented 2 years ago

This also came up with Habeas Corpus; some entries were coming up as "-".

wkdewey commented 2 years ago

I addressed this by changing the get_text and get_list methods (which use Nokogiri to query the XML) to return nil instead of a blank value when the XPath does not match. However, this still requires nil checks in the data repos and is a breaking change.
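
To show why returning nil is a breaking change for callers, here is a small sketch in plain Ruby. The two get_text_* methods are hypothetical stand-ins that look values up in a hash rather than querying XML with Nokogiri, and the sketch assumes that "blank" previously meant an empty string; none of this is Datura's actual code.

```ruby
# Hypothetical stand-in data: a "document" with only a title.
FIELDS = { "title" => "Petition of John Doe" }.freeze

# Stand-in for the assumed old behavior: empty string when nothing matches.
def get_text_blank(key)
  FIELDS.fetch(key, "")
end

# Stand-in for the new behavior: nil when nothing matches.
def get_text_nil(key)
  FIELDS[key]
end

# Code written against the old behavior can call String methods directly.
old_judge = get_text_blank("judge").strip

# The same call on the new return value would raise NoMethodError,
# so callers need a nil check such as the safe navigation operator.
new_judge = get_text_nil("judge")&.strip

# A caller can then supply its own fallback (hypothetical choice here).
label = new_judge || "unknown"
```

The upside of nil is that a missing field is distinguishable from a present but empty one; the cost is that every call site in the data repos that chains String methods needs a guard like the one above.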