datacite / bolognese

Ruby gem and command-line utility for conversion of DOI metadata
MIT License

Issues-1920 Keep newlines in the descriptions and convert the ascii co… #163

Closed: ashwinisukale closed this 10 months ago

ashwinisukale commented 11 months ago

Purpose

When the XML file was read, newlines in the descriptions were removed by the squish method.

Root cause:

We use the squish method (https://github.com/datacite/bolognese/blob/010a854e642fefdcdf17d04dffa2b4de8721808c/lib/bolognese/utils.rb#L1060), which removes newlines, collapses runs of internal whitespace (ASCII or Unicode) into single spaces, and strips leading and trailing whitespace from the string. For example: " foo bar \n \t \u2003Hello\u2009 boo".squish # => "foo bar Hello boo"
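To make the example above runnable without Rails, here is a plain-Ruby sketch of squish. It assumes the call in utils.rb behaves like ActiveSupport's String#squish (replace every whitespace run with one space, then strip); the standalone squish function below is a reimplementation for illustration, not bolognese code.

```ruby
# Plain-Ruby sketch of ActiveSupport-style String#squish.
# Assumption: bolognese's squish call behaves the same way.
def squish(str)
  # [[:space:]] matches ASCII whitespace (space, \t, \n, ...) and Unicode
  # whitespace such as U+2003 (em space) and U+2009 (thin space) in UTF-8 strings.
  str.gsub(/[[:space:]]+/, " ").strip
end

input = " foo bar \n \t \u2003Hello\u2009 boo"
puts squish(input).inspect # => "foo bar Hello boo"
```

This shows why newlines in a description cannot survive squish: "\n" is just another whitespace character that gets folded into a single space.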

closes: https://github.com/datacite/datacite/issues/1920

Approach

Removed the use of squish, because we want to keep the spaces and newlines in the description exactly as they are. I also tested this change with test cases containing different special characters and Unicode characters inside the string.
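A rough sketch of the kind of check described above, assuming plain Ruby with neither bolognese nor ActiveSupport loaded. description_before_fix and description_after_fix are hypothetical stand-ins for the old and new description handling, and the sample strings are illustrative, not the ones in the actual test suite.

```ruby
# Hypothetical old behavior: ActiveSupport-style squish, reproduced in plain Ruby.
def description_before_fix(raw)
  raw.gsub(/[[:space:]]+/, " ").strip
end

# Hypothetical new behavior: the description text is returned untouched.
def description_after_fix(raw)
  raw
end

samples = [
  "First paragraph.\n\nSecond paragraph.",
  "Indented list:\n\t- item one\n\t- item two",
  "Unicode: Zürich, Ελληνικά, 日本語\u2003(em space)",
  "Markup-like characters: <>&\"' and a trailing newline\n"
]

samples.each do |text|
  # After the fix, every character (including \n, \t and U+2003) survives.
  raise "changed: #{text.inspect}" unless description_after_fix(text) == text
  # Before the fix, each sample would have been altered by squish.
  raise "unaltered: #{text.inspect}" if description_before_fix(text) == text
end
puts "all #{samples.size} samples preserved"
```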


ashwinisukale commented 11 months ago

Thanks @digitaldogsbody for the review. There is an ongoing discussion about whether we need to keep newlines for all tags in the XML or only for the description. Kelly is on leave; once she is back I will discuss it with her. Until then I have not modified the other failing test cases, since it is not yet clear whether we need this change everywhere.
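One way the open question could be handled, sketched under assumptions: a hypothetical normalize_fields helper that squishes every string field except those listed in keep_whitespace. The helper, the field names, and the flat hash are illustrative only and do not reflect bolognese's internal data model.

```ruby
# Hypothetical helper: squish every String field except the ones whose
# names are listed in keep_whitespace. Non-String values pass through as-is.
def normalize_fields(fields, keep_whitespace: ["description"])
  fields.to_h do |name, value|
    if keep_whitespace.include?(name) || !value.is_a?(String)
      [name, value]
    else
      # Same ActiveSupport-style squish as before: collapse whitespace, then strip.
      [name, value.gsub(/[[:space:]]+/, " ").strip]
    end
  end
end

# Illustrative record; keys are made up for the example.
record = {
  "title" => "  A title\n  split across lines ",
  "description" => "Line one.\nLine two.",
  "publicationYear" => 2023
}

# title is squished; description keeps its newline; the Integer is untouched.
p normalize_fields(record)
```

Passing keep_whitespace: record.keys would keep whitespace in every field, while keep_whitespace: [] reproduces the old squish-everything behavior, so the decision comes down to which names go in that list.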