CUB-Libraries-CTA / counter-data-loader

Loads COUNTER database from JR1 report spreadsheets

CSV-based import breaks when newlines are in titles (2) #53

Closed bonnland closed 1 year ago

bonnland commented 1 year ago

Problem: Cell content in the Excel spreadsheets can contain newlines.

Acceptance Criteria:

Replace newline characters in titles with whitespace characters, then trim the resulting title.
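
A minimal sketch of what this criterion could look like in Python. The function name `clean_title` is hypothetical and not taken from the repository; it assumes the title arrives as a plain string and that `\r\n`, `\r`, and `\n` should all count as newlines.

```python
import re

# Hypothetical helper, not the repository's actual code.
# Matches Windows (\r\n), old Mac (\r), and Unix (\n) line breaks.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def clean_title(title: str) -> str:
    """Replace each line break with a single space, then trim the result."""
    return _NEWLINE.sub(" ", title).strip()
```

For example, `clean_title("Journal of\nTesting ")` gives `"Journal of Testing"`. A run of several line breaks leaves several spaces in the title; collapsing them is a separate decision not covered by the criterion above.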

More details:

Description:

The CSV file, as consumed by this import, is line-based, i.e. a newline always indicates the end of one record.

When newlines are part of the "title" field, this breaks the mysqlimport command by splitting one record over two lines.
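
To illustrate the split, here is a small sketch. It assumes the CSV is produced by joining fields with commas and is then read one line per record; the record values are made up, and `naive_csv_line` is a hypothetical name, not the loader's actual code.

```python
def naive_csv_line(fields):
    """Join fields with commas and end the record with a newline (no quoting)."""
    return ",".join(fields) + "\n"


# Made-up record whose title contains an embedded newline.
record = ["Journal of\nTesting", "Example Publisher", "12"]

# A line-based reader sees every newline as the end of a record.
lines = naive_csv_line(record).splitlines()

for line in lines:
    print(repr(line))
```

The one record comes out as two lines, and neither has the expected three fields.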

It's unclear how best to handle this case, and the rows with these problems can be difficult to find by hand in a large Excel spreadsheet.
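
One possible way to locate the affected rows, sketched under assumptions: the spreadsheet rows have already been read into Python as tuples of cell values, and the title sits in the first column. The name `rows_with_newlines` and its parameters are hypothetical.

```python
def rows_with_newlines(rows, title_col=0, first_row=1):
    """Return the row numbers whose title cell contains a line break.

    rows      -- iterable of tuples of cell values (assumed already loaded)
    title_col -- index of the title column (assumption: first column)
    first_row -- number to report for the first row in `rows`
    """
    hits = []
    for offset, row in enumerate(rows):
        title = row[title_col]
        # Skip empty or non-text cells; only strings can hold a line break.
        if isinstance(title, str) and ("\n" in title or "\r" in title):
            hits.append(first_row + offset)
    return hits


# Made-up sample data for illustration only.
sample = [("Title A", 1), ("Title\nB", 2), (None, 3)]
print(rows_with_newlines(sample))
```

Setting `first_row` to the spreadsheet row where the data starts makes the reported numbers match what Excel shows.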


bonnland commented 1 year ago

Code commit on branch "filter-newlines":

https://github.com/culibraries/counter-data-loader/commit/873dcbe0cfbf707d7aaae7b3363b452393c67097