elifesciences / elife-poa-xml-generation

tools for creating XML for submission into elife's publish on accept workflow
MIT License
Clean csv #334

Closed gnott closed 6 years ago

gnott commented 6 years ago

A variant of the CSV data was observed recently in which the XML-like output was spread across multiple lines, with each line indented. An example is shown in the new datasets.csv test fixture.

The concept for the fix is to clean the CSV data first and write more standard CSV data files, which can then be parsed by csvreader. To be most effective, it should have no side effects when it reads and writes normal CSV data.

The main feature here is the clean_csv() function and the test fixtures to test it. It is the result of a couple of revisions, and although it looks a little basic, the tests are passing. The other changes are minor tweaks and refactors.
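For readers who want a feel for the idea, here is a minimal sketch of this kind of cleaning step. It is not the clean_csv() code in this PR: the function names `clean_csv_text` and `parse_rows` are hypothetical, and it assumes that any line starting with a space or tab is a continuation of the previous record. It also normalises line endings to `\n`, which the real function may not do.

```python
import csv
import io


def clean_csv_text(text):
    """Join indented continuation lines onto the line before them.

    Assumption (not from the PR): a line that starts with a space or
    tab continues the previous record. Lines without a leading indent
    are passed through unchanged.
    """
    cleaned = []
    for line in text.splitlines():
        if cleaned and line[:1] in (" ", "\t"):
            # Continuation line: drop the indent and append it to the
            # previous line so the record sits on a single line again.
            cleaned[-1] += line.strip()
        else:
            cleaned.append(line)
    # Keep a trailing newline only if the input had one.
    trailing = "\n" if text.endswith("\n") else ""
    return "\n".join(cleaned) + trailing


def parse_rows(text):
    """Clean the text, then parse it with the standard csv module."""
    return list(csv.reader(io.StringIO(clean_csv_text(text))))
```

With this sketch, input such as `"id,datasets\n1,<datasets>\n  <dataset>Example</dataset>\n  </datasets>\n"` is collapsed to two rows before parsing, while ordinary input such as `"a,b\n1,2\n"` comes back unchanged.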

The CSV data is back to normal, so there is no rush to deploy this fix now. If we merge it in, we should be protected if the multi-line data appears again.