esmero / ami

Archipelago Multi Importer. A module of mass ingest made for the masses
GNU Affero General Public License v3.0

Solr Importer to CSV HTML encodes double quotes instead of escaping them #36

Open DiegoPino opened 3 years ago

DiegoPino commented 3 years ago

What?

We get this:

The Art Exemplar, &quot;English Typography and Book-Work,&quot; columns 117-118, blank verso

We should get this:

The Art Exemplar, "English Typography and Book-Work," columns 117-118, blank verso

and in the source CSV this should be:

"The Art Exemplar\, \"English Typography and Book-Work\,\" columns 117-118\, blank verso"

The same happens with & and other characters that have HTML entities.
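To make the symptom concrete, here is a minimal sketch of what HTML-encoding does to the label above. The thread does not confirm which call in the importer produces the entities, so the helper name `ami_example_html_encode()` is hypothetical and only illustrates the effect of `htmlspecialchars()` with `ENT_COMPAT`.

```php
<?php
// Hypothetical helper, not part of the ami module: it only demonstrates
// what HTML-encoding does to a value before it reaches the CSV writer.
function ami_example_html_encode(string $value): string {
  // ENT_COMPAT converts double quotes (plus &, < and >) and leaves single quotes alone.
  return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
}

$label = 'The Art Exemplar, "English Typography and Book-Work," columns 117-118, blank verso';

// The double quotes become &quot; entities, which is the unwanted output.
$encoded = ami_example_html_encode($label);

// Decoding restores the original label, so no information is lost,
// but the CSV should never have contained the entities in the first place.
$decoded = htmlspecialchars_decode($encoded, ENT_COMPAT);

echo $encoded, PHP_EOL;
echo $decoded, PHP_EOL;
```

If the CSV writer receives the raw value instead, quoting becomes its job and the cell keeps its literal double quotes.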

Where?

here:

https://github.com/esmero/ami/blob/ISSUE-14/src/Plugin/ImporterAdapter/SolrImporter.php#L1115

\Drupal::service('ami.utility')->csv_append()

Maybe go for htmlspecialchars($value, ENT_COMPAT, 'UTF-8', FALSE); addslashes($value);

DiegoPino commented 3 years ago

@aksm @alliomeria I removed a lot of code. The documentation says that fputcsv(), which is how I write the CSV, already does enough escaping. Maybe I overreacted before and added too much logic, so I removed it from all the CSV writing logic (including the other importers). https://github.com/esmero/ami/pull/31/commits/fd52fb6899c510cb7c4632deb1f35e1046d7e559 Can you do a git pull on ISSUE-14 to get the changes and test tomorrow? We sadly need some good testing, with weird data and maybe a Google Sheet with Japanese characters mixed with " and ' and $& in between?

The particular collection you shared now comes out perfectly, but who knows?
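A quick way to try the "weird data" case locally is a round trip through `fputcsv()` and `fgetcsv()`. This is only a sketch: `ami_example_csv_roundtrip()` and the sample rows are made up for illustration and are not part of the module. The separator, enclosure and escape arguments are passed explicitly, using the same values as PHP's long-standing defaults.

```php
<?php
// Sketch only: writes rows with fputcsv() into an in-memory stream and
// reads them back with fgetcsv(), so the two can be compared.
function ami_example_csv_roundtrip(array $rows): array {
  $handle = fopen('php://memory', 'w+');
  foreach ($rows as $row) {
    // fputcsv() encloses fields that need it and doubles embedded double quotes.
    fputcsv($handle, $row, ',', '"', '\\');
  }
  rewind($handle);
  $csv = stream_get_contents($handle);

  rewind($handle);
  $read = [];
  // A length of 0 means "no line length limit".
  while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $read[] = $row;
  }
  fclose($handle);

  return ['csv' => $csv, 'rows' => $read];
}

// Illustrative sample data: commas, double quotes, single quotes, & and $
// mixed with Japanese characters.
$rows = [
  ['label', 'note'],
  [
    'The Art Exemplar, "English Typography and Book-Work," columns 117-118, blank verso',
    'Kyoto 京都 "引用" & \'single\' $&',
  ],
];

$result = ami_example_csv_roundtrip($rows);
echo $result['csv'];
```

If the rows read back are identical to the rows written, and the raw CSV contains doubled quotes rather than `&quot;`, the writer is doing the escaping on its own. Values containing backslashes are deliberately left out of this sample, since the escape character changes how `fputcsv()` treats them and that deserves its own test.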

Good night!