RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high-quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Weird behaviour with backslashes in literals #165

Open mvanbrab opened 2 years ago

mvanbrab commented 2 years ago

Found in versions 5.0.0 and 4.15.0:

Single backslashes are eaten, double backslashes are output as double backslashes, and triple backslashes are output as double backslashes.

A test case is provided in the attached file.

Remark: version 4.12.0 always output twice the number of backslashes in the input.

issue.zip

DylanVanAssche commented 2 years ago

Hi @mvanbrab !

Do you get the same behavior if the data is in JSON?

mvanbrab commented 2 years ago

Well, no. But in equivalent JSON input I have to write two backslashes where I mean one, so the input is as below, and the output is OK (it contains the same number of backslashes as the input). That seems like a no-brainer to me, though...


```json
[
  {
    "id": "1",
    "description": "One backslash:     \\."
  },
  {
    "id": "2",
    "description": "Two backslashes:   \\\\."
  },
  {
    "id": "3",
    "description": "Three backslashes: \\\\\\"
  },
  {
    "id": "4",
    "description": "A backslash before the '{': a ∈ ℝ⁺₀\\{1}."
  }
]
```
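To make the JSON escaping explicit, here is a minimal Java sketch that counts backslashes in the raw JSON source text and in the decoded string value. The class and helper names are hypothetical, and the decoder handles only the `\\` and `\"` escapes rather than full JSON.

```java
// Hypothetical helper, not a full JSON parser: decodes only the \\ and \" escapes
// of a JSON string body and counts backslashes before and after decoding.
class JsonBackslashDemo {

    // Decodes "\\" to one backslash and "\"" to a quote; other escapes are left as-is.
    static String decodeBackslashEscapes(String jsonBody) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < jsonBody.length(); i++) {
            char c = jsonBody.charAt(i);
            if (c == '\\' && i + 1 < jsonBody.length()
                    && (jsonBody.charAt(i + 1) == '\\' || jsonBody.charAt(i + 1) == '"')) {
                out.append(jsonBody.charAt(i + 1)); // keep the escaped character
                i++;                                // and skip over it
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static long countBackslashes(String s) {
        return s.chars().filter(ch -> ch == '\\').count();
    }

    public static void main(String[] args) {
        // Raw JSON source text as written in the file: 2, 4 and 6 backslashes.
        String[] sources = { "\\\\.", "\\\\\\\\.", "\\\\\\\\\\\\" };
        for (String src : sources) {
            System.out.println(countBackslashes(src) + " in source -> "
                    + countBackslashes(decodeBackslashEscapes(src)) + " in value");
        }
    }
}
```

Running it should print `2 in source -> 1 in value`, `4 in source -> 2 in value` and `6 in source -> 3 in value`: each written `\\` pair stands for one backslash in the value.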
DylanVanAssche commented 2 years ago

Aha! That confirms my hypothesis: a few versions ago, we switched from Apache Commons CSV to OpenCSV to parse CSV files. That library is probably eating the \ characters for lunch.
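Under two assumptions, the counts in the original report can be reproduced in a few lines of Java. First, the CSV parser treats `\` as an escape character that is kept only when it precedes another `\` or a `"`, and is dropped otherwise. Second, the reported output is N-Triples-style text, where each backslash inside a literal is written as `\\`. This is a simplified, hypothetical model (`BackslashModel` and its methods are made up for illustration), not OpenCSV's or the RMLMapper's actual code, and the real library's rules (for example for unquoted fields) may differ.

```java
// Simplified, hypothetical model (NOT OpenCSV or RMLMapper code) of the symptom:
// 1) a CSV parser that treats '\' as an escape character: "\\" yields one '\',
//    "\"" yields '"', and a '\' before any other character (or at the end) is dropped;
// 2) an N-Triples-style serializer that writes every '\' in a literal as "\\".
class BackslashModel {

    static String parseWithBackslashEscape(String field) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\') {
                boolean escapable = i + 1 < field.length()
                        && (field.charAt(i + 1) == '\\' || field.charAt(i + 1) == '"');
                if (escapable) {
                    out.append(field.charAt(i + 1)); // keep the escaped character
                    i++;
                }
                // otherwise the lone backslash is silently dropped
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // N-Triples escapes a backslash inside a string literal as \\
    static String serializeLiteral(String value) {
        return value.replace("\\", "\\\\");
    }

    static long countBackslashes(String s) {
        return s.chars().filter(ch -> ch == '\\').count();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            String field = "\\".repeat(n) + "."; // String.repeat needs Java 11+
            String withEscape = serializeLiteral(parseWithBackslashEscape(field));
            String noEscape = serializeLiteral(field); // parser keeps backslashes as-is
            System.out.println(n + " in CSV -> " + countBackslashes(withEscape)
                    + " in output (escaping parser), " + countBackslashes(noEscape)
                    + " (non-escaping parser)");
        }
    }
}
```

Under these assumptions it reports 0, 2 and 2 output backslashes for 1, 2 and 3 input backslashes with the escaping parser, and 2, 4 and 6 without escape handling. That matches both the 5.0.0/4.15.0 symptoms and the 4.12.0 remark, though it does not prove that this is the actual mechanism.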

DylanVanAssche commented 2 years ago

It seems that this is a common problem: https://dzone.com/articles/properly-handling-backslashes-using-opencsv

DylanVanAssche commented 2 years ago

Tested the solution from dzone.com. It works for this case but breaks other edge cases.
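The thread does not say what the dzone.com workaround changes or which edge cases it breaks. For comparison only, here is a minimal sketch of RFC 4180-style splitting of a single CSV line, where the only escape is a doubled quote (`""`) inside a quoted field, so backslashes are ordinary characters. `Rfc4180Sketch` is hypothetical, ignores line breaks inside quoted fields, and is not OpenCSV's own `RFC4180Parser`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical RFC 4180-style splitter for one line (no embedded newlines).
// The only escape is "" inside a quoted field; backslashes are kept verbatim.
class Rfc4180Sketch {

    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes -> one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // includes '\', which is not special here
                }
            } else if (c == '"') {
                inQuotes = true;           // lenient: a quote opens a quoted section
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // CSV text: 3,"Three backslashes: \\\ and a ""quote"""
        String line = "3,\"Three backslashes: \\\\\\ and a \"\"quote\"\"\"";
        System.out.println(splitLine(line).get(1));
    }
}
```

Here the second field comes back as `Three backslashes: \\\ and a "quote"`, with all three backslashes intact. The trade-off is that input relying on `\"` to escape quotes would no longer be handled.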