eddelbuettel / rcpptoml

Rcpp Bindings to C++ parser for TOML files
GNU General Public License v2.0
36 stars 9 forks source link

adds new logical 'escape' parameter for switching on/off escaping special … #27

Closed vh-d closed 6 years ago

vh-d commented 6 years ago

Should solve the GitHub issue #26 .

> parseTOML(input = "value='''\nHello\nWorld'''", fromFile = FALSE)
List of 1
 $ value: chr "Hello\\nWorld"

> parseTOML(input = "value='''\nHello\nWorld'''", fromFile = FALSE, escape = FALSE)
List of 1
 $ value: chr "Hello\nWorld"

Haven't written any c++ in ages so please double-check the c++ side of the changes.

codecov[bot] commented 6 years ago

Codecov Report

Merging #27 into master will increase coverage by 0.05%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #27      +/-   ##
==========================================
+ Coverage   60.51%   60.56%   +0.05%     
==========================================
  Files           3        3              
  Lines        1332     1334       +2     
==========================================
+ Hits          806      808       +2     
  Misses        526      526
Impacted Files Coverage Δ
src/parse.cpp 66.96% <100%> (+0.3%) :arrow_up:
R/parseToml.R 100% <100%> (ø) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 99e3755...a424d77. Read the comment docs.

eddelbuettel commented 6 years ago

That looks really promising -- I was busy yesterday and only glanced at but realized the same approach: new boolean toggle we need to pass through 'all the way'.

Only minor quip I have may be to possibly renamed the variable to escapeCharacters or something to fit a verbNoun pattern (we should also have used for useFile and useIncludize -- oh well.) Thoughts?

vh-d commented 6 years ago

Yes, naming is hard :-) May also be escapeSpecChars.

The thing is that maybe we should, for the sake of consistency, cover the other symbols with special treatment in TOML as well.

From TOML spec:

\b         - backspace       (U+0008)
\t         - tab             (U+0009)
\n         - linefeed        (U+000A)
\f         - form feed       (U+000C)
\r         - carriage return (U+000D)
\"         - quote           (U+0022)
\\         - backslash       (U+005C)
\uXXXX     - unicode         (U+XXXX)
\UXXXXXXXX - unicode         (U+XXXXXXXX)

"...All other escape sequences not listed above are reserved and, if used, TOML should produce an error."

eddelbuettel commented 6 years ago

Hm. So you mean expand this

// this function is borrowed with credits from cpptoml :)
std::string escapeString(const std::string& str) {
    std::string res;
    for (auto it = str.begin(); it != str.end(); ++it) {
        if (*it == '\\')
            res += "\\\\";
        else if (*it == '"')
            res += "\\\"";
        else if (*it == '\n')
            res += "\\n";
        else
            res += *it;
    }
    return res;
}

to cover other chars? Maybe. Less pressing though as ... they are even less frequently used?

vh-d commented 6 years ago

Yes, it is definitely not urgent.

There is one more strange thing concerning line breaks. From spec:

A newline immediately following the opening delimiter will be trimmed. All other content between the delimiters is interpreted as-is without modification.

but

> parseTOML(input = "value='''Hello\nWorld'''", fromFile = FALSE)
List of 1
 $ value: chr "HelloWorld"

So any first new line symbol is trimmed even if it is not the first char in the input string. But this seems to be attributed to cpptoml library. But that is out-of-topic as well, I will look at it later.

Edit: copy-paste mistake in the R code

vh-d commented 6 years ago

Correction, I should write:

> parseTOML(input = "value='''Hello\nWorld'''", fromFile = FALSE)
List of 1
 $ value: chr "HelloWorld"
eddelbuettel commented 6 years ago

Hm. Upstream does not seem to do anything special either. Or am I missing something?

edd@rob:~/git/cpptoml/examples(master)$ echo "value = '''Hello\nWorld'''" | ./parse_stdin_2018-10-22
{"value":{"type":"string","value":"Hello\\nWorld"}}
edd@rob:~/git/cpptoml/examples(master)$ echo "value = '''\nHello\nWorld'''" | ./parse_stdin_2018-10-22
{"value":{"type":"string","value":"\\nHello\\nWorld"}}
edd@rob:~/git/cpptoml/examples(master)$ 
vh-d commented 6 years ago

I am surprised as well. This "upstream" means some CLI tools based on cpptoml?

Well, that's interesting. I don't see any postprocessing of line breaks in RcppToml that could be blamed.

vh-d commented 6 years ago

I will open this as a new issue and will try to finish the #26 first, if you agree.

So, which one do you prefer? escapeSpecChars or escapeCharacters ?

eddelbuettel commented 6 years ago

I think we can still take this PR and I'll merge it now.

And yes, upstream aka cpptoml has a directory examples with examples/parse_stdin.cpp. I just compiled that into differerently named version as I had updated the upstream version twice (and one had an error now fixed).