Closed vh-d closed 6 years ago
What does the TOML spec say?
You may have a point. When I use the upstream code as a standalone command-line parser, I get:
edd@rob:~/git/cpptoml/examples(master)$ ./parse_stdin_2018-10-22 < /tmp/multiline.toml
{"value":{"type":"string","value":"Hellow\nworld!\n"}}
edd@rob:~/git/cpptoml/examples(master)$
I'll look into it. I already added another missing element on the commute in this morning: plain 'time' input (ie HH:MM:SS) for R has no native support will now be returned (rather than dropped) as a string.
To me, it looks like escaping special characters was supposed only for printing verbose output (for debugging) in printValue() but it somehow made its way to getValue() as well.
This easy change solves the issue for me. But it is a breaking change of course.
Yes, it is likely something like that. I have to think if there are other cases of escaped chars we would need to protect. And that protect might be better done at the R accessor ...
Can you take a peek at the TOML spec if it says something about special chars and escapes?
It seems it is all over cpptoml as well from where I probably took it.
Could you possibly deal with it at the R level?
> RcppTOML::parseTOML("/tmp/twolines.toml")
List of 1
$ value: chr "hello\\nworld!\\n"
R>
R> txt <- RcppTOML::parseTOML("/tmp/twolines.toml")
R>
R> cat( gsub('\\\\n', '\n', txt[[1]]) )
hello
world!
R>
Sure that's what I was doing until I stopped wondering why would anyone prefer the current way :-)
I was worried that I would have to handle various cases besides "\n" -> "\n". And when I failed with generic gsub('\\\\', '\\', value)
which obviously cannot work I came here :-)
But it seems there are only 3 cases ('\n', '"' and '\') so its relatively small issue.
Still it may be worth it to have some RcppToml::parseTOML2() without the escaping.
Doing it on the R level basically means climbing the whole list tree and applying gsub()
on any character value. Maybe I will come back if I come up with some elegant solution.
I think a better solution may be to pass an option down to the parse function which, if set, will skip the escaping.
exactly... and it would be set to FALSE by default for backward compatibility
Yup. I see you forked already. Care to try a small and focused pull request? ;-)
Fixed in #28
I think the issue still exists. I'm using version 0.1.5 and multiline strings will still break it.
e.g.
query = """ SELECT * FROM
TABLE
WHERE
property IS NOT NULL;
after loading
toml <- parseTOML(file.path(PATH,"config.toml"))
toml$query: SELECT *FROM \\n (...)
This will break the SQL parser and my dirty solution is dbGetQuery(con, gsub("\\\\n", "", toml$query))
Am I holding it wrongly?
I am not sure I read the TOML spec and examples the same way you do -- in short, I do not think that looks like valid TOML. Can you point to a TOML example or different TOML parser doing this?
You need to explicitly set the parameter escape=FALSE. So in your example
toml <- parseTOML(file.path(PATH,"config.toml"), escape=FALSE)
The default behaviour escape=TRUE was kept due to backward compatibility.
I just checked again and I forget to add an important line in the top example
[PT]
query = """ SELECT * FROM
TABLE
WHERE
property IS NOT NULL;
"""
This results in
> toml <- parseTOML(file.path("/tmp","config.toml"), escape=F)
> toml
List of 1
$ PT:List of 1
..$ query: chr " SELECT * FROM TABLE\\nWHERE \\nproperty IS NOT NULL;\\n"
If I now delete the section header
query = """ SELECT * FROM
TABLE
WHERE
property IS NOT NULL;
"""
it results in
> toml <- parseTOML(file.path("/tmp","config_no_section.toml"), escape=F)
> toml
List of 1
$ query: chr " SELECT * FROM TABLE\nWHERE \nproperty IS NOT NULL;\n"
The toml parser of python eats both versions, but I've never checked if it is correct behaviour.
OK, that's strange. Are you up to date with master? I believe CRAN version may be outdated.
I updated to master and now it's fine. I didn't check in which state CRAN was because the merge event was 4 days older than the CRAN version. I assumed that CRAN was fine ^^
Could we get git tags that resemble the version in the description file?
Thanks for all your fast replies
CRAN has versions.
Github has a commit stream. Master may at times be ahead of CRAN.
You have a money-back guarantee, but not much more. Remember we're volunteers here.
Sry, wasn't meant to be that harsh ^^ It's weird to be at the other end of the issue/request stream ;)
CRAN has versions.
Github has a commit stream. Master may at times be ahead of CRAN.
You have a money-back guarantee, but not much more. Remember we're volunteers here.
I don't wanna overstrain your volunteering here, but I think it would be really cool to update the CRAN version because it doesn't include @vh-d 's second PR #32 which fixes the escape
switch in case of TOML tables and arrays.
Yes, looking at my git
there where #32 and the lesser #35 (doc only). Looks like cpptoml
has not changed so it should be quick. Just sent something else to CRAN so the timing may be right to do a quick release here too.
I would have a small PR fixing unicode chars in arrays. I will try to send it ASAP so it would make it to this new CRAN version
Thanks everybody for the help, particularly @vh-d for the different PRs, @heavywatal for catching an issue with a stale statement in the README.md and @salim-b for the reminder.
I put some work into switching to the (awesome, trust me) and still new-ish tinytest package to have proper tests, also of the installed package (!!) which is really important (and under-appreciated) feature (that testthat
lacks in the default settings).
So now with the version in the master
branch:
As per f22c0aa I renamed it 0.1.5.1 as a release candidate. If you have a moment, give it a test and let me know (here) it is ok. It should be fine. I also added something to ChangeLog and NEWS to reflect the PRs. Thanks again for those.
Nice. All tests pass on my Linux machine. There is one issue with encoding on Windows (which you have already spotted as well as I see in later commits). See my PR for a possible solution.
Yes -- the machines at rhub and win-builder are good for such tests.
And thanks for the PR which I just merged.
Awesome! All tests performed by tinytest::test_package("RcppTOML")
run fine here, too (Ubuntu 16.04; R 3.6.0).
Yes it should as I develop on Ubuntu (albeit "current", now 19.04) and CRAN etc test on it too. Thanks for your patience, last few months were busy but a release was clearly a good idea.
And it does of course make my day that you already groked that you too can now run tests locally! Paging @markvanderloo who is the magician behind the curtain for that one...
I am trying to understand why parsing of TOML files with multi-line strings such as this
yields escaped special characters:
Can't we get this?
Am I missing something? Thanks