eddelbuettel / rcpptoml

Rcpp Bindings to C++ parser for TOML files
GNU General Public License v2.0
36 stars 9 forks source link

Bad Parsing of Array of Tables #16

Closed gerrymanoim closed 7 years ago

gerrymanoim commented 7 years ago

Here's my toml file, taken straight from the spec page:

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

Using the following code:

library(RcppTOML)
cfg <- parseTOML("toml_test.toml")
str(cfg)

I get the following output:

Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$         : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
 - attr(*, "class")= chr [1:2] "toml" "list"
 - attr(*, "file")= chr "toml_test.toml"

It seems to me that in both situations "variety" doesn't actually seem to be parsed as a list, but instead parsed an an empty character?

cfg$fruit
[[1]]
[[1]]$name
[1] "apple"

[[1]]$physical
[[1]]$physical$color
[1] "red"

[[1]]$physical$shape
[1] "round"

[[1]][[3]]
[1] "variety"

[[2]]
[[2]]$name
[1] "banana"

[[2]][[2]]
[1] "variety"

Session Info:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppTOML_0.1.2

loaded via a namespace (and not attached):
 [1] R6_2.2.0        magrittr_1.5    IRdisplay_0.4.4 pbdZMQ_0.2-4   
 [5] tools_3.3.2     Rcpp_0.12.10    crayon_1.3.2    uuid_0.1-2     
 [9] stringi_1.1.5   IRkernel_0.7.1  jsonlite_1.4    stringr_1.2.0  
[13] digest_0.6.12   repr_0.12.0     evaluate_0.10  
eddelbuettel commented 7 years ago

Confirming -- I see the same result. Also try setting verbose=TRUE to see this:

<default print method>
[[fruit]]
        name = "apple"
        [fruit.physical]
                color = "red"
                shape = "round"
        [[fruit.variety]]
                name = "red delicious"
        [[fruit.variety]]
                name = "granny smith"
[[fruit]]
        name = "banana"
        [[fruit.variety]]
                name = "plantain"
</default print method>
eddelbuettel commented 7 years ago

But look closely at the spec page. I suspect there is a typo -- [[fruit.physical]] seems more logical for the TOML file. If we do that we get

R> parseToml("/tmp/issue16.toml")
Other: physical
Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
R> 
gerrymanoim commented 7 years ago

The problem seems to be having an array of tables inside of another table.

The following works fine:

[[fruit]]
  name = "apple"

[[fruit]]
  name = "banana"

This breaks:

[[fruit]]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

As does this:

[fruit]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

The produced JSON on the spec page makes my thing [fruit.physical] was not a typo, i.e. it shows the output as an object

"physical": {
        "color": "red",
        "shape": "round"
      },

rather than an array of objects.

eddelbuettel commented 7 years ago

I was not talking about the produced JSON; that is precisely how I surmised that the shown input is not what was meant. You need double [[ .. ]] not single.

gerrymanoim commented 7 years ago

Sorry - I'm not sure I follow? Though I think this is all separate from the issue here, let me see if I can articulate why I'm confused by what you're saying.

My understanding of that section of the TOML spec is that any [[table]] will produce a array named table with the elements being objects in that section.

For the fruit example, which seems to be demonstrating nested tables, I would expect an array named fruit, with fruit objects in it, the first one having a physical object and an array of objects named variety (i..e what you see in the JSON). I don't necessarily see why physical needs to be an array of tables. Sure it could be, but the output makes me think it is just a subtable of the first fruit table.

Apologies if there's something obvious I'm missing here.

eddelbuettel commented 7 years ago

After just committing one minor fix to quieten one print:

edd@max:~/git/rcpptoml(master)$ cat /tmp/issue16.toml 
[[fruit]]
  name = "apple"

  [[fruit.physical]]
      color = "red"
          shape = "round"

  [[fruit.variety]]
      name = "red delicious"

  [[fruit.variety]]
      name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
      name = "plantain"
edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
edd@max:~/git/rcpptoml(master)$ 

Note that I altered the TOML file.

eddelbuettel commented 7 years ago

But I hear you on the lack of proper nesting / recursion. That may or may not be easy. I'll take a look.

gerrymanoim commented 7 years ago

But fruit[[1]][[2]] and fruit[[1]][[3]] are still just chrs with value physical and variety. There's no way to access "granny smith"?

eddelbuettel commented 7 years ago

As I just said:

But I hear you on the lack of proper nesting / recursion.

gerrymanoim commented 7 years ago

Ah understood, thanks.

eddelbuettel commented 7 years ago

The recursion for the 'TableArray' was simply missing:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 

Looks better, right?

eddelbuettel commented 7 years ago

And if I revert the input to just one [ ]:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 
gerrymanoim commented 7 years ago

Yep - looks perfect!

eddelbuettel commented 7 years ago

Now comitted. Thanks for catching that!

gerrymanoim commented 7 years ago

Just out of curiosity, why are the lists sometimes lists and sometimes pairlists? It doesn't seem to make any difference, I'm just curious in terms of the implementation as to why this happens.

In your example above it seems to sometimes come out one way, sometimes another way:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
eddelbuettel commented 7 years ago

It may just be a side effect of mixing the Rcpp types StretchyList and List....

If you want to dig deeper, could I suggest that you chase that yourself in the code? It's just a few hundred lines, and reasonably good fun to follow. Enabling verbose parse gives you pointers.

gerrymanoim commented 7 years ago

Thanks for the pointer, you were exactly right!

I don't know if this is the "right" fix (my C++ knowledge is extremely limited) but making the following two changes in parse.cpp changed everything to be lists:

diff --git a/src/parse.cpp b/src/parse.cpp
index dd4228d..b7729f4 100644
--- a/src/parse.cpp
+++ b/src/parse.cpp
@@ -252,7 +252,7 @@ SEXP getTable(const std::shared_ptr<cpptoml::table>& t, bool verbose=false) {
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));
         } else {
             if (verbose) Rcpp::Rcout << "Other: " << p.first << std::endl;
             sl.push_back(p.first);
@@ -300,7 +300,7 @@ Rcpp::List tomlparseImpl(const std::string input, bool verbose=false, bool fromf
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));

         } else if (p.second->is_table()) {
             auto ga = std::dynamic_pointer_cast<cpptoml::table>(p.second);
(END)

Appreciate all the help and the package work. Cheers!

eddelbuettel commented 7 years ago

Thanks for working the details out. That is nice and straightforward and I committed this now as https://github.com/eddelbuettel/rcpptoml/commit/7aae502041ab45837c527037700fe88c946175fb