eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library
116 stars 13 forks source link

feature/deserialize #17

Closed knapply closed 4 years ago

knapply commented 4 years ago

moving work from fork to base

eddelbuettel commented 4 years ago

@dcooley Any chance you can look at it too?

dcooley commented 4 years ago

yep - I'll have a look in a few hours

dcooley commented 4 years ago

yeah so that code is some sort of wizardry...

I've compared the outputs against some simple & edge-case structures which caused a bit of hassle as we built jsonify::from_json(). One example is how from_json('{}') is handled in a round-trip (mentioned here).

jsonify::to_json( NULL )
# {} 
jsonify::from_json( '{}' )
# NULL
RcppSimdJson:::.deserialize_json( '{}' )
# named list()
jsonlite::fromJSON('{}')
# named list()

And similarly you're returing [] as NA_LOGICAL?

jsonify::to_json( list() )
# [] 
jsonify::from_json( '[]' )
# list()
RcppSimdJson:::.deserialize_json( '[]' )
# logical(0)
jsonlite::fromJSON('[]')
# list()

And for what it's worth, I found the comment where I jotted down my from_json() design choices.

May I ask what your reasons are for these two examples?

knapply commented 4 years ago

yeah so that code is some sort of wizardry...

I'd like to simplify naming conventions and add some documentation. The flow of data is tricky and I keep coming back to diagramming it as the simplest way to explain. I'll have to put some more thought into how to make it simpler to digest, debug, maintain, enhance, etc.

May I ask what your reasons are for these two examples?

Of course!

The quick takeaway: it shouldn't be difficult to set how both {} and [] are handled via R-level options. It might be as simple as having a parameter (empty=?) allowing the user to pass anything they want. I've never tried this so I need to check, but I think passing a user-provided SEXP down to here (and the equivalent in the object simplifier) and replacing Rcpp::LogicalVector(0) with it will do the trick:

https://github.com/knapply/rcppsimdjson/blob/80a9022746fd77d9c71214474c2b5d3779e644ce/inst/include/RcppSimdJson/deserialize/simplify.hpp#L171

template <Type_Policy type_policy, utils::Int64_R_Type int64_opt, Simplify_To simplify_to>
inline auto dispatch_simplify_array(const simdjson::dom::array& array) -> SEXP {
  if (std::size(array) == 0) {
    return Rcpp::LogicalVector(0);
  }
...
}



I'll see if I can sort that out, but assuming that's the case, we should come to a consensus on what the defaults should be.

Edit: moved reply to #18

eddelbuettel commented 4 years ago

This is really good stuff and my head is slightly spinning (as I mostly avoided dealing with "complicated" JSON so far).

dcooley commented 4 years ago

yeah so that code is some sort of wizardry...

I hope you took this as a compliment! :)

knapply commented 4 years ago

I hope you took this as a compliment! :)

I did! 😃

I attempted to simplify the code a tad and swapped out some hand-rolled nonsense for std::optional.

Best of all: you can now pass arbitrary R objects in to use for empty JSON arrays and objects.

RcppSimdJson:::.deserialize_json(
  "[]", empty_array = logical()
)
#> logical(0)

RcppSimdJson:::.deserialize_json(
  "[]", empty_array = list()
)
#> list()

RcppSimdJson:::.deserialize_json(
  "{}", empty_object = structure(list(),.Names = character())
)
#> named list()

RcppSimdJson:::.deserialize_json(
  "{}", empty_object = NULL
)
#> NULL

RcppSimdJson:::.deserialize_json(
  "[[],{}]", 
  empty_array = head(mtcars),
  empty_object = igraph::make_full_graph(4)
)
#> [[1]]
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
#> 
#> [[2]]
#> IGRAPH c6acf0b U--- 4 6 -- Full graph
#> + attr: name (g/c), loops (g/l)
#> + edges from c6acf0b:
#> [1] 1--2 1--3 1--4 2--3 2--4 3--4
dcooley commented 4 years ago

That's amazing!

eddelbuettel commented 4 years ago

Should we merge it into master?

dcooley commented 4 years ago

fine by me

knapply commented 4 years ago

We can merge, but I'll have some refinements this evening (including syncing w/ upstream and rechecking everything).

I'll pass the updates to feature/deserialize in a separate PR.

eddelbuettel commented 4 years ago

Really up to you, just wanted to check in. We can PR now, or if you want to stick some more in we can wait.