coolbutuseless / yyjsonr

Fast JSON package for R
https://coolbutuseless.github.io/package/yyjsonr/index.html
Other
113 stars 7 forks source link

General jsonlite compatibility issues #3

Open coolbutuseless opened 10 months ago

coolbutuseless commented 10 months ago

{yyjsonr} has a "soft" goal to try and be compatible with {jsonlite} in the most common use cases.

Full compatibility with {jsonlite} is a non-goal.

This is a place to voice any compatibility issue you have, or express support/disagreement for things on the list below. This will help prioritise programming effort.

shikokuchuo commented 10 months ago

I think the above option should be the default actually. Stability of return types is important for programmatic use. Granted the type of vector cannot be guaranteed, but it seems it would be easier for code to handle this rather than having to deal with different ‘shapes’ (vector vs list).

coolbutuseless commented 10 months ago

@shikokuchuo Just to clarify: you think that (by default) the return type should always be an atomic vector?

I'm a bit conflicted on how this might go, as I like the idea of defaulting to easy-to-handle atomic vectors, but I'm "offended" by the idea of just discarding type information as soon as types are mixed.

shikokuchuo commented 10 months ago

Yes, making it always return an atomic vector seems to be better behaviour.

If you think about it from the perspective of wanting to do something with the returned data, having a list that retains the data types does not really give you any advantage. If you unlist (or coerce) then that also loses the type information. That means handling code would need to have 2 branches - to deal with the case it is an atomic vector or using lapply or similar when dealing with a list.

In reality I think a lot of json is retrieved from web APIs, in which case you probably know the expected return type, and can easily test in case the returned vector is not the expected type (perhaps the API returns a 'missing value' occasionally or something). This handling may be included within vectorised code. Otherwise we would have to write code that handles the fact that we may get a list from time to time (or flatten by default, which gets us back to the same place).

However, I don't have a real use case where this (may) happen (thankfully!), so this is purely theoretical and I defer to those who are actually dealing with such data.

coolbutuseless commented 10 months ago

Thanks, I appreciate your thoughts. Mapping JSON-to-R is tricky and there are so many tradeoffs!

shikokuchuo commented 10 months ago

Totally! That's why I'm so glad someone like you has actually attempted this :)

jonocarroll commented 10 months ago

(can be moved to a new issue if that helps)

Would it be useful to add a "json" class to the string output, as jsonlite does?

jsonlite <- jsonlite::toJSON(as.list(euro), auto_unbox = TRUE)
jsonlite
#> {"ATS":13.7603,"BEF":40.3399,"DEM":1.9558,"ESP":166.386,"FIM":5.9457,"FRF":6.5596,"IEP":0.7876,"ITL":1936.27,"LUF":40.3399,"NLG":2.2037,"PTE":200.482}
str(jsonlite)
#>  'json' chr "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.9558,\"ESP\":166.386,\"FIM\":5.9457,\"FRF\":6.5596,\"IEP\":0.7876,\""| __truncated__

yyjsonr <- yyjsonr::to_json_str(as.list(euro), opts = yyjsonr::to_opts(auto_unbox = TRUE))
yyjsonr
#> [1] "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.95583,\"ESP\":166.386,\"FIM\":5.94573,\"FRF\":6.55957,\"IEP\":0.787564,\"ITL\":1936.27,\"LUF\":40.3399,\"NLG\":2.20371,\"PTE\":200.482}"
str(yyjsonr)
#>  chr "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.95583,\"ESP\":166.386,\"FIM\":5.94573,\"FRF\":6.55957,\"IEP\":0.7875"| __truncated__

jsonlite:::print.json(yyjsonr)
#> {"ATS":13.7603,"BEF":40.3399,"DEM":1.95583,"ESP":166.386,"FIM":5.94573,"FRF":6.55957,"IEP":0.787564,"ITL":1936.27,"LUF":40.3399,"NLG":2.20371,"PTE":200.482}

class(yyjsonr) <- "json"
yyjsonr # since jsonlite is loaded, the print method is available, but this isn't necessarily the case
#> {"ATS":13.7603,"BEF":40.3399,"DEM":1.95583,"ESP":166.386,"FIM":5.94573,"FRF":6.55957,"IEP":0.787564,"ITL":1936.27,"LUF":40.3399,"NLG":2.20371,"PTE":200.482}
str(yyjsonr)
#>  'json' chr "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.95583,\"ESP\":166.386,\"FIM\":5.94573,\"FRF\":6.55957,\"IEP\":0.7875"| __truncated__

Created on 2023-09-06 with reprex v2.0.2

This would allow for yyjsonr to have its own print.json as well, but more importantly, I found this to be a requirement when using the JSON as input to a htmlwidget (not tested thoroughly, but that was the solution in at least one case I encountered).

shikokuchuo commented 10 months ago

This would allow for yyjsonr to have its own print.json as well, but more importantly, I found this to be a requirement when using the JSON as input to a htmlwidget (not tested thoroughly, but that was the solution in at least one case I encountered).

If this is necessary for some use cases, it could perhaps live in an option (non-default). If a key raison d'être of yysonr is performance, adding a class is probably the single worst thing you can do to kill it. This is as every generic function, including the standard subsetting operators, then needs to perform S3 dispatch. It is of course possible to use .subset and .subset2 but a lot of those in code is not fun and kills readability. Just my perspective of looking to push the boundaries of what's possible, not just oh well it's already 2x faster than 'jsonlite'.

jonocarroll commented 10 months ago

I'm not sure I follow - I'm suggesting adding a class to the resulting JSON string. If you're calling a generic on that then it will go through dispatch regardless, but will hit the default method if it was without a formal class.

If the suggestion was to make parsing a generic then yes, that would impact performance, but I'm not suggesting that at all.

shikokuchuo commented 10 months ago

I'm not sure I follow - I'm suggesting adding a class to the resulting JSON string.

Right - when I typed my response I was thinking the other way round (my main use cases). Assuming it's less likely you'd want to manipulate the resulting json string then I guess it's less of a concern.

If you're calling a generic on that then it will go through dispatch regardless, but will hit the default method if it was without a formal class.

Just to clarify what I meant, if the object is not classed it's OBJECT bit is not set on the SEXP and S3 dispatch is not performed. Example with the generic function [[ (yes it does nothing meaningful in this case):

cstr <- str <- yyjsonr::to_json_str(list(a = 1, b = 2))
str
#> [1] "{\"a\":[1.0],\"b\":[2.0]}"
class(cstr) <- "json"
cstr
#> [1] "{\"a\":[1.0],\"b\":[2.0]}"
#> attr(,"class")
#> [1] "json"

# generic `[[`:
microbenchmark::microbenchmark(str[[1L]], cstr[[1L]])
#> Unit: nanoseconds
#>        expr min    lq   mean median    uq  max neval
#>   str[[1L]]  77  89.5 137.26   97.0 104.0 3557   100
#>  cstr[[1L]] 302 314.0 410.14  327.5 347.5 5027   100

# non-generic .subset2:
microbenchmark::microbenchmark(.subset2(str, 1L), .subset2(cstr, 1L))
#> Unit: nanoseconds
#>                expr min lq   mean median uq  max neval
#>   .subset2(str, 1L)  76 80 127.40   84.5 89 3386   100
#>  .subset2(cstr, 1L)  85 90 105.08   97.0 99  812   100

Created on 2023-09-06 with reprex v2.0.2

msummersgill commented 10 months ago

Really cool work, looks like there is lots of potential for wide benefits if this could be leveraged in some commonly used packages!

It seems like if compatibility were designed such that yyjsonr functions could be dropped in place of prominent jsonlite implementations, then users could benefit by improved shiny app performance, faster document render times, etc.

Some examples that come to mind: shiny::toJSON(), shiny::safeFromJSON(), htmlwidgets::toJSON(), plotly::to_JSON(), plotly::from_JSON() , rmarkdown::base64_encode_object(), etc...

Is this something you have in mind, or are you thinking of this as more of a bulk data ingest focused library that might not share the same design objectives of jsonlite?

I did some poking around with (plotly::to_JSON() and was able to successfully render a plot, but with Warning messages: 1: In yyjsonr::to_json_str(x, auto_unbox = TRUE) : serialize_core(): Unhandled SEXP: closure.

coolbutuseless commented 10 months ago

@msummersgill I did think about aiming for strict jsonlite drop-in compatibility but this was not a priority for the initial release.

jangorecki commented 6 months ago

+1 against drop in replacement. If there is something to improve this is the place to do it. If jsonlite made some imperfect design decisions they are already stuck with them for good. New package like this is exactly where corrections can and should be made.

I would like to point out potential of I(), the AsIs class, that probably could be well utilized when deparsing to json needs to provide an alternative method for an object, and prevent from simplifying/wrapping up/boxing/unboxing/etc.