Open SymbolixAU opened 5 years ago
on branch issue58
(which inculdes to_ndjson
)
@dcooley I just installed this per your comment in #29 but something is erroring out quickly.
Pretty minimal reprex:
# remotes::install_github("SymbolixAU/jsonify", "issue58")
jsonify::from_ndjson('{"abc":123}')
fails with error:
Error in rcpp_from_ndjson(json, simplify, fill_na) : object 'json' not found
I think this one's pretty straightforward. There are some switches in argument names between ndjson
and json
at various points. This example seems to be erroring here inside ndjson_to_r.character
, at the return line: return(rcpp_from_ndjson( json, simplify, fill_na ))
. This affects raw strings and filepaths, and connections wrapped in file("pathtofoo.ndjson")
.
Edit: just forked to make this tweak and ran into more issues. Specifically, calling from_ndjson
with json
switched to ndjson
in that return statement:
'{"abc":123}'
casereadLines(foo, 2)
) with the error belowTo reproduce from my fork,
jsonify::from_ndjson(ndjson = '{"abc":123}')
jsonify::from_ndjson(paste0(c('{"abc":123}', '{"def":456}', collapse = "\n")))
with results:
> from_ndjson(paste0(c('{"abc":123}', '{"def":456}', collapse = "\n")))
Error in rcpp_from_ndjson(ndjson, simplify, fill_na) :
Expecting a single string value: [type=character; extent=3].
In addition: Warning messages:
1: In if (is_url(ndjson)) { :
the condition has length > 1 and only the first element will be used
2: In if (file.exists(ndjson)) { :
Error in rcpp_from_ndjson(ndjson, simplify, fill_na) :
Expecting a single string value: [type=character; extent=3].
I've also messed up my branches so I'm now going through and fixing a load of branch conflicts.
However, your paste0
statement is creating a vector of strings
paste0(c('{"abc":123}', '{"def":456}', collapse = "\n"))
[1] "{\"abc\":123}" "{\"def\":456}" "\n"
Is this what you intended?
It looks like you may have the closing bracket in the wrong place?
paste0(c('{"abc":123}', '{"def":456}'), collapse = "\n")
[1] "{\"abc\":123}\n{\"def\":456}"
Yikes, not my finest moment on Github. You're correct, closing paren in the wrong spot, so this statement now works with the fixed argname json to ndjson described above:
jsonify::from_ndjson(paste0(c('{"abc":123}', '{"def":456}'), collapse = "\n"))
On larger test files that I've written with to_ndjson in the past few days, I'm still seeing errors in from_ndjson, but I can't reproduce these easily on testcases at the moment. The crux of the issue is that this works fine:
lapply(readLines("somefile.ndjson"),
function(x) from_ndjson(ndjson = x))
but this fails:
from_ndjson("somefile.ndjson")
with Error in rcpp_read_ndjson_file(normalizePath(ndjson), get_download_mode(), : json parse error
I can't see why each individual line succeeds but parsing the entire file fails. When I try to reproduce the error using a simple case,
test_list <- from_ndjson(paste0(c('{"abc":123}', '{"def":456}'), collapse = "\n"))
writeLines(to_ndjson(test_list), "testlist.ndjson")
from_ndjson("testlist.ndjson")
this also works fine.
So I need to figure out the structural difference between these files to reproduce. Will write back when I do. (This time without dumb errors!)
Can you check how the ndjson is separated, is it by \n
, or perhaps \r\n
?
edit: clarified this and not a problem with 58, working as intended
Ha, I can finally reproduce. This was nasty. Still don't have a good idea why it's happening.
When I wrote the files throwing json parse error
calling from_ndjson
, I was running SymbolixAU/jsonify@issue29
. For that branch, writeLines(to_ndjson(df), "file.ndjson")
with the default sep="\n"
causes a trailing empty line in "file.ndjson" that causes the parse error, but every individual line was valid ndjson. I re-wrote the exact same data today after installing SymbolixAU/jsonify@issue58_fix
-- still using writeLines(..., sep="\n")
and the same code -- and found that the second presumed-identical file didn't cause a parse error.
The two files -- 1 written with to_ndjson
yesterday on issue29
, and 2 today on issue58_fix
-- pass an all.equal() test once they're in R, but they have different MD5 hashes and readLines() on the two files comes up with different vector lengths (off by 1). The only difference is the empty trailing line for the file written from jsonify@issue29
.
I discovered this on a production file, which is a deeply nested dataframe with list-cols. I don't see why the deeply-nested part would cause this, but until I can get a more minimal reprex together I just included an anonymized variant of the data.
(Github won't let me upload .rds directly, so I guess unzip this first.) testcase.rds.zip
remotes::install_github("SymbolixAU/jsonify", "issue29", force = TRUE)
testcase <- readRDS("testcase.rds")
writeLines(jsonify::to_ndjson(testcase), "issue29.ndjson")
remotes::install_github("SymbolixAU/jsonify", "issue58_fix", force = TRUE)
testcase <- readRDS("testcase.rds")
writeLines(jsonify::to_ndjson(testcase), "issue58.ndjson")
openssl::md5(file("issue29.ndjson"))
openssl::md5(file("issue58.ndjson"))
length(readLines("issue29.ndjson"))
length(readLines("issue58.ndjson"))
library(jsonify)
foo <- from_ndjson("issue58.ndjson")
bar <- from_ndjson("issue29.ndjson")
bar <- from_ndjson(paste0(readLines("issue58.ndjson", nrow(foo)), collapse = "\n"))
all.equal(foo, bar)
Seriously thought I was hallucinating halfway through this.
from_ndjson
on this list objet returns the elements one-level deeper. This probably isn't right.
lst <- list(
x = 1:5
, y = list(
a = letters[1:5]
, b = data.frame(i = 10:15, j = 20:25)
)
)
from_ndjson( "{\"x\":[1,2,3,4,5]}\n{\"y\":{\"a\":[\"a\",\"b\",\"c\",\"d\",\"e\"],\"b\":[{\"i\":10,\"j\":20},{\"i\":11,\"j\":21},{\"i\":12,\"j\":22},{\"i\":13,\"j\":23},{\"i\":14,\"j\":24},{\"i\":15,\"j\":25}]}}" )
^^ Actually, this is probably correct, because we are splitting the list up during the to_ndjson()
phase, and each 'row' of JSON has to become a valid object. Therefore, to re-combine it we have to turn the whole lot into an array.
In the future, if anyone specifically asks or this is an issue, we may have to come up with rules for handling
{"x":[1,2,3,4,5]}
{"y":{"a":["a","b","c","d","e"],"b":[{"i":10,"j":20},{"i":12,"j":22}]}}
where we would need to remove the trailing }\n
from the previous row, and the leading "{"
from the next row
Going to CRAN because I need to update the package to set stringsAsFactors = FALSE
by default ( in my tests ) to comply with R 4.0.0, but keeping open to test the one remaining check box.
&Document
rapidjson::Dom
rather than streaming the string with<< '[' << ndjson << ']'
, thenstd::replace( json.begin(), json.end(), '\n', ',')
update reverse imports (mapdeck, geojsonsf, spatialwidget) to useR_xlen_t
values before release to CRAN.