Closed: JosiahParry closed this issue 1 year ago
Update: I realized that JSON pointers have their own syntax, which seems exceptionally limited. So I don't think it's possible.
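To make the limitation concrete, here is a minimal sketch of how a JSON Pointer (RFC 6901) is resolved. It is written in Python only to stay self-contained, and it is not how RcppSimdJson or simdjson implement the query parameter. The helper name `resolve_pointer` and the sample GeoJSON are made up for illustration.

```python
import json

def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against an already-parsed document."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" -> "/", then "~0" -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # array elements are addressed by index
        else:
            node = node[token]       # object members are addressed by key
    return node

# Hypothetical two-feature GeoJSON document, used only for this example
geojson = json.loads("""
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"name": "a"},
    "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}},
   {"type": "Feature",
    "properties": {"name": "b"},
    "geometry": {"type": "Point", "coordinates": [3.0, 4.0]}}
 ]}
""")

first_props = resolve_pointer(geojson, "/features/0/properties")
second_name = resolve_pointer(geojson, "/features/1/properties/name")
```

Each pointer names exactly one location. The syntax has no wildcard for "every feature" and no way to say "everything except geometry", which is why a single pointer cannot express the request in this issue.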
Yes, we are not doing any magic -- we are simply asking the rather magical simdjson library to parse for us, and it does its thing relative to the JSON spec. So ... ok to close?
@JosiahParry wrote: "I can improve the performance by not parsing the geometry field in each feature. Is there documentation on the type of syntax that should be used?"
You definitely can do this in C++ using our main API.
...
More generally, if there were some kind of syntax/query language... where you can load a JSON document selectively, that would be great... but it may be harder to design than it sounds.
Right, but we don't currently expose that. So, as the saying goes, "patches welcome".
@lemire if I could even fake my way around C++ I would try, but I can't :) Regardless, this is blazingly fast and memory efficient, so I don't feel too bad just throwing out the geometry after parsing it.
# A tibble: 4 × 13
  expression                      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory
  <bch:expr>                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>
1 jsonlite::fromJSON(res)       2.15s    2.15s     0.464    48.8MB    1.39      1     3      2.15s <NULL> <Rprofmem>
2 rjson::fromJSON(res)          1.17s    1.17s     0.852    54.9MB    0         1     0      1.17s <NULL> <Rprofmem>
3 jsonify::from_json(res)       8.23s    8.23s     0.122    25.8MB    0.243     1     2      8.23s <NULL> <Rprofmem>
4 RcppSimdJson::fparse(res)   73.34ms  76.71ms    11.7        14MB    1.95      6     1   511.56ms <NULL> <Rprofmem>
(PSA for @lemire: That is output from a somewhat "special" benchmarking package which opines that mixing different units in the same column is a good idea.)
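The parse-then-discard workaround mentioned above can be sketched as follows. This is a language-neutral illustration in Python using only the standard library. In the thread the parsing is done in R with RcppSimdJson::fparse, and the function name `features_without_geometry` and the sample input are hypothetical.

```python
import json

def features_without_geometry(geojson_text):
    """Parse the whole document, then return each feature minus its geometry."""
    collection = json.loads(geojson_text)  # geometry is still parsed here
    return [
        # Keep every member of the feature except the "geometry" key
        {key: value for key, value in feature.items() if key != "geometry"}
        for feature in collection.get("features", [])
    ]

# Hypothetical one-feature input, used only for this example
sample = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"name": "a"},
    "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}}
 ]}
"""

stripped = features_without_geometry(sample)
```

The cost of parsing the geometry is still paid. The sketch only shows that dropping it afterwards is a simple filter, which is acceptable when the parser is fast enough that the extra work does not matter.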
First off, I want to say: how come I haven't heard of this package earlier? Insane speed improvements over any other JSON parsing library I've encountered. One thing I am particularly interested in is the query parameter. The use case is that I have a GeoJSON file from which I want to extract everything but the geometry. I'm not able to understand how I can use the query parameter so that I can improve the performance by not parsing the geometry field in each feature. Is there documentation on the type of syntax that should be used?