eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library
115 stars 13 forks source link

Update to simdjson 0.6.0 #52

Closed eddelbuettel closed 3 years ago

eddelbuettel commented 3 years ago

This updates to the just released 0.6.0 of simdjson. I only needed to flip four test predicates to now expect error.

@knapply Give a quick lookover; we can probably release as is to get this to our users and can then ponder in parallel if there is a need (and time !!) for additional work from our end. I have been musing if we should maybe think about adding simple accessors for utf-validation.

/cc @lemire @jkeiser with big thanks as always

jkeiser commented 3 years ago

Not needed for this obviously but I'd love see a sample using the new (preview) on demand api. It seems like the r binding is the only binding that could let users use it., because of the wondrous c++Ness.

eddelbuettel commented 3 years ago

Good plan. We can hopefully add something. Is there an example in simdjson I should be aware of? Anything else that would help bootstrap our efforts a little?

lemire commented 3 years ago

@eddelbuettel

One caveat: we do not have runtime dispatching working yet for On Demand, so the code must be compiled to the machine (e.g., -march=native).

    auto json = "{\"coordinates\":[{\"x\":1.1,\"y\":2.2,\"z\":3.3}, {\"x\":3.1,\"y\":2.2,\"z\":4.3}, {\"x\":14.1,\"y\":2.1,\"z\":3.3}]}"_padded;
    ondemand::parser parser{};
    auto doc = parser.iterate(json);
    double x{0};
    double y{0};
    double z{0};
    for (ondemand::object point_object : doc["coordinates"]) {
      x += double(point_object["x"]);
      y += double(point_object["y"]);
      z += double(point_object["z"]);
    }

We'll be happy to provide more examples, but I think you can get the idea from this above.

Note that there are still rough edges to the On Demand API. We expect that it is fully correct (save maybe for corner cases), but there are things to watch for, like you cannot iterate over two objects at the same time... https://github.com/simdjson/simdjson/blob/master/doc/ondemand.md

Despite all my warnings, we really want people to try it out. We really think it is stuff from the future.

jkeiser commented 3 years ago

Yep! https://github.com/simdjson/simdjson/blob/master/doc/ondemand.md

It's essentially the DOM api, but it parses the json progressively while you iterate through it (forward only).

jkeiser commented 3 years ago

To be clear, you can iterate two nested objects at the same time as long as you finish with inner one before continuing the outer (like nested for loops). What @lemire means is you can't keep references to two sibling objects at the same time. In fact, right now you need to ensure that any object or array variable is destroyed / goes out of scope before you continue iterating a parent array or look up another field in a parent object.

Basically make sure you scope object and array variables the same way they appear in the json file.

jkeiser commented 3 years ago

Note that you should be able to use "auto" Instead of ondemand::object, in that example.

jkeiser commented 3 years ago

Oh. Wait. No, you can't.

lemire commented 3 years ago

@jkeiser My example is copied-pasted from our tests. It should work.

(The rough edges become smooth after one reads the short documentation, but I have been told that programmers in 2020 don't read documentation or books.)

eddelbuettel commented 3 years ago

Good idea regarding coverage -- I just added a simple commit doing just that via run.sh as I do in a number of other repos too.

knapply commented 3 years ago

Good idea regarding coverage -- I just added a simple commit doing just that via run.sh as I do in a number of other repos too.

Welp, here I am digging through Rcpp to match it. Perfect!

eddelbuettel commented 3 years ago

Yes, sorry, Rcpp uses Docker so different setup. I at first grep'ed wrong too, then looked into run.sh and indeed:

edd@rob:~/git/rcppsimdjson(feature/simdjson_0.6.0)$ grep coverage ~/git/*/.travis.yml
/home/edd/git/anytime/.travis.yml:  - ./run.sh coverage
/home/edd/git/asioheaders/.travis.yml:#  - ./run.sh coverage
/home/edd/git/bh/.travis.yml:#  - ./run.sh coverage
/home/edd/git/dang-azure/.travis.yml:#  - ./run.sh coverage
/home/edd/git/dang.orig/.travis.yml:#  - ./run.sh coverage
/home/edd/git/dang/.travis.yml:#  - ./run.sh coverage
/home/edd/git/digest/.travis.yml:  - ./run.sh coverage
/home/edd/git/n2r/.travis.yml:## no tests, coverage
/home/edd/git/nanotime-lsilvest/.travis.yml:  - ./run.sh coverage
/home/edd/git/nanotime/.travis.yml:  - ./run.sh coverage
/home/edd/git/random/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rapidatetime/.travis.yml:  - ./run.sh coverage
/home/edd/git/rapiserialize/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rblpapi-alfred/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rblpapi/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rcppapt/.travis.yml:  - ./run.sh coverage
/home/edd/git/rcppcctz/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rcppclassic/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rcppeigen/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rcppnloptexample/.travis.yml:  - ./run.sh coverage
/home/edd/git/rcppquantuccia/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rcppsimdjson/.travis.yml:  - ./run.sh coverage
/home/edd/git/rcppspdlog/.travis.yml:#  - ./run.sh coverage
/home/edd/git/rcpptoml/.travis.yml:  - ./run.sh coverage
/home/edd/git/rcpp/.travis.yml:    - name: coverage
/home/edd/git/rfoaas/.travis.yml:  - ./run.sh coverage
/home/edd/git/rpushbullet/.travis.yml:  - if [[ "${TRAVIS_PULL_REQUEST}" == "false" ]]; then ./run.sh coverage; fi
/home/edd/git/tldbclr/.travis.yml:#  - if [[ "${TRAVIS_PULL_REQUEST}" == "false" ]]; then ./run.sh coverage; fi
/home/edd/git/ttdo/.travis.yml:  - ./run.sh coverage

Always good to have a local corpus of work to copy from ;-)

codecov[bot] commented 3 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@74ffa56). Click here to learn what that means. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #52   +/-   ##
=========================================
  Coverage          ?   84.15%           
=========================================
  Files             ?       18           
  Lines             ?     1218           
  Branches          ?        0           
=========================================
  Hits              ?     1025           
  Misses            ?      193           
  Partials          ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 74ffa56...31061e0. Read the comment docs.