eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library
116 stars 13 forks source link

Integer Support #15

Closed knapply closed 4 years ago

knapply commented 4 years ago

Addressing #13

Handling scalars was relatively simple, but I'm sure the approach I took to handle vectors (in anticipation of from_json() w/ simplify) needs some help.

eddelbuettel commented 4 years ago

Interesting approach. Would have to think about it. It's a thorny problem, and I had not see this triple pronged (normal int, big int, string) approach. Given that we do not what comes in this may be the correct approach. Not knowing what type comes out is weird though.

I did run a git rebase origin/master in a branch checked out for the PR and could push that back right into your branch (as PRs now allow this). Shall I do this? You are otherwise behind.

knapply commented 4 years ago

Interesting approach.

That's a nice way to put it. 🤣

Would have to think about it. It's a thorny problem, and I had not see this triple pronged (normal int, big int, string) approach. Given that we do not what comes in this may be the correct approach. Not knowing what type comes out is weird though.

What became Int64_R_Type::String began as a lossless fallback for users unable to leverage bit64::integer64.

AFAICT, using something like Int64_R_Type::String would be the only lossless solution for uint64_ts. That said, these seem to be by far the least likely type we'd have to address (I'm struggling to think of more than once instance where I've encountered them in the wild).

I may be missing a more creative path here. In my experience, these JSON big ints are (mostly?) for IDs/hashes. If there's a simpler way to enable reliable equality comparison for users, that may be "good enough".

I did run a git rebase origin/master in a branch checked out for the PR and could push that back right into your branch (as PRs now allow this). Shall I do this? You are otherwise behind.

Yes, please.

eddelbuettel commented 4 years ago

Ok, pushed. You may need to git reset --hard sha1OfACommitJustBefore so that you pull cleanly over it. (And as always whatever you do make sure you do not loose local work. No shame in keeping a local copy of the repo as rcppsimdjson-prev etc.)

It's good to come at this problem with fresh eyes. I also do not have a magic bullet for the uint64 problem. It's tricky.

knapply commented 4 years ago

@eddelbuettel I think it all worked. Thank you!

dcooley commented 4 years ago

I'm here

git checkout -b knapply-cran-dev integer-support
git pull https://github.com/knapply/rcppsimdjson.git cran-dev

But I'm getting a compile error no matching function for call to 'resolve_int64_vec' for these specific lines (50 - 64)

_["castable_vec"] = Rcpp::List::create(
          _["Double"] = resolve_int64_vec<Int64_R_Type::Double>(castable_vec),
          _["String"] = resolve_int64_vec<Int64_R_Type::String>(castable_vec),
          _["Integer64"] = resolve_int64_vec<Int64_R_Type::Integer64>(castable_vec)  //
          ),
      // `uncastable_vec` should NEVER be integers
      _["uncastable_vec"] = Rcpp::List::create(
          _["Double"] = resolve_int64_vec<Int64_R_Type::Double>(uncastable_vec),
          _["String"] = resolve_int64_vec<Int64_R_Type::String>(uncastable_vec),
          _["Integer64"] = resolve_int64_vec<Int64_R_Type::Integer64>(uncastable_vec)  //
          ),

      _["vec_int64_t"] = Rcpp::List::create(
          _["Double"] = resolve_int64_vec<Int64_R_Type::Double>(big_int_vec),
          _["String"] = resolve_int64_vec<Int64_R_Type::String>(big_int_vec),
          _["Integer64"] = resolve_int64_vec<Int64_R_Type::Integer64>(big_int_vec)  //
          ),
knapply commented 4 years ago

For reference, here's an example that uses it from the actually json stuff:

.query_json() ```cpp // [[Rcpp::export(.query_json)]] SEXP query_json(const Rcpp::String& x, const std::string& json_pointer, const int int64_T = 0) { using rcppsimdjson::parse_json::parse_element; using rcppsimdjson::utils::Int64_R_Type; const auto int64_type = static_cast(int64_T); // ^^^^^^^^^^^^^^^^^^^^^^ // pass down integer from R level and cast simdjson::dom::parser parser; #if __cplusplus >= 201703L auto [value, error] = parser.parse(x).at(json_pointer); #else auto init = parser.parse(simdjson::padded_string(x)).at(json_pointer); auto value = init.value(); auto error = init.error(); #endif if (error != simdjson::error_code::SUCCESS) { Rcpp::stop(simdjson::error_message(error)); } switch (int64_type) { // >>>>>>>>>>>>>>>>>>>> dispatch case Int64_R_Type::Double: return parse_element(value); case Int64_R_Type::String: return parse_element(value); case Int64_R_Type::Integer64: return parse_element(value); } return R_NilValue; } ```
knapply commented 4 years ago

@dcooley can you call sessionInfo() and send your compiler info? I may have touched something too recent on GCC (9.2.1 here).... (or the code is just not good).

dcooley commented 4 years ago
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.0 tools_4.0.0  
gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.3 (clang-1103.0.32.29)
Target: x86_64-apple-darwin19.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

And I currently don't have anything set in ~/.R/Makevars

eddelbuettel commented 4 years ago

It checks, and installs, fine for me but we made a castrosphic of the git timeline. I'll post it as an image which easier:

image

The commits labeled '7 weeks ago' are fine, the from earlier in the week and foobared. This is what master looks like with 'normal' branches from clean PRs:

image

Might be simplest to forget this branch, create a new one off master and cherry-pick the commits we want to it. I can do that.

But first we need to find my Dave can't build.

eddelbuettel commented 4 years ago

Oh, and I had not noticed that Travis CI is barfing too. That obviously needs attention too.

knapply commented 4 years ago

I unintentionally merged with another branch earlier (I still don't know how), but was able to revert here.

I don't think the "Current" Travis ever kicked over again after that, so I just hit restart and it seems fine.

I'm have trouble mapping the Apple Clang version to something on Linux to find what Dave's encountering. I am able to compile with Clang 6, which is the minimum simdjson supports.

* installing to library ‘/home/brendan/R/x86_64-pc-linux-gnu-library/4.0’
* installing *source* package ‘RcppSimdJson’ ...
** using staged installation
** setting up C++17
clang-6.0 -std=gnu++17 -I"/usr/share/R/include" -DNDEBUG  -I'/home/brendan/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include'   -DSIMDJSON_NO_COMPUTED_GOTO -I../inst/include -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-uZCG9P/r-base-4.0.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c RcppExports.cpp -o RcppExports.o
** libs
clang-6.0 -std=gnu++17 -I"/usr/share/R/include" -DNDEBUG  -I'/home/brendan/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include'   -DSIMDJSON_NO_COMPUTED_GOTO -I../inst/include -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-uZCG9P/r-base-4.0.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c rcppsimdjson_utils_check.cpp -o rcppsimdjson_utils_check.o
clang-6.0 -std=gnu++17 -I"/usr/share/R/include" -DNDEBUG  -I'/home/brendan/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include'   -DSIMDJSON_NO_COMPUTED_GOTO -I../inst/include -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-uZCG9P/r-base-4.0.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c simdjson_example.cpp -o simdjson_example.o
clang-6.0 -std=gnu++17 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o RcppSimdJson.so RcppExports.o rcppsimdjson_utils_check.o simdjson_example.o -fopenmp -L/usr/lib/R/lib -lR
installing to /home/brendan/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-rcppsimdjson/00new/RcppSimdJson/libs
** R
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (RcppSimdJson)

I'll keep poking around, but am I missing something here?

@dcooley Can you share the whole error log?

eddelbuettel commented 4 years ago

Maybe we need different issue tickets for different issues:

knapply commented 4 years ago

I'm not proposing anything. I'm just trying to diagnose.

I screwed things up by unintentionally merging a branch into this PR. I was able to revert it to the correct commit, or so I thought. I was trying to point you to this

image

I appear to be current with knapply/cran-dev, it is still bad.

Can you point me to where you're seeing this? I am seeing green across the board for the PR to your repo and for knapply/cran-dev besides the bad merge

Unless you think this is salvageable, maybe it's simpler to throw it out and start afresh? I really do appreciate your patience.

eddelbuettel commented 4 years ago

We are really conversation lines. Travis is now green, that is no longer an issue.

Might be worthwhile to clean-up re git.

dcooley commented 4 years ago

I think I've narrowed it down to

variable of non-literal type 'Rcpp::NumericVector' cannot be defined in a constexpr function

here

    case Int64_R_Type::Integer64: {
      const auto n = std::size(x);
      Rcpp::NumericVector out(n);
      std::memcpy(&(out[0]), &(x[0]), n * sizeof(double));
      out.attr("class") = "integer64";
      return out;
    }
eddelbuettel commented 4 years ago

Sorry what was the question for which that error message was the answer? ;-)

I do that memcpy of std::vector<int64_t> payload into a NumericVector quite regularly, and classing it as integer64 works fine (if you have bit64 loaded etc).

knapply commented 4 years ago

It's upset that I did it directly inside of a constexpr function (instead of just calling a separate function). Hey, "works on my machine"!!

I'm working a separate (clean) PR that should resolve it (I checked C++11-17 on a handful of GCC and Clang versions), and replace the git mess I made here.

I'll point you both to it ASAP so we can make sure all is well before doing anything with it.

dcooley commented 4 years ago

(sorry, deviated away from the git issues) I'm trying to trace my compile errors.


yeah, what he said

eddelbuettel commented 4 years ago

Here is a peace proposal.

Let's close this ~issue~ PR. We appear to discuss three different topics in here. That may work for you smart kids, but it leaves me confused. While we're at this, I suggest we make local copies of what we feel important but otherwise nix these PRs. I removed the old (long merged) branch, and I also removed the identical-to-master other branch.

We learned a lession, and it is cleaner to start from a clean slate. We sync with master, create a new branch and apply commits one by one. Let's pay attention to Travis builds too. We may even end up with a merged PR then that doesn't shred all the old commits.

(And yes, constexpr and run-time-driven SEXP expressions probably don't get along well. NumericVector may have had a chance but maybe something else happeened there...)

eddelbuettel commented 4 years ago

Oh, and we all have write credentials here. So a new branch can be here as well. Maybe be simpler than to merge from a branch off a fork.