eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library

Updating to simdjson 1.0 #70

Closed lemire closed 3 years ago

lemire commented 3 years ago

This PR should not be merged and released as is. But I wrote it to help fix the issue.

Ok. So what is going on in the transition to simdjson 0.9 that requires so many changes ?

Well. We were returning results in the form of an std::pair but we were not expecting people to use it as an std::pair... (although, to be fair, we sometimes did it ourselves) yet people did. And what happened was that people would just ignore the error field. They would then consume garbage, and complain that simdjson was producing garbage.

So we closed off the std::pair by switching to protected inheritance. Sadly, that breaks structured bindings, which were nice but never worked right with all C++ runtimes (libc++ did not like them).

This does not impact at all people who used our normal API. Want a double? Do double(element). But I understand that R does not want exceptions thrown? Well. Thankfully, we support both exception-based and no-exception usage.

What about rcppsimdjson? Well rcppsimdjson would do result.first to get the result and ignore the error code. This was, presumably, always safe because error conditions were otherwise checked.

We don't want you to continue doing that. Instead, you now must do result.value_unsafe().

Yes. It is ugly. But the use of value_unsafe() is meant to indicate that you are very much doing something unsafe. And that's what we want the code to show so that people are careful. The .first syntax does not have the same air of danger around it.
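To make the change concrete, here is a minimal sketch of the idea, not simdjson's actual implementation. A hypothetical `demo_result<T>` inherits from `std::pair` with protected access, so callers can no longer write `.first` or use structured bindings, while `get()`, `error()` and `value_unsafe()` stay available. The names `demo_result`, `demo_error`, `parse_positive` and `example_result_access` are invented for illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical error codes; the success code is zero.
enum demo_error { SUCCESS = 0, INCORRECT_TYPE = 1 };

// Protected inheritance hides .first/.second from callers.
template <typename T>
struct demo_result : protected std::pair<T, demo_error> {
    demo_result(T value, demo_error err) : std::pair<T, demo_error>(value, err) {}

    demo_error error() const noexcept { return this->second; }

    // Checked access: `out` is written only when there is no error.
    demo_error get(T& out) const noexcept {
        if (this->second == SUCCESS) { out = this->first; }
        return this->second;
    }

    // Unchecked access: the caller promises the error was checked elsewhere.
    const T& value_unsafe() const noexcept { return this->first; }
};

// Toy producer: succeeds for positive input, fails otherwise.
inline demo_result<double> parse_positive(double x) {
    return x > 0 ? demo_result<double>(x, SUCCESS)
                 : demo_result<double>(0.0, INCORRECT_TYPE);
}

// Usage: the checked path reports the error, the unchecked path does not.
inline void example_result_access() {
    double v = 0.0;
    assert(parse_positive(2.5).get(v) == SUCCESS && v == 2.5);
    assert(parse_positive(-1.0).error() == INCORRECT_TYPE);
    assert(parse_positive(2.5).value_unsafe() == 2.5);
}
```

With this shape, `parse_positive(2.5).first` is rejected by the compiler, which is the "closing off" described above; the explicit `value_unsafe()` call is the only way to skip the check.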

eddelbuettel commented 3 years ago

Thanks so much for working on this "for us". I am not sure if anyone hands out prizes for most amazing upstream, but I would be compelled to nominate you ....

Now, while I am currently fighting other fires elsewhere, one brief comment:

But I understand that R does not want exceptions thrown?

No, in fact, every call generated by Rcpp contains wrapping glue code with a try/catch. We fetch exceptions, and turn them into R errors. So that is not the issue.
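The glue idea can be sketched in plain C++ without Rcpp. This only illustrates the pattern described above and is not the code Rcpp actually generates. `call_result`, `call_with_glue`, `require_non_negative` and `example_glue` are hypothetical names, and a real wrapper would raise an R error rather than fill in a struct.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for "either a value or an R error message".
struct call_result {
    bool ok;
    double value;
    std::string message;
};

// Wraps any callable returning double; exceptions do not escape to the caller.
template <typename F>
call_result call_with_glue(F&& fn) {
    try {
        return {true, fn(), ""};
    } catch (const std::exception& e) {
        return {false, 0.0, e.what()};            // would become an R error
    } catch (...) {
        return {false, 0.0, "unknown C++ exception"};
    }
}

// Toy function that throws on bad input.
inline double require_non_negative(double x) {
    if (x < 0) { throw std::invalid_argument("negative input"); }
    return x;
}

inline void example_glue() {
    const auto good = call_with_glue([] { return require_non_negative(4.0); });
    assert(good.ok && good.value == 4.0);
    const auto bad = call_with_glue([] { return require_non_negative(-1.0); });
    assert(!bad.ok && bad.message == "negative input");
}
```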

lemire commented 3 years ago

No, in fact, every call generated by Rcpp contains wrapping glue code with a try/catch. We fetch exceptions, and turn them into R errors. So that is not the issue.

Damn. It could allow for much code simplification... but it is too late for this PR as I started going all exceptionless.

Working to find my final bugs.

codecov[bot] commented 3 years ago

Codecov Report

Merging #70 (0870053) into master (de81b7c) will increase coverage by 0.00%. The diff coverage is 100.00%.

:exclamation: Current head 0870053 differs from pull request most recent head ef837d0. Consider uploading reports for the commit ef837d0 to get more accurate results.

@@           Coverage Diff           @@
##           master      #70   +/-   ##
=======================================
  Coverage   99.55%   99.55%           
=======================================
  Files          18       18           
  Lines        1336     1337    +1     
=======================================
+ Hits         1330     1331    +1     
  Misses          6        6           
Impacted Files Coverage Δ
inst/include/RcppSimdJson/deserialize.hpp 100.00% <100.00%> (ø)
...t/include/RcppSimdJson/deserialize/Type_Doctor.hpp 100.00% <100.00%> (ø)
...nst/include/RcppSimdJson/deserialize/dataframe.hpp 100.00% <100.00%> (ø)
inst/include/RcppSimdJson/deserialize/matrix.hpp 100.00% <100.00%> (ø)
inst/include/RcppSimdJson/deserialize/scalar.hpp 100.00% <100.00%> (ø)
inst/include/RcppSimdJson/deserialize/simplify.hpp 100.00% <100.00%> (ø)
inst/include/RcppSimdJson/deserialize/vector.hpp 100.00% <100.00%> (ø)
src/exported-utils.cpp 100.00% <100.00%> (ø)
src/simdjson_example.cpp 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update de81b7c...ef837d0.

lemire commented 3 years ago

Locally, this passes all my tests, but it seems to fail in CI.

I think that in many cases, the PR improves upon the existing code. Either it becomes clearer or the dangers are more evident.

Importantly, it follows more closely our recommended approach.

Feel free to drop this and rewrite it, I have no ego invested in it.

cc @jkeiser

eddelbuettel commented 3 years ago

I am not too worried about random segfaults in the CI ... because we are lazy and installing a few things (R packages we need) as binaries. Sometimes things just need a rebuild. I'll take a look later.

eddelbuettel commented 3 years ago

And the switch to the error status variable is not all that mortal -- @knapply and I will chat and see which style we like better and find more idiomatic.

Again, big big thank you for making time and working through this. It should not be far from here to the goal line....

lemire commented 3 years ago

You used to do this construction...

if (auto [result, parse_error] = something; !parse_error) {
}

It did not work under libc++ though @ldionne just fixed that.

We no longer allow it because we don't want to give people direct access to the value without enticing them to check the error... but you can do it like so...

if (simdjson::dom::element element; something.get(element) == simdjson::SUCCESS) {
}

I would argue that the latter is just as clean. (I did not use this construction because I am an old-style guy and I don't declare variables in if branches, but you could.)

Importantly, if you are lazy and do...

simdjson::dom::element element;
something.get(element);

It should complain that you are not checking the error condition. That's what we want. We want to bug people who fail to check their errors.
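A self-contained sketch of both points follows. It uses a hypothetical `toy_element` rather than `simdjson::dom::element`, and it relies on the standard C++17 `[[nodiscard]]` attribute to make the compiler complain about an ignored error code. This thread does not say which mechanism simdjson itself uses for that warning, so treat the attribute as an assumption.

```cpp
#include <cassert>
#include <string>
#include <variant>

enum toy_error { SUCCESS = 0, INCORRECT_TYPE = 1 };

// Hypothetical element holding either a number or a string.
struct toy_element {
    std::variant<double, std::string> data;

    // [[nodiscard]] asks the compiler to warn when the error code is ignored.
    [[nodiscard]] toy_error get(double& out) const noexcept {
        if (const double* p = std::get_if<double>(&data)) {
            out = *p;
            return SUCCESS;
        }
        return INCORRECT_TYPE;
    }
};

// C++17 if-with-initializer: declare the target and check the error together.
inline double number_or_zero(const toy_element& e) {
    if (double value = 0.0; e.get(value) == SUCCESS) {
        return value;
    }
    return 0.0;
}

inline void example_checked_get() {
    assert(number_or_zero(toy_element{3.5}) == 3.5);
    assert(number_or_zero(toy_element{std::string("x")}) == 0.0);
}
```

Writing `e.get(value);` on its own line, with the result dropped, is the lazy form that a conforming compiler is encouraged to flag.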

eddelbuettel commented 3 years ago

Well it bombs on my box too (Ubuntu 20.10). With the tinytest framework I can run most of the test files individually (after the updated package is installed, and some of those test files lack a library(RcppSimdJson)), and of those that run, a number die on segfault. As does the initial R CMD check on running the examples. Maybe something got chopped up with the error code and exceptions, maybe it is something else. Will need to dig...

lemire commented 3 years ago

@eddelbuettel My guess is a typo somewhere. I had several. I am probably missing at least one.

My guess is that you guys are probably better equipped to track down the offending code.

Know that I am not at all opposed to the idea of finishing the PR and making it work.

eddelbuettel commented 3 years ago

Still getting segfaults even when it builds fine. If you do R -d gdb then say run<return> and in R library(RcppSimdJson); example(fparse) it blows up in deserialization, as it does in some of the test files.

Call backtrace does not scream anything immediately useful at me:

> library(RcppSimdJson)                                                                                  
> example(fparse)                                                                                        

fparse> # simple parsing ============================================================
fparse> json_string <- '{"a":[[1,null,3.0],["a","b",true],[10000000000,2,3]]}'

fparse> fparse(json_string)                                                                              

Thread 1 "R" received signal SIGSEGV, Segmentation fault.  
0x00007fff94636556 in SEXPREC* rcppsimdjson::deserialize::matrix::build_matrix_mixed<16>(simdjson::dom::array, unsigned long) () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
(gdb) backtrace                  
#0  0x00007fff94636556 in SEXPREC* rcppsimdjson::deserialize::matrix::build_matrix_mixed<16>(simdjson::dom::array, unsigned long) () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#1  0x00007fff946606d2 in SEXPREC* rcppsimdjson::deserialize::simplify_element<(rcppsimdjson::deserialize::Type_Policy)0, (rcppsimdjson::utils::Int64_R_Type)0, (rcppsimdjson::deserialize::Simplify_To)0>(simdjson::dom::element, SEXPREC*, SEXPREC*, SEXPREC*) () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#2  0x00007fff94661ef1 in SEXPREC* rcppsimdjson::deserialize::simplify_object<(rcppsimdjson::deserialize::Type_Policy)0, (rcppsimdjson::utils::Int64_R_Type)0, (rcppsimdjson::deserialize::Simplify_To)0>(simdjson::dom::object, SEXPREC*, SEXPREC*, SEXPREC*) () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#3  0x00007fff94660527 in SEXPREC* rcppsimdjson::deserialize::simplify_element<(rcppsimdjson::deserialize::Type_Policy)0, (rcppsimdjson::utils::Int64_R_Type)0, (rcppsimdjson::deserialize::Simplify_To)0>(simdjson::dom::element, SEXPREC*, SEXPREC*, SEXPREC*) () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#4  0x00007fff9460753c in ?? () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#5  0x00007fff9460e773 in deserialize(SEXPREC*, SEXPREC*, SEXPREC*, SEXPREC*, SEXPREC*, bool, SEXPREC*, bool, SEXPREC*, int, int, int) () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#6  0x00007fff945dd103 in _RcppSimdJson_deserialize () from /usr/local/lib/R/site-library/RcppSimdJson/libs/RcppSimdJson.so
#7  0x00007ffff7c0ef76 in ?? () from /usr/lib/R/lib/libR.so
## many more lines omitted

eddelbuettel commented 3 years ago

So to take the concrete example, example(fparse) blows up here, apparently on the dom::array access:

https://github.com/lemire/rcppsimdjson/blob/a65c00d2e5709dfe754d1871d6d9bd554eb0fa3e/inst/include/RcppSimdJson/deserialize/matrix.hpp#L212

Via the old-fashioned method of a print statement, I make it to a print just above but not below. Dimensions seem fine (3 rows, 3 cols), but for i==0 we bonk. Any hunch, @lemire?

lemire commented 3 years ago

@eddelbuettel I am sure it is something stupid that I did. I will have a look tomorrow morning. It is 10 pm here and Jack Daniels is calling.

lemire commented 3 years ago

@eddelbuettel I need to throw better tooling at this problem. There is a lot of code in there, most of which I am only vaguely familiar with.

Normally, I would just use sanitizers. I read the R documentation, I searched on the Internet... and I can't find a way to do it...

Furthermore, it really looks like a heisenbug. I can influence where things fail by adding printouts. So something gets the wrong address or the wrong cast. Now with sanitizers, I should pick this up right away.

You (@eddelbuettel) state on stackoverflow that one can modify ~/.R/Makevars and that this gets picked up automagically. Someone comments that he was afraid that you would say that. You dismiss him, but I have the same problem.

Here is what I have currently...

$ more ~/.R/Makevars 
CXXFLAGS = -fsanitize=undefined,address -fno-omit-frame-pointer
CFLAGS = -fsanitize=undefined,address -fno-omit-frame-pointer

It is not picked up. In fact, I can do this...

$ more ~/.R/Makevars 
CC = go to hell
CXX = go to hell 2
CXXFLAGS = -unvalidaerewrewflag
CFLAGS = -iahteate

and it is not used at all.

For context, I just call

./run.sh run_tests

It is possible that your script is overriding this default, or else I am not understanding something.

eddelbuettel commented 3 years ago

I know there is a lot of complexity here, but, frankly, you could just ask. We're here, and this medium is quick. Nobody expects you to go off and re-figure all of the things out by yourself.

I created the initial Docker containers for ASAN and UBSAN as CRAN / Brian Ripley have some cryptic instructions (see older posts on my blog). As my builds are not regularly updated, these days the best bet is the 'sumo' style aggregate container by @wch here: https://github.com/wch/r-debug You need those as R itself has to be built with sanitizers.

Alternatively, one can also use the builder service here https://builder.r-hub.io/ which has ASAN and Valgrind options (and the CRAN package rhub lets you upload locally). I can look into any and all of that in a day or two (or ideally, an evening).

lemire commented 3 years ago

I know there is a lot of complexity here, but, frankly, you could just ask.

I thought that's what I did? :-) I am not at all against asking for help... but I want to explain what I tried and what I read before I do.

eddelbuettel commented 3 years ago

Point taken. I am simply still sore from the missed opportunity to show you exception handling before you went off firmly assuming it does not exist. Stuff happens. Use of valgrind and/or asan here is a very good suggestion.

lemire commented 3 years ago

@eddelbuettel I am reengineering with the knowledge that exception handling is supported. This will make the code cleaner.

Next comment will tell you what the bug was. Wait for it.

lemire commented 3 years ago

I won't assume that there are no more bugs, but I ended up fixing one nagging one just by reading my own code more carefully. I had the following...

    for (auto element : array) {
        simdjson::dom::array sub_array;
        if(!element.get(sub_array)) { // <==== The bug is here
            return std::nullopt;
        }
        matrix_doctor.update(
            Type_Doctor<type_policy, int64_opt>(sub_array));
        n_cols.insert(std::size(sub_array));

        if (std::size(n_cols) > 1 || !matrix_doctor.is_vectorizable()) {
            return std::nullopt;
        }
    }

This was saying... if this is not an array, then cast it as an array...

Here is the correct code...

    for (auto element : array) {
        simdjson::dom::array sub_array;
        if(element.get(sub_array) != simdjson::SUCCESS) {
            return std::nullopt;
        }
        matrix_doctor.update(
            Type_Doctor<type_policy, int64_opt>(sub_array));
        n_cols.insert(std::size(sub_array));

        if (std::size(n_cols) > 1 || !matrix_doctor.is_vectorizable()) {
            return std::nullopt;
        }
    }
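Why the first version misbehaves can be shown with a small stand-alone sketch. It assumes, as the explicit comparison against `simdjson::SUCCESS` suggests, that the success code converts to `false`. The `toy_error` enum and the `get_array`, `size_buggy` and `size_fixed` functions are hypothetical stand-ins, not simdjson or RcppSimdJson code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum toy_error { SUCCESS = 0, INCORRECT_TYPE = 1 };

// Hypothetical accessor: fills `out` only when `is_array` is true.
inline toy_error get_array(bool is_array, std::vector<int>& out) {
    if (!is_array) { return INCORRECT_TYPE; }
    out = {1, 2, 3};
    return SUCCESS;
}

// Buggy shape: !SUCCESS is true, so the successful case bails out and the
// failing case falls through with `out` never filled in.
inline std::optional<std::size_t> size_buggy(bool is_array) {
    std::vector<int> out;
    if (!get_array(is_array, out)) { return std::nullopt; }
    return out.size();
}

// Fixed shape: compare explicitly against SUCCESS.
inline std::optional<std::size_t> size_fixed(bool is_array) {
    std::vector<int> out;
    if (get_array(is_array, out) != SUCCESS) { return std::nullopt; }
    return out.size();
}

inline void example_success_is_zero() {
    assert(!size_buggy(true).has_value());   // a real array is rejected
    assert(size_buggy(false) == 0u);         // a non-array is accepted
    assert(size_fixed(true) == 3u);
    assert(!size_fixed(false).has_value());
}
```

In this toy the leftover object is merely an empty vector. In the PR, the object that was never assigned is the `sub_array` that the following lines go on to use, which is plausibly where the crash came from.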

Elsewhere, I am now using in this PR casts and other constructions that may throw. They are embedded in noexcept methods so it is unsafe. I could have removed the noexcept but... This PR is for demonstration/discussion... please do not merge as-is. It is a proof of concept. Review and reengineer to your taste.

In early commits, I described all of the options you have... basically, suppose that you are given an element and you want to cast it to a double... you can do...

double(element);

This might throw but only if you got the type wrong, so if you checked the type first, this should be fine.

If you don't want to throw, then do...

double value; // could put in the next line as part the if clause if you have fancy C++
if(element.get(value) == simdjson::SUCCESS) {
  // "value" contains what you need... 
} else {
 // do something
}

You can instrument this to support default values like so...

double value;
value = (element.get(value) == simdjson::SUCCESS) ? value : default_value;

The == simdjson::SUCCESS is verbose, and there is the usual implicit conversion to bool, but I just messed that up (see the bug above), so the current code always explicitly compares against SUCCESS (my brain likes this approach better).

eddelbuettel commented 3 years ago

I must have been a little asleep at the wheel, or overwhelmed from the earlier flood of posts :wink: , because I seem to have missed that the PR is now good :) As a sign of repentance I did first update my copy of the wch1/r-debug container and ran this with RDsan (which still fails a few tests) but on a normal R(-release) session it passes, as it does with R-devel.

So shall we merge now?

lemire commented 3 years ago

@eddelbuettel To recap, I did not need any tooling and I only had made a logical error (which was entirely my fault) which led to memory corruption. I ended up catching it by reading my code in a rested mood. It was a tiny error.

I have commented all of the parts of the changes you should review. I am mostly concerned about exception handling. To simplify your code, I use casts, and casts throw in simdjson, yet they are sometimes used as part of noexcept methods.

I don't think there is anything wrong with any of this, but you should know.
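The concern about throwing casts inside noexcept methods can be illustrated without simdjson. In standard C++, an exception that escapes a `noexcept` function calls `std::terminate`, so a throwing conversion is only safe there if the type was verified beforehand or the exception is caught locally. The `toy_element` type and the function names below are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical element whose conversion operator throws on a type mismatch.
struct toy_element {
    bool is_number;
    double number;

    explicit operator double() const {
        if (!is_number) { throw std::runtime_error("not a number"); }
        return number;
    }
};

// Safe inside noexcept: the type is checked before the throwing cast is used.
inline double checked_then_cast(const toy_element& e) noexcept {
    return e.is_number ? double(e) : 0.0;
}

// Also safe: the exception is caught before it can leave the noexcept function.
inline std::optional<double> cast_or_nullopt(const toy_element& e) noexcept {
    try {
        return double(e);
    } catch (const std::runtime_error&) {
        return std::nullopt;
    }
}

inline void example_noexcept_casts() {
    assert(checked_then_cast(toy_element{true, 2.0}) == 2.0);
    assert(checked_then_cast(toy_element{false, 0.0}) == 0.0);
    assert(cast_or_nullopt(toy_element{true, 2.0}) == 2.0);
    assert(!cast_or_nullopt(toy_element{false, 0.0}).has_value());
}
```

A third variant, calling `double(e)` in a `noexcept` function with neither the check nor the `try`, would compile but terminate the process on the first mismatch, which is the risk being flagged for review.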

lemire commented 3 years ago

In effect, I am basically urging you to review the code critically. I think it is logically correct, but I might be breaking conventions that I am unaware of.

jkeiser commented 3 years ago

I cannot wait to see what happens when this plugin is updated for ondemand :) It's one of the places it's most likely to shine, since you let people write actual C++ code.

lemire commented 3 years ago

@jkeiser Actually, I can probably do the On Demand conversion in a week-end now that I know more about the code but I think we might face non-trivial (but solvable) obstacles that might cause me to swear a bit.

For example, the code here takes an element and checks that it is "like a matrix", if it is then it rewinds and does the conversion. So you need to be able to rewind. I don't think we have such examples in our documentation. I think we only do "move forward" examples.

I'll post this as an issue upstream (at simdjson): I think we should prototype an On Demand version of rcppsimdjson before we release 1.0. That is, I don't think we should do the full rcppsimdjson work, but we should have a prototype showing that it is possible to do so.

eddelbuettel commented 3 years ago

I cannot wait to see what happens when this plugin is updated for ondemand :) It's one of the places it's most likely to shine, since you let people write actual C++ code.

To be fair I think most of the use is actually 'just' a canned JSON parser that ... happens to be the fastest one around. I am not sure I know of a package using the API we re-export though one could I suppose.

lemire commented 3 years ago

@eddelbuettel Even so, we should be able to speed up rcppsimdjson in some cases (as always, mileage will vary).

knapply commented 3 years ago

I'm extremely late to the party here and have not been able to keep an eye on this.

Unless I'm missing something, the "unsafe" usage (accessing the std::pair::first) is not actually unsafe (in this context) because we already check literally every element before accessing them -- we have to diagnose the structure we're returning to R (size, type, shape, potentially null, etc.) before we access any of the values so that we can allocate the R structures once, then just populate them.

The "safe" API forces a redundant check on everything if simdjson exceptions are enabled (we never actually turned them off).

I'm still going through simdjson's updates, but it looks like we should now be using simdjson_result<T>::value_unsafe() to skip redundant checks without disabling simdjson's exceptions.

To add some more context to jog memories here (and give myself a sanity check)...

Here's the relevant simdjson code that will perform redundant checks: https://github.com/simdjson/simdjson/blob/95b4870e20be5f97d9dcf63b23b1c6f520c366c1/include/simdjson/error-inl.h#L68-L88

#if SIMDJSON_EXCEPTIONS

template<typename T>
simdjson_really_inline T& simdjson_result_base<T>::value() & noexcept(false) {
  if (error()) { throw simdjson_error(error()); }
  return this->first;
}

template<typename T>
simdjson_really_inline T&& simdjson_result_base<T>::value() && noexcept(false) {
  return std::forward<simdjson_result_base<T>>(*this).take_value();
}

template<typename T>
simdjson_really_inline T&& simdjson_result_base<T>::take_value() && noexcept(false) {
  if (error()) { throw simdjson_error(error()); }
  return std::forward<T>(this->first);
}

template<typename T>
simdjson_really_inline simdjson_result_base<T>::operator T&&() && noexcept(false) {
  return std::forward<simdjson_result_base<T>>(*this).take_value();
}

#endif // SIMDJSON_EXCEPTIONS

We have already checked the types to diagnose and build the R structure into which we're inserting the parsed data, so the if (error()) { throw simdjson_error(error()); } is wasted effort.
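The two-pass pattern described here, diagnose everything first and then read without re-checking, can be sketched with the standard library alone. `std::get` plays the role of a checked accessor that throws on a mismatch, and dereferencing the pointer from `std::get_if` plays the role of the read that trusts the earlier diagnosis. This is an analogy under those assumptions, not RcppSimdJson's or simdjson's code, and `toy_value`, `all_doubles` and `to_doubles` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

using toy_value = std::variant<double, std::string>;

// Pass 1: diagnose. Every element is inspected once.
inline bool all_doubles(const std::vector<toy_value>& values) noexcept {
    for (const auto& v : values) {
        if (!std::holds_alternative<double>(v)) { return false; }
    }
    return true;
}

// Pass 2: populate. The types are known from pass 1, so the pointer from
// std::get_if is dereferenced without a null test or an exception path.
inline std::optional<std::vector<double>> to_doubles(const std::vector<toy_value>& values) {
    if (!all_doubles(values)) { return std::nullopt; }
    std::vector<double> out;
    out.reserve(values.size());          // allocate once, then just fill
    for (const auto& v : values) {
        out.push_back(*std::get_if<double>(&v));
    }
    return out;
}

inline void example_two_pass() {
    const auto good = to_doubles({toy_value{1.0}, toy_value{2.5}});
    assert(good.has_value() && good->size() == 2u && (*good)[1] == 2.5);
    assert(!to_doubles({toy_value{1.0}, toy_value{std::string("a")}}).has_value());
}
```

Whether skipping the second check is measurable in practice is exactly what a before/after benchmark would have to show.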

lemire commented 3 years ago

Unless I'm missing something, the "unsafe" usage (accessing the std::pair::first) is not actually unsafe (in this context) because we already check literally every element before accessing them -- we have to diagnose the structure we're returning to R (size, type, shape, potentially null, etc.) before we access any of the values so that we can allocate the R structures once, then just populate them.

You might have been using it in a safe manner, but we found that people in the wild would just skip error checking and just read first. When things failed, they would blame simdjson.

We don't want simdjson user to have access to first so we removed access in the library. You can still do it effectively, but you have to call value_unsafe(). So you could just do a search and replace for first and replace with value_unsafe(), but I tried to play with the code to simplify it a bit.

It is possible that I introduced some overhead while doing so.

It should be possible to check it out by just running benchmarks. I am not the best person to run these benchmarks.

lemire commented 3 years ago

@knapply Without running benchmarks, I would not assume that my changes introduce overhead but if it does, then you are right that value_unsafe() might be the right thing. For reasons that should be obvious, given that I do not know this code super well, I was biased toward more safety rather than less.

If you have benchmarks, then I think we should run them, before and after this PR. There should be, practically, no change. If there are, then I have introduced overhead and it should be fixed.

Note that I did not recommend that this PR be merged right away. This was just to help with the upgrade.

eddelbuettel commented 3 years ago

Maybe we can proceed with the 'recommended' safe accessor and then do subsequent testing and benchmarking ?

lemire commented 3 years ago

@knapply has legitimate concerns about the performance impact of this PR.

Is it difficult to run benchmarks before and after?

I find that hard numbers often help move discussions along.

eddelbuettel commented 3 years ago

Is it difficult to run benchmarks before and after?

It is not, but I prefer to nail as many boards to the floor first.

Right now we have changes from you upstream, us in the package, R by itself (now 4.0.5, "soon" 4.1.0), ...

lemire commented 3 years ago

In any case, @knapply knows this code better than I do, so I would invite him to flip around the changes I made based on his judgment. It is not hard to replace the casts by value_unsafe(). He could probably do so in less than one hour.

knapply commented 3 years ago

@lemire

You might have been using it in a safe manner, but we found that people in the wild would just skip error checking and just read first. When things failed, they would blame simdjson.

That's extremely frustrating that folks were blaming simdjson for this, so I understand the decision here.

would invite him to flip around the changes I made based on his judgment

I'm working on it... I just haven't had a system set up to write any code until this past weekend (and am stumbling around a Mac for the first time, so I'm slow!)

FWIW, the original design here was still when I knew just enough C++ to be dangerous (that probably hasn't changed much) -- and well before I had maintained any C++ code collaboratively. I've been thinking about an eventual overhaul after some lessons learned on another project (for example, there's a wildly more cache-friendly approach for parsing JSON to data frames that should have been obvious), but if there are potential improvements you noticed while digging around in here, please feel free to call them out.

lemire commented 3 years ago

The code was fine. The mistakes I made were all my mistakes (typos, really).

I did not swear at the code. It was relatively easy and had I been more careful, I could have done it very fast. I just ended up wasting time because I was careless once or twice.

knapply commented 3 years ago

Cool.

Before I take a second pass at it, is the use of element.get<bool>() versus element.get_bool() purely stylistic? Or am I missing something -- your comments seem to suggest we should be preferring element.get_bool() and friends instead.

lemire commented 3 years ago

Before I take a second pass at it, is the use of element.get<bool>() versus element.get_bool() purely stylistic? Or am I missing something -- your comments seem to suggest we should be preferring element.get_bool() and friends instead.

So we are discouraging element.get<type>(). Again it is a usability issue. People say "oh! my library has a CString type, so I will do ...element.get<CString>()" and then it fails (with an infamous C++ error). The thing is, element.get<type>() was only meant to work for a select few types (and you needed to read the documentation to find out the allowed types), but people commonly do not read the documentation and they rely on auto-completion instead. So element.get<type>() is terrible from this point of view.

We had a lot of complaints about people finding our API hard to use, and it seems that these sorts of things trip people up. We are trying to make it so that the API is really hard to misuse.

There is another reason why we are switching away. Given the choice between element.get<uint64_t>() and element.get_uint64(), the latter is simpler, shorter, and prettier.

Let us be honest here: templates are not the prettiest thing about C++.
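The usability argument can be illustrated with a toy type. The sketch below contrasts a template accessor that only works for a short list of types with named accessors that show up directly in auto-completion. `toy_element` is hypothetical, and the `static_assert` is just one way to make the unsupported case fail with a readable message. It is not a claim about how simdjson implements `get<type>()`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

struct toy_element {
    bool flag;
    std::uint64_t number;

    // Named accessors: the supported set is visible in the interface itself.
    bool get_bool() const noexcept { return flag; }
    std::uint64_t get_uint64() const noexcept { return number; }

    // Template accessor: compiles only for the few types listed here.
    template <typename T>
    T get() const noexcept {
        static_assert(std::is_same_v<T, bool> || std::is_same_v<T, std::uint64_t>,
                      "toy_element::get<T>() supports only bool and std::uint64_t");
        if constexpr (std::is_same_v<T, bool>) {
            return flag;
        } else {
            return number;
        }
    }
};

inline void example_accessors() {
    const toy_element e{true, 42};
    assert(e.get_bool() == e.get<bool>());
    assert(e.get_uint64() == e.get<std::uint64_t>());
    // e.get<double>() would stop compilation at the static_assert.
}
```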

lemire commented 3 years ago

@knapply Did I answer your question in a satisfactory manner?

eddelbuettel commented 3 years ago

@knapply How are things? Did life stabilize a little after moves and all? Do you think you will be able to take a glance at this?

lemire commented 3 years ago

I am also still eager to answer any question and to make requested changes (if needed). It is probably a good idea to update since our (conventional/DOM) API is now quite stable and will not change.

Note that we will probably, at least as a prototype, build an On-Demand wrapper.

lemire commented 3 years ago

I am doing further work with @NicolasJiaxin on the On Demand front-end.

The debugging process is very slow because my current approach is something like this...

 R CMD build --no-build-vignettes --no-manual . && Rscript tabar.R

where tabar.R builds the library and runs some tests.

That part is fine. Recompiling each time is ok. What I find really annoying is the R CMD build --no-build-vignettes --no-manual . step which takes about 5 minutes.

Obviously, there must be a smarter approach where I skip the R CMD build --no-build-vignettes --no-manual . step or make it massively faster (just rebuilding the tar would be quick)... but I can't find documentation on this issue. And I am not sure what R CMD build does that would take 5 minutes.

Any hint?

eddelbuettel commented 3 years ago

Yes, I hear you on this. The official way is clearly to go via a .tar.gz from R CMD build and then R CMD check. I do this myself (see below), and I am sometimes grumpy at myself for it as C++ has a cost we are all aware of. Now, I have been using ccache as a front-end since "forever" -- and it helps quite a lot. You need a few lines in ~/.ccache/ccache.conf because of R idiosyncrasies:

edd@rob:~$ cat ~/.ccache/ccache.conf
max_size = 10.0G
# important for R CMD INSTALL *.tar.gz as tarballs are expanded freshly -> fresh ctime
sloppiness = include_file_ctime
# also important as the (temp.) directory name will differ
hash_dir = false
edd@rob:~$ 

Besides that, I built little helper scripts via the littler (i.e. /usr/bin/r or /usr/local/bin/r) frontend we built 15 years ago (and which beat Rscript by a few months). [On macOS, with its inexplicable choice of ignoring case, using it as /usr/local/bin/lr is recommended.] So my muscle memory is often build.r or build.r -f ("fast mode") followed by rcc.r to trigger a check.

When I know I am iterating within a package I also often just do install.r (basically refreshing the shared library only) followed by a command-line test or running of one or more unit test files. In other words, my workflow never entails devtools (and I may well be approximately the last person on the planet doing it this way, but I a) did it before they came along and b) remain sceptical of their attempts to load/unload DLLs which they too admit when pressed is imperfect...).

There may be other tricks. I would be all ears for better ones too...

lemire commented 3 years ago

@eddelbuettel That's useful.

eddelbuettel commented 3 years ago

Glad to hear. I would love to hear from fresh eyes what still sucks and which tricks work. Might be worthy of a quick blog post or even a paper-let ("ten tips to avoid losing your hair when developing R packages").

eddelbuettel commented 3 years ago

PS To hook up ccache use ~/.R/Makevars with, say, this or variants thereof

## edd 03 Mar 2009                 emacs please make this a -*- mode: Makefile; -*-

PEDANTIC=-pedantic
#XTRAFLAGS=-Wno-deprecated-declarations -Wno-parentheses -Wno-ignored-attributes -Wno-unused-function

## for C code
CFLAGS=               -g -O3 -Wall -pipe $(PEDANTIC) $(XTRAFLAGS) -std=gnu99

## for C++ and C++11 code
CXXFLAGS=               -g -O3 -Wall -pipe $(PEDANTIC) $(XTRAFLAGS)
CXX1XFLAGS=     -g -O3 -Wall -pipe $(PEDANTIC) $(XTRAFLAGS)
CXX11FLAGS=     -g -O3 -Wall -pipe $(PEDANTIC) $(XTRAFLAGS)
CXX14FLAGS=     -g -O3 -Wall -pipe $(PEDANTIC) $(XTRAFLAGS)
CXX17FLAGS=     -g -O3 -Wall -pipe $(PEDANTIC) $(XTRAFLAGS)

FLAGS=-Wall -O3 -g -pipe $(PEDANTIC) $(XTRAFLAGS)

## for Fortran code
FFLAGS=-O3 -g0 -Wall -pipe
## for Fortran 95 code
FCFLAGS=-O3 -g0 -Wall -pipe

#VER=-4.9
VER=
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
CXX11=$(CCACHE) g++$(VER) #-std=c++11
CXX14=$(CCACHE) g++$(VER) #-std=c++14
CXX17=$(CCACHE) g++$(VER) #-std=c++17

SHLIB_CXXLD=$(CCACHE) g++$(VER)

STRIP=-Wl,-S
SHLIB_CXXLDFLAGS = $(STRIP) -shared
SHLIB_CXX11LDFLAGS = $(STRIP) -shared
SHLIB_CXX14LDFLAGS = $(STRIP) -shared
SHLIB_CXX17LDFLAGS = $(STRIP) -shared
SHLIB_FCLDFLAGS = $(STRIP) -shared
SHLIB_LDFLAGS = $(STRIP) -shared

FC=$(CCACHE) gfortran
F77=$(CCACHE) gfortran
F95=$(CCACHE) gfortran

lemire commented 3 years ago

@eddelbuettel

"ten tips to avoid losing your hair when developing R packages"

I would not write such a thing alone, but if you are ever interested in teaming up for a paper-let, I am game.

(I am aware you have written extensively on the topic.)

eddelbuettel commented 3 years ago

Yes, it was meant as an invitation / proposal for such a PLOS One (or wherever people drop such '10 Things I hate about things' papers). I have not actually written any published ones; I dropped one on arXiv and left it there for now. Life is short, and maybe we won't get beyond a blog post but this would have merit (if we ever find the switch to enable 36-hour days....)

lemire commented 3 years ago

Meanwhile, we are making progress on the On Demand wrapper. :-)

eddelbuettel commented 3 years ago

I just rebased this to master given how we were discussing in #75 that this may be a merge candidate too.

lemire commented 3 years ago

@eddelbuettel Yes. It might be a safer choice in the sense that it does not lead to a performance regression. It might be viable to see the On Demand wrapper as "future work" in the sense that it should be merged after the performance regression has been addressed.

Nicolas cannot be expected to do this right now. But it is good future work that we could address next summer? :-)

eddelbuettel commented 3 years ago

Ok will likely be merging, and I will then wear a fool's cap for a day or longer as I had completely missed that this was sitting here, ready. I was still hung up under the earlier error.

We can always support on-demand as an option, or in a branch.

@knapply Can you think of a reason not to merge?