Widen int to 64-bit signed integer in generated code

alashworth commented 5 years ago

Issue by bob-carpenter Monday Aug 14, 2017 at 12:34 GMT Originally opened as https://github.com/stan-dev/stan/issues/2381

Summary:

Replace all the uses of int in the generated code with long (see below).

Description:

The current restriction to integers is hobbling our ability to do things like negative binomial or even Poisson random variates and to represent some inputs.

Care will have to be taken to make sure 32-bit architectures use 64-bit integers here. I'm not sure where we're at with Windows on this issue. If all our potential platforms do not treat long as 64 bits, we'll need to use 64-bit specific types. In C++11, we can use int_64_t, but we could also use the Boost equivalent, though that's seriously clunky if we use it everywhere compared to the language standard.

Additional Information:

All of the integer argument functions in in stan-dev/math should also be updated to accept long inputs.

Current Version:

v2.16.0

alashworth commented 5 years ago

Comment by bgoodri Sunday Oct 22, 2017 at 18:06 GMT

I almost have Stan Math converted in a safe-ish way (On a Mac, you have to put '' after each -i in sed)

git checkout -b feature/int_64_t
git grep -F -l '(int' stan | xargs sed -i 's/(int/(int64_t/g'
git grep -F -l 'long int' stan | xargs sed -i 's/long int/int64_t/g'
git grep -F -l 'inline int' stan | xargs sed -i 's/inline int/inline int64_t/g'
git grep -F -l '<int' stan | xargs sed -i 's/<int/<int64_t/g'
git grep -F -l '<long,' stan | xargs sed -i 's/<long,/<int64_t,/g'
git grep -F -l 'int>' stan | xargs sed -i 's/int>/int64_t>/g'
git grep -F -l 'int&' stan | xargs sed -i 's/int\&/int64_t\&/g'
git grep -F -l ' int ' stan | xargs sed -i 's/ int / int64_t /g'
sed -i 's/var(long x) : /var(int x) : /g' stan/math/rev/core/var.hpp

# Fix overzealous replacements
git grep -F -l 'unsigned int64_t' stan | xargs sed -i 's/unsigned int64_t/unsigned int/g'
sed -i 's/int64_t/int/g' stan/math/rev/core/operator_unary_decrement.hpp
sed -i 's/int64_t/int/g' stan/math/rev/core/operator_unary_increment.hpp
sed -i 's/int64_t/int/g' stan/math/fwd/core/fvar.hpp
sed -i 's/int64_t/int/g' stan/math/rev/scal/fun/boost_fpclassify.hpp
sed -i 's/int64_t/int/g' stan/math/rev/mat/fun/Eigen_NumTraits.hpp

sed -i 's/int64_t/int/g' stan/math/rev/mat/fun/cholesky_decompose.hpp
sed -i 's/(int64_t/(int/g' stan/math/rev/mat/functor/cvodes_utils.hpp

git grep -F -l "template" | xargs sed -i -e '/template/ s/int64_t/int/g'
sed -i 's/int64_t/int/g' stan/math/prim/mat/fun/mdivide_left_tri.hpp
sed -i 's/int64_t/int/g' stan/math/prim/mat/fun/mdivide_right_tri.hpp
sed -i 's/int64_t/int/g' stan/math/prim/mat/fun/trace_gen_inv_quad_form_ldlt.hpp
sed -i 's/int64_t/int/g' stan/math/rev/mat/fun/trace_gen_inv_quad_form_ldlt.hpp
sed -i 's/int64_t/int/g' stan/math/rev/mat/fun/trace_gen_quad_form.hpp
sed -i 's/int64_t/int/g' stan/math/prim/mat/err/check_matching_dims.hpp
sed -i 's/int64_t/int/g' stan/math/prim/mat/fun/matrix_exp_pade.hpp
sed -i 's/int64_t/int/g' stan/math/prim/mat/fun/MatrixExponential.h
sed -i 's/static_cast<int64_t>/static_cast<int>/g' stan/math/rev/mat/functor/integrate_ode_bdf.hpp
git checkout stan/math/rev/mat/functor/cvodes_ode_data.hpp
sed -i 's/<int>/<int64_t>/g' stan/math/rev/mat/functor/cvodes_ode_data.hpp

# These have `inline` and `int` on separate lines as the return type
sed -i 's/inline/inline int64_t/g' stan/math/fwd/scal/fun/is_inf.hpp
sed -i '/int$/d' stan/math/fwd/scal/fun/is_inf.hpp
sed -i 's/inline/inline int64_t/g' stan/math/fwd/scal/fun/is_nan.hpp
sed -i '/int$/d' stan/math/fwd/scal/fun/is_nan.hpp
sed -i 's/inline/inline int64_t/g' stan/math/prim/scal/fun/logical_lt.hpp
sed -i '/int$/d' stan/math/prim/scal/fun/logical_lt.hpp

# Remove now duplicative stuff
git checkout stan/math/prim/scal/meta/ad_promotable.hpp
sed -i 's/A long may be promoted to a double./An int64_t may be promoted to a double/' stan/math/prim/scal/meta/ad_promotable.hpp
sed -i 's/<long, double>/<int64_t, double>/' stan/math/prim/scal/meta/ad_promotable.hpp

# Header stuff
git grep -F -l "int64_t" | xargs sed -i 's/^namespace stan/#include <cstdint>\n\nnamespace stan/'
sed -i 's/#define STAN_MATH_PRIM_MAT_FUN_EIGEN_HPP/#define STAN_MATH_PRIM_MAT_FUN_EIGEN_HPP\n#define EIGEN_DEFAULT_DENSE_INDEX_TYPE int64_t/' stan/math/prim/mat/fun/Eigen.hpp
sed -i '/Eigen.hpp/d' stan/math/prim/mat.hpp
sed -i 's/#define STAN_MATH_PRIM_MAT_HPP/#define STAN_MATH_PRIM_MAT_HPP\n\n#include <stan\/math\/prim\/mat\/fun\/Eigen.hpp>/' stan/math/prim/mat.hpp
sed -i 's/#define STAN_MATH_REV_MAT_HPP/#define STAN_MATH_REV_MAT_HPP\n\n#include <stan\/math\/prim\/mat\/fun\/Eigen.hpp>/' stan/math/rev/mat.hpp
sed -i 's/#define STAN_MATH_FWD_MAT_HPP/#define STAN_MATH_FWD_MAT_HPP\n\n#include <stan\/math\/prim\/mat\/fun\/Eigen.hpp>/' stan/math/fwd/mat.hpp

# tests
git grep -F -l '<int>' test | xargs sed -i 's/<int>/<int64_t>/g'
git grep -F -l "static double apply_base(int x) {" test/ | xargs sed -i 's/static double apply_base(int x) {/static double apply_base(int64_t x) {/'

The remaining occurrences of the "word" int can be found via

git grep -F -n -w "int" stan

alashworth commented 5 years ago

Comment by bob-carpenter Sunday Oct 22, 2017 at 23:49 GMT

Wow, that's awesome. I think we should be able to patch this up pretty easily. Not sure how to verify that the tests do what we want, but I suppose we should just trust our unit and integration tests at this point. @syclik ---you've done a lot of this before---any ideas?

We'll have to cleanup the residuals---assigning an int64_t to an int will be just fine by C++, but we want to eliminate all of that. So I'm surprised Eigen is balking at all here.

I'm pretty sure Eigen uses ptrdiff_t for indexing by default, but that's one of those platform-dependent types, as opposed to int64_t (to which it resolves on most modern platforms). We might have some badly behaved uses of Eigen that the default won't catch---can't we define that in the makefile?

alashworth commented 5 years ago

Comment by syclik Monday Oct 23, 2017 at 12:53 GMT

Yeah, we should trust our tests on this one.

This sort of thing takes a lot of time and energy. The only real suggestion I have is if it's possible to do it in chunks, commit to your branch often. Most of the effort will have to be after you've made changes and really getting to the edge cases. Test often. Write and add new tests if it helps. I use every roll I know how to use: diff3, emacs, GitHub's diff, git status. I tend to review every change at least a handful of times before submitting a pull request--an even then I miss things, but at that point you have help from the rest of the developers.

On Oct 22, 2017, at 7:49 PM, Bob Carpenter notifications@github.com wrote:

Wow, that's awesome. I think we should be able to patch this up pretty easily. Not sure how to verify that the tests do what we want, but I suppose we should just trust our unit and integration tests at this point. @syclik ---you've done a lot of this before---any ideas?

We'll have to cleanup the residuals---assigning an int64_t to an int will be just fine by C++, but we want to eliminate all of that. So I'm surprised Eigen is balking at all here.

I'm pretty sure Eigen uses ptrdiff_t for indexing by default, but that's one of those platform-dependent types, as opposed to int64_t (to which it resolves on most modern platforms). We might have some badly behaved uses of Eigen that the default won't catch---can't we define that in the makefile?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

alashworth commented 5 years ago

Comment by bgoodri Wednesday Oct 25, 2017 at 21:51 GMT

Okay, I can trigger this outside of Stan by trying to compile

#define EIGEN_DEFAULT_DENSE_INDEX_TYPE int64_t

#include <Eigen/Dense>
#include <iostream>

template <int64_t R, int64_t C>
inline double dot_self(const Eigen::Matrix<double, R, C>& v) {
  return v.squaredNorm();
}

int main() {
  Eigen::VectorXd x(3);
  x.setRandom();
  std::cout << "norm = " << dot_self(x) << std::endl;
  return 0;
}

I get:

$ g++ -O0 -g -I/usr/include/eigen3 -o eigen_test.o eigen_test.cpp
eigen_test.cpp: In function ‘int main()’:
eigen_test.cpp:14:39: error: no matching function for call to ‘dot_self(Eigen::VectorXd&)’
   std::cout << "norm = " << dot_self(x) << std::endl;
                                       ^
eigen_test.cpp:7:15: note: candidate: template<long int R, long int C> double dot_self(const Eigen::Matrix<double, R, C>&)
 inline double dot_self(const Eigen::Matrix<double, R, C>& v) {
               ^
eigen_test.cpp:7:15: note:   template argument deduction/substitution failed:
eigen_test.cpp:14:39: note:   mismatched types ‘long int’ and ‘int’
   std::cout << "norm = " << dot_self(x) << std::endl;
                                       ^
eigen_test.cpp:14:39: note:   ‘Eigen::VectorXd {aka Eigen::Matrix<double, -1, 1>}’ is not derived from ‘const Eigen::Matrix<double, R, C>’

alashworth commented 5 years ago

Comment by bgoodri Tuesday Oct 31, 2017 at 14:58 GMT

Upon further review, the problem seems to be although inserting

#define EIGEN_DEFAULT_DENSE_INDEX_TYPE int64_t

allows the rows and / or columns of a dynamically sized Eigen::Matrix to be int64_t, the constructor

Matrix<typename Scalar, int RowsAtCompileTime, int ColsAtCompileTime>

is hard-coded for ints and you cannot implicitly cast an int64_t to an int. Of course, RowsAtCompileTime and ColsAtCompileTime only take the values -1 or 1 in Stan, so I guess the thing to do is not change from int to int64_t in those templates?

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Oct 31, 2017 at 21:27 GMT

Yes, it sounds like the template types should remain int as you wrote in the second code block above. I hadn't even thought about casting in template parameter type matching.

alashworth commented 5 years ago

Comment by bgoodri Wednesday Nov 01, 2017 at 17:57 GMT

This is almost working now. Specifically, make test-headers works. There are some test failures under test/unit/math/fwd and test/unit/math/prim/scal/fun pertaining to promotion. Such as

./stan/math/prim/scal/fun/log_modified_bessel_first_kind.hpp:241:22: error: call of overloaded ‘log1p(const int&)’ is ambiguous
         lgam += log1p(v);
                 ~~~~~^~~
In file included from ./stan/math/prim/scal/fun/log1m.hpp:4:0,
                 from ./stan/math/prim/scal/fun/binary_log_loss.hpp:4,
                 from ./stan/math/prim/scal.hpp:67,
                 from test/unit/math/prim/scal/prob/von_mises_log_test.cpp:1:
./stan/math/prim/scal/fun/log1p.hpp:27:19: note: candidate: double stan::math::log1p(double)
     inline double log1p(double x) {
                   ^~~~~
./stan/math/prim/scal/fun/log1p.hpp:40:19: note: candidate: double stan::math::log1p(int64_t)
     inline double log1p(int64_t x) {
                   ^~~~~

So, it looks like if you call a function with an int it does not know whether to promote it to a double or an int64_t. This can be resolved by deleting the function signature for inline double log1p(int64_t x) (and a bunch of similar specializations of functions that are defined in cmath), but as I recall, we recently put those in to avoid ambiguous behavior.

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Nov 01, 2017 at 19:48 GMT

Yes, those were put in to avoid ambiguity with promoting to double or promoting to an autodiff type.

We could probably also put in definitions that restricted to arithmetic types using an enable_if, but then I don't think that'll help resolution ambiguity.

The only solution I see is to also have signatures for int64_t. It seems ridiculous, but I don't see a way around it so that everything will resolve.

On Nov 1, 2017, at 1:57 PM, bgoodri notifications@github.com wrote:

This is almost working now. Specifically, make test-headers works. There are some test failures under `test/unit/math/fwdandtest/unit/math/prim/scal/fun pertaining to promotion. Such as

./stan/math/prim/scal/fun/log_modified_bessel_first_kind.hpp:241:22: error: call of overloaded ‘log1p(const int&)’ is ambiguous lgam += log1p(v);


In file included from ./stan/math/prim/scal/fun/log1m.hpp:4:0,
                 from ./stan/math/prim/scal/fun/binary_log_loss.hpp:4,
                 from ./stan/math/prim/scal.hpp:67,
                 from test/unit/math/prim/scal/prob/von_mises_log_test.cpp:1:
./stan/math/prim/scal/fun/log1p.hpp:27:19: note: candidate: double stan::math::log1p(double)
     inline double log1p(double x) {
                   ^~~~~
./stan/math/prim/scal/fun/log1p.hpp:40:19: note: candidate: double stan::math::log1p(int64_t)
     inline double log1p(int64_t x) {
                   ^~~~~

So, it looks like if you call a function with an int it does not know whether to promote it to a double or an int64_t. This can be resolved by deleting the function signature for inline double log1p(int64_t x) (and a bunch of similar specializations of functions that are defined in cmath), but as I recall, we recently put those in to avoid ambiguous behavior.

—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub, or mute the thread.

alashworth commented 5 years ago

Comment by bgoodri Wednesday Nov 01, 2017 at 19:52 GMT

You mean signatures for both int and int64_t inputs?

On Wed, Nov 1, 2017 at 3:49 PM, Bob Carpenter notifications@github.com wrote:

Yes, those were put in to avoid ambiguity with promoting to double or promoting to an autodiff type.

We could probably also put in definitions that restricted to arithmetic types using an enable_if, but then I don't think that'll help resolution ambiguity.

The only solution I see is to also have signatures for int64_t. It seems ridiculous, but I don't see a way around it so that everything will resolve.
On Nov 1, 2017, at 1:57 PM, bgoodri notifications@github.com wrote:

This is almost working now. Specifically, make test-headers works. There are some test failures under `test/unit/math/fwdandtest/unit/math/prim/scal/fun pertaining to promotion. Such as

./stan/math/prim/scal/fun/log_modified_bessel_first_kind.hpp:241:22: error: call of overloaded ‘log1p(const int&)’ is ambiguous lgam += log1p(v);
In file included from ./stan/math/prim/scal/fun/log1m.hpp:4:0,
from ./stan/math/prim/scal/fun/binary_log_loss.hpp:4,
from ./stan/math/prim/scal.hpp:67,
from test/unit/math/prim/scal/prob/von_mises_log_test.cpp:1:
./stan/math/prim/scal/fun/log1p.hpp:27:19: note: candidate: double
stan::math::log1p(double)
inline double log1p(double x) {
^~~~~
./stan/math/prim/scal/fun/log1p.hpp:40:19: note: candidate: double
stan::math::log1p(int64_t)
inline double log1p(int64_t x) {
^~~~~

So, it looks like if you call a function with an int it does not know
whether to promote it to a double or an int64_t. This can be resolved by
deleting the function signature for inline double log1p(int64_t x) (and a
bunch of similar specializations of functions that are defined in cmath),
but as I recall, we recently put those in to avoid ambiguous behavior.

—
You are receiving this because you were assigned.

Reply to this email directly, view it on GitHub, or mute the thread.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2381#issuecomment-341219618, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqn8miqeD0UjAly7AR2svPNe365ijks5syMstgaJpZM4O2Ty4 .

alashworth commented 5 years ago

Comment by bob-carpenter Wednesday Nov 01, 2017 at 20:16 GMT

Yes. That'll remove the ambiguity because there's a most specific version that will match. I'm not sure if doing that with an enable_if on arithmetic types would work, but it might.

The problem with ambiguity is not just with var (to which the arithmetic types may be promoted), but also with built-ins when they're brought into scope. I think we've limited the latter, but I'm not 100% sure.

On Nov 1, 2017, at 3:52 PM, bgoodri notifications@github.com wrote:

You mean signatures for both int and int64_t inputs?

alashworth commented 5 years ago

Comment by bgoodri Wednesday Nov 01, 2017 at 20:27 GMT

Would it suffice to roll back the

inline double foo(int64_t x)

signatures to

inline double foo(int x)

? If x is an int64_t, then the compiler should know to promote that to a double because it can't demote that to an int.

On Wed, Nov 1, 2017 at 4:17 PM, Bob Carpenter notifications@github.com wrote:

Yes. That'll remove the ambiguity because there's a most specific version that will match. I'm not sure if doing that with an enable_if on arithmetic types would work, but it might.

The problem with ambiguity is not just with var (to which the arithmetic types may be promoted), but also with built-ins when they're brought into scope. I think we've limited the latter, but I'm not 100% sure.

On Nov 1, 2017, at 3:52 PM, bgoodri notifications@github.com wrote:

You mean signatures for both int and int64_t inputs?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2381#issuecomment-341228522, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqgRWVmuNMuW6bmrLN4LpfN7SUCTLks5syNG8gaJpZM4O2Ty4 .

alashworth commented 5 years ago

Comment by bgoodri Wednesday Nov 01, 2017 at 22:08 GMT

Also, there are issues converting an int to a fvar<>

In file included from ./stan/math/fwd/scal.hpp:58:0,
                 from ./stan/math/mix/scal.hpp:19,
                 from test/unit/math/mix/scal/fun/lmgamma_test.cpp:1:
./stan/math/fwd/scal/fun/lmgamma.hpp: In instantiation of ‘stan::math::fvar<typename stan::return_type<T, long int>::type> stan::math::lmgamma(int64_t, const stan::math::fvar<T>&) [with T = stan::math::fvar<stan::math::var>; typename stan::return_type<T, long int>::type = stan::math::fvar<stan::math::var>; int64_t = long int]’:
test/unit/math/mix/scal/fun/lmgamma_test.cpp:46:35:   required from here
./stan/math/fwd/scal/fun/lmgamma.hpp:19:9: error: conversion from ‘int’ to non-scalar type ‘stan::math::fvar<stan::math::var>’ requested
       T deriv = 0;
         ^~~~~

I think this has to do with the constructor not knowing how to promote:

      template <typename V>
      fvar(const V& v,
           typename boost::enable_if_c<ad_promotable<V, T>::value>::type*
           dummy = 0)
        : val_(v), d_(0.0) {
        if (unlikely(is_nan(v)))
          d_ = v;
      }

alashworth commented 5 years ago

Comment by bgoodri Wednesday Nov 01, 2017 at 22:33 GMT

I think I fixed the ad_promotable thing.

alashworth commented 5 years ago

Comment by bob-carpenter Thursday Nov 02, 2017 at 20:09 GMT

If `x` is an `int64_t`, then the compiler should know to promote that to a double because it can't demote that to an int.

With C++, you can go both ways:

  int64_t a = 10;
  int b = 5;
  a = b;
  b = a;

alashworth commented 5 years ago

Comment by bob-carpenter Thursday Nov 02, 2017 at 20:10 GMT

I think I fixed the ad_promotable thing.

That'd be great. We need to do better doc and edge cases on all of these---@seantalts and @bbbales2 have been cleaning some of them up.

alashworth / test-issue-import