ambiguity of `1./a` (real literal and infixOp)

alashworth / test-issue-import

0 stars 0 forks source link

ambiguity of `1./a` (real literal and infixOp) #169

Open alashworth opened 5 years ago

alashworth commented 5 years ago

Issue by jan-glx Monday Oct 23, 2017 at 15:55 GMT Originally opened as https://github.com/stan-dev/stan/issues/2425

Summary:

1./a is parsed as 1.0/a instead of 1 ./ a (where are a is a vector). This confused me a bit.

To avoid this ambiguity I would prefer to not allow 1. but enforce 1.0 as I would consider the latter more readable anyways. If you care to much about breaking existing code, please consider to have the parser issue a warning in this case.

Related: In the grammar in the manual the definitions of numeric_literal and real_literal are equivalent: (every numeric is real)

numeric_literal ::= integer_literal | real_literal
integer_literal ::= 0 | [1-9] [0-9] *
real_literal ::= integer_literal ?('.' [0-9] * ) ?exp_literal

I would prefer: real_literal ::= integer_literal '.' [0-9] + ?exp_literal

Current Version:

rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)

alashworth commented 5 years ago

Comment by bgoodri Monday Oct 23, 2017 at 16:02 GMT

We should fix this, but you should be using inv(a).

On Mon, Oct 23, 2017 at 11:55 AM, Jan Gleixner notifications@github.com wrote:

Summary:

1./a is parsed as 1.0/a instead of 1 ./ a (where are a is a vector). This confused me a bit.

To avoid this ambiguity I would prefer to not allow 1. but enforce 1.0 as I would consider the latter more readable anyways. If you care to much about breaking existing code, please consider to have the parser issue a warning in this case.

Related: In the grammar in the manual the definitions of numeric_literal and real_literal are equivalent: (every numeric is real)

numeric_literal ::= integer_literal | real_literal integer_literal ::= 0 | [1-9] [0-9] real_literal ::= integer_literal ?('.' [0-9] ) ?exp_literal

I would prefer: real_literal ::= integer_literal '.' [0-9] + ?exp_literal Current Version:

rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stanc3/issues/1403, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqjY2pylaPqi4MYazrNOk2Am-yFGeks5svLb4gaJpZM4QDFGu .

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Oct 24, 2017 at 13:37 GMT

Thanks for pointing out the issue.

Making the suggested change would break backward compatibility, so it'd have to wait until 3.0 at the earliest. There are quite a few users who write 1. out of habit (presumably from other languages as you don't see this in any of our code).

Finally, I'd rather follow the C++ and R conventions for numbers rather than change behavior for an edge case of elementwise division.

I also don't see this as much of a problem, becuase 1./x won't be well formed if x is a matrix type. Is there another context where you see this being problematic?

Also, it's not a problem if you use our preferred syntax, 1. / x rather than 1./x.

P.S. Thanks for the patch on the BNF---that I'll fix right away. See: https://github.com/stan-dev/stan/issues/2336#issuecomment-338992445

We couldn't possibly do that until Stan 3.0 because it would break backward compatibility.

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Oct 24, 2017 at 13:37 GMT

Unless there's a reason I'm missing for making this change, I'll close this issue on the next cleanup pass.

alashworth commented 5 years ago

Comment by jan-glx Tuesday Oct 24, 2017 at 15:06 GMT

Well, neither C++ nor R have this ambiguity as they don't have dotted operators like ./.

To answer your question: Yes, in the following case, a single makes a big difference that is hard to notice as both cases are valid and return a matrix of the expected size:

library(rstan)
mod <-  stan_model(model_code="
data {
matrix[2,2] a;
matrix[2,2] b;
}
generated quantities{
matrix[2,2] c= a*2.*b;
matrix[2,2] d= a*2 .*b;
}
")
m = matrix(1:4, ncol=2)
fit <- sampling(mod, data= list(a=m, b=m), iter=1, chains=1, algorithm="Fixed_param")

resulting in:

> lapply(extract(fit), drop)
$c
     [,1] [,2]
[1,]   14   30
[2,]   20   44

$d
     [,1] [,2]
[1,]    2   18
[2,]    8   32

$lp__
[1] 0

> (m * 2) %*% m
     [,1] [,2]
[1,]   14   30
[2,]   20   44
> m * 2 * m
     [,1] [,2]
[1,]    2   18
[2,]    8   32

Of course, using proper spacing or putting the constant to the beginning like everybody does would circumvent the problem.

Still, I think you want to avoid the dependence on white space in the long run (and the earlier the less code you would break) - just think of how much trouble R's if(a<-b) vs if(a< -b) has caused users.

Please consider to just issue a warning for now if code matches both: integer_literal '.' infixOp and integer_literal infixOp.

Thanks for appreciating my grumping and thanks for stan :)

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Oct 24, 2017 at 15:21 GMT

Thanks. Concrete examples help a lot.

As the parser's currently written, it's not technically an ambiguity insofar as the parser is greedy left-to-right, so the "." goes with the "1" if you write "1." Otherwise, if you just read the BNF, I think a * b * c would be similarly ambiguous.

I agree that your example is hard to read. And as you note, spacing really helps. No surprise that I suggest following our code formatting advice and including spaces around operators. It makes them a lot easier to read. (So does indentation, by the way.) Compare your

generated quantities{
matrix[2,2] c= a*2.*b;
matrix[2,2] d= a*2 .*b;
}

with our recommended

generated quantities{
  matrix[2, 2] c = a * 2. * b;
  matrix[2, 2] d = a * 2 .* b;
}

Not perfect, but certainly easier.

The dependence on whitespace is a very good point. Let me ponder this. The other place this shows up is with declarations, where something like this

realb = 3;

which can either be read as assignment to variable realb (how it's parsed after we fixed an earlier bug) or the compound declaration and definition of a new variable b.

real b = 3;

as our initial buggy parser treated it as.

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Oct 24, 2017 at 15:22 GMT

The warning is a very good idea! We'd start there by deprecating the feature anyway. So we should at least do that.

Thanks again for commenting. We really do appreciate people being this thoughtful about the language. We don't take it negatively at all.