add tests for multi-platform end-of-line behavior

Issue by betanalpha Wednesday Apr 01, 2015 at 21:58 GMT Originally opened as https://github.com/stan-dev/stan/issues/1416

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122:      beta_theta ~ normal(0,2);
123:      
124:      alpha_mu ~ normal(0,2);
                                                           ^
125:      alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122: 
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129:   ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130:   ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

Comment by betanalpha Wednesday Apr 01, 2015 at 22:01 GMT

Possibly comments aren't getting counted in terms of line numbers?

Comment by bob-carpenter Wednesday Apr 01, 2015 at 23:53 GMT

I haven't and just tried a few examples and it seems fine for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have different line ends, and that may also be messing things up somehow.

On Apr 2, 2015, at 8:58 AM, Michael Betancourt notifications@github.com wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2); 123:
124: alpha_mu ~ normal(0,2); ^ 125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2); 122: 123: alpha_mu ~ normal(0,2); 124: alpha_sigma ~ normal(0,2); 125: alpha_theta ~ normal(0,2); 126: 127: // Measurements 128: for(n in 1:N_ooc) { 129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C)); 130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T)); 131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

— Reply to this email directly or view it on GitHub.

Comment by betanalpha Thursday Apr 02, 2015 at 00:03 GMT

I’m on a Mac — runtime errors are definitely off, too. The model/data aren’t public so I’ll try to whip up a pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter notifications@github.com wrote:

I haven't and just tried a few examples and it seems fine for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have different line ends, and that may also be messing things up somehow.

Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt notifications@github.com wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2); 123: 124: alpha_mu ~ normal(0,2); ^ 125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2); 122: 123: alpha_mu ~ normal(0,2); 124: alpha_sigma ~ normal(0,2); 125: alpha_theta ~ normal(0,2); 126: 127: // Measurements 128: for(n in 1:N_ooc) { 129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C)); 130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T)); 131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

Comment by betanalpha Thursday Apr 02, 2015 at 09:56 GMT

Turns out it’s Windows newlines — the model was originally written on a Windows box. Is there any way for the parser to aggregate newline types and warn the user of potential biases in the reported line numbers?

On Apr 2, 2015, at 1:03 AM, Michael Betancourt betanalpha@gmail.com wrote:

I’m on a Mac — runtime errors are definitely off, too. The model/data aren’t public so I’ll try to whip up a pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter notifications@github.com wrote:

I haven't and just tried a few examples and it seems fine for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have different line ends, and that may also be messing things up somehow.

Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt notifications@github.com wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2); 123: 124: alpha_mu ~ normal(0,2); ^ 125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2); 122: 123: alpha_mu ~ normal(0,2); 124: alpha_sigma ~ normal(0,2); 125: alpha_theta ~ normal(0,2); 126: 127: // Measurements 128: for(n in 1:N_ooc) { 129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C)); 130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T)); 131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

Comment by bob-carpenter Thursday Apr 02, 2015 at 12:35 GMT

What do you want the behavior to be?

The line counting is happening below the level of the parser in the iterator that Spirit uses. And it'd be hard to change there. But the good news is we have the whole string representing the program before it gets used to create the iterators.

Here's the iterator getting created in lang/parser.hpp:

  std::ostringstream buf;
  buf << input.rdbuf();
  std::string stan_string = buf.str();

  // ***** could add warnings or preprocessing here *****

  typedef std::string::const_iterator input_iterator;
  typedef boost::spirit::line_pos_iterator<input_iterator> lp_iterator;

  lp_iterator fwd_begin = lp_iterator(stan_string.begin());
  lp_iterator fwd_end = lp_iterator(stan_string.end());

It'd be easy to drop code in where I wrote the comment above.

On Apr 2, 2015, at 8:56 PM, Michael Betancourt notifications@github.com wrote:

Turns out it’s Windows newlines — the model was originally written on a Windows box. Is there any way for the parser to aggregate newline types and warn the user of potential biases in the reported line numbers?

On Apr 2, 2015, at 1:03 AM, Michael Betancourt betanalpha@gmail.com wrote:

I’m on a Mac — runtime errors are definitely off, too. The model/data aren’t public so I’ll try to whip up a pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter notifications@github.com wrote:

I haven't and just tried a few examples and it seems fine for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have different line ends, and that may also be messing things up somehow.

Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt notifications@github.com wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2); 123: 124: alpha_mu ~ normal(0,2); ^ 125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2); 122: 123: alpha_mu ~ normal(0,2); 124: alpha_sigma ~ normal(0,2); 125: alpha_theta ~ normal(0,2); 126: 127: // Measurements 128: for(n in 1:N_ooc) { 129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C)); 130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T)); 131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

Comment by betanalpha Thursday Apr 02, 2015 at 13:01 GMT

So are the Windows newlines messing up Spirit directly? Or is it having a mix of Unix and Windows newlines?

In the former case having a comment like “I detected Windows newlines so be careful with the error message” would be huge given the relatively large number of Windows users that we have. In the latter having a “I found mixed newlines, be careful…” would be great.

On Apr 2, 2015, at 1:35 PM, Bob Carpenter notifications@github.com wrote:

What do you want the behavior to be?

The line counting is happening below the level of the parser in the iterator that Spirit uses. And it'd be hard to change there. But the good news is we have the whole string representing the program before it gets used to create the iterators.

Here's the iterator getting created in lang/parser.hpp:

std::ostringstream buf; buf << input.rdbuf(); std::string stan_string = buf.str();

// * could add warnings or preprocessing here *

typedef std::string::const_iterator input_iterator; typedef boost::spirit::line_pos_iterator lp_iterator;

lp_iterator fwd_begin = lp_iterator(stan_string.begin()); lp_iterator fwd_end = lp_iterator(stan_string.end());

It'd be easy to drop code in where I wrote the comment above.

Bob

On Apr 2, 2015, at 8:56 PM, Michael Betancourt notifications@github.com wrote:

Turns out it’s Windows newlines — the model was originally written on a Windows box. Is there any way for the parser to aggregate newline types and warn the user of potential biases in the reported line numbers?

On Apr 2, 2015, at 1:03 AM, Michael Betancourt betanalpha@gmail.com wrote:

I’m on a Mac — runtime errors are definitely off, too. The model/data aren’t public so I’ll try to whip up a pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter notifications@github.com wrote:

I haven't and just tried a few examples and it seems fine for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have different line ends, and that may also be messing things up somehow.

Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt notifications@github.com wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2); 123: 124: alpha_mu ~ normal(0,2); ^ 125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2); 122: 123: alpha_mu ~ normal(0,2); 124: alpha_sigma ~ normal(0,2); 125: alpha_theta ~ normal(0,2); 126: 127: // Measurements 128: for(n in 1:N_ooc) { 129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C)); 130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T)); 131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

Comment by bob-carpenter Thursday Apr 02, 2015 at 13:31 GMT

I think the parser/iterator should treat end-of-line markers from any OS as end-of-line markers. That ensures we get the same behavior from the same byte stream cross platform.

I'd guess that's what's happening now, but we need to add tests as part of resolving this issue and to figure out what's going on.

[ ] add tests for behavior with mixed end-of-line markers

Then we can follow your second suggestion:

[ ] provide a warning if there are mixed end-of-line markers

I can take on doing both of these.

Comment by VMatthijs Thursday Dec 13, 2018 at 13:22 GMT

This is still an issue in stanc2, from my experience.

It's been fixed in stanc3, however.

alashworth / test-issue-import

add tests for multi-platform end-of-line behavior #66