Control precision of sampling ASCII output #191

Open alashworth opened 5 years ago

alashworth commented 5 years ago

Issue by bob-carpenter, Sunday Apr 22, 2018 at 20:06 GMT. Originally opened as https://github.com/stan-dev/stan/issues/2515


From @aaronjg on April 20, 2018 20:19

#### Summary:

When samples are written to the sample file, precision is lost. The data written out and read back in with read_stan_csv should be the same as the data held in the rstan object.

#### Description:

When rstan writes samples to a file, it keeps only about 6 significant digits of precision, which causes the sample file to differ from what is stored in the rstan object.
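A minimal R sketch of what is happening (mine, not part of the original report): formatting at 6 significant digits, roughly what the sample file keeps, loses information, while 17 significant digits is enough to round-trip any double exactly.

```r
x <- 0.10831226437                   # a hypothetical in-memory draw
six <- sprintf("%.6g", x)            # roughly what lands in the CSV
six                                  # "0.108312"
identical(x, as.numeric(six))        # FALSE: precision was lost
full <- sprintf("%.17g", x)          # 17 significant digits round-trip a double
identical(x, as.numeric(full))       # TRUE
```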

#### Reproducible Steps:

    stan.model <- stan_model("bernoulli.stan")   # compile the model
    source("bernoulli.data.R")                   # defines the data (N, y)
    out <- sampling(stan.model, data = list(N = N, y = y),
                    chains = 1, sample_file = "out.csv", seed = 1)

#### Current Output:

    extract(out, permuted = FALSE, inc_warmup = TRUE)[1:10, , ]

              parameters
    iterations      theta      lp__
          [1,] 0.10831226 -7.699964
          [2,] 0.10831226 -7.699964
          [3,] 0.10831226 -7.699964
          [4,] 0.10719339 -7.719830
          [5,] 0.08838579 -8.110978
          [6,] 0.23460585 -6.755824
          [7,] 0.22917849 -6.762448
          [8,] 0.17383842 -6.967571
          [9,] 0.17383842 -6.967571
         [10,] 0.19948957 -6.838531

    head -n 35 out.csv | tail

    lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
    -7.69996,0.821754,1,2,3,0,7.95282,0.108312
    -7.69996,1.87223e-146,10.4034,1,1,0,9.9209,0.108312
    -7.69996,0.0719279,1.59718,1,1,0,7.70735,0.108312
    -7.71983,0.999813,0.180632,1,1,0,7.72391,0.107193
    -8.11098,0.997778,0.23924,3,7,0,8.11104,0.0883858
    -6.75582,0.995139,0.366777,2,5,0,8.47479,0.234606
    -6.76245,0.978753,0.609755,2,3,0,6.91575,0.229178
    -6.96757,0.901618,1.01542,1,1,0,6.9752,0.173838
    -6.96757,0.00030679,1.36694,1,1,0,7.87545,0.173838



#### Expected Output:

File written should have the same values as the extract command.

#### RStan Version:

Compiled from: 4706b82028a7fc3a31cbdf6c60beed4c49233562

#### R Version:

"R version 3.4.4 (2018-03-15)"
#### Operating System:
Your operating system (e.g., OS X 10.11.3)
Ubuntu 14.04

_Copied from original issue: stan-dev/rstan#518_
alashworth commented 5 years ago

Comment by bob-carpenter Sunday Apr 22, 2018 at 20:06 GMT


From @bgoodri on April 20, 2018 20:25

This is a Stan thing rather than an RStan one, and I believe it is intentional.


alashworth commented 5 years ago

Comment by bob-carpenter Sunday Apr 22, 2018 at 20:06 GMT


We could double the file size and clog up I/O for that extra precision, but most computations don't retain much more accuracy than the precision we already provide. So even though floating point carries about 16 digits of precision, after sampling the draws usually aren't accurate to anywhere near that many digits.
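For a rough sense of the size tradeoff (my arithmetic, not a measurement from this thread): each value costs roughly two and a half times as many characters at full round-trip precision.

```r
nchar(sprintf("%.6g", pi))    # 7 characters per value at 6 significant digits
nchar(sprintf("%.17g", pi))   # 18 characters at full round-trip precision
```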

Ideally, we'd have a feature to control the precision.

alashworth commented 5 years ago

Comment by bob-carpenter Sunday Apr 22, 2018 at 20:06 GMT


I'm going to move this to being a Stan feature request. My guess is that we'll wind up providing a binary output format before fixing it, though you never know. It should be easy to extend the precision; it's just a matter of how to control it in the calls.

alashworth commented 5 years ago

Comment by aaronjg Sunday Apr 22, 2018 at 20:27 GMT


I don't particularly expect the extra precision to add much to the inference. However, as I was moving from keeping the results in memory to streaming to a file and loading them back in, I was expecting identical results and had some tests fail because of it. Having a binary output format seems ideal.
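One possible workaround along these lines (my sketch, not an rstan feature; it assumes `out` is the stanfit from the reproduction above and `theta_full.txt` is a hypothetical file name): write the in-memory draws out yourself at full precision, so the round trip is exact.

```r
theta <- extract(out, "theta", permuted = FALSE, inc_warmup = TRUE)
writeLines(sprintf("%.17g", theta), "theta_full.txt")  # 17 digits per value
identical(as.numeric(theta), scan("theta_full.txt"))   # TRUE: exact round trip
```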

alashworth commented 5 years ago

Comment by jgabry Sunday Apr 22, 2018 at 21:27 GMT


Yeah, this would be nice, but at least the output should currently be deterministic, so a tolerance level for the tests will work reliably.

If it's not documented anywhere, we should do that too.
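A sketch of such a tolerance-based check (my code, with assumed names; `out` and `out.csv` come from the reproduction above): compare the in-memory draws against the CSV read back with read_stan_csv, allowing for the roughly 6 significant digits the file keeps.

```r
in_mem  <- extract(out, permuted = FALSE, inc_warmup = TRUE)
on_disk <- extract(read_stan_csv("out.csv"), permuted = FALSE, inc_warmup = TRUE)
# compare values only; 6 significant digits means relative error well under 1e-5
all.equal(as.numeric(in_mem), as.numeric(on_disk), tolerance = 1e-5)  # should be TRUE
```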

alashworth commented 5 years ago

Comment by bob-carpenter Monday Apr 23, 2018 at 14:50 GMT


The outputs aren't random now, just limited to a precision that's hard coded into the I/O (or taken by default; I don't even know which). I understand that this level of consistency under round-trip I/O would be nice, but that's usually too much to ask of floating point.

We could get more precision, though I don't know whether exact round trips are possible. The usual recommendation is never to compare floating-point values except to within a known precision.
