Closed lukashergt closed 3 months ago
It should store only every Nth sample point. Depending on N (`= current_point.output_thin`), this may not make the files smaller, e.g. when N is smaller than the mean number of samples at each point, i.e. when the `oversampling_factors` are small. (In your example, you probably don't have a significant parameter speed hierarchy.)
Hmm, ok, I'm clearly not quite getting how the oversampling is implemented...
Asked differently: for `oversample_thin: true` I would have expected all `x1`, `x2` value pairs in the above example to be unique; however, here are the first few lines of the corresponding output file:
```
(py3121env) [~/cobayatest_oversample_thin]$ head -n 15 cobayatest_oversample_thin_true.1.txt
# weight minuslogpost x1 x2 x3 minuslogprior minuslogprior__0 chi2 chi2__gauss3d
1 30.07359 5.3152506 -2.2039542 2.5719266 7.4547199 7.4547199 45.237741 45.237741
2 26.993948 5.3152506 -2.2039542 0.67492268 7.4547199 7.4547199 39.078455 39.078455
1 27.309466 5.4104726 -2.1135579 0.67492268 7.4547199 7.4547199 39.709493 39.709493
1 26.51419 5.5510187 -1.1558462 0.67492268 7.4547199 7.4547199 38.118941 38.118941
1 27.118065 5.5510187 -1.1558462 1.2896787 7.4547199 7.4547199 39.326691 39.326691
1 19.546187 4.0174204 -0.93078769 1.2896787 7.4547199 7.4547199 24.182934 24.182934
1 20.723015 4.0174204 -0.93078769 2.004227 7.4547199 7.4547199 26.536589 26.536589
1 21.337675 4.0174204 -0.93078769 2.2904687 7.4547199 7.4547199 27.76591 27.76591
1 19.649555 4.0174204 -0.93078769 -1.3674819 7.4547199 7.4547199 24.38967 24.38967
3 14.08358 1.6888194 -1.7383815 -1.3674819 7.4547199 7.4547199 13.257719 13.257719
1 14.143138 1.6888194 -1.7383815 -1.4103633 7.4547199 7.4547199 13.376837 13.376837
1 13.775281 1.6888194 -1.7383815 -1.1195576 7.4547199 7.4547199 12.641122 12.641122
5 11.93854 1.3840606 0.53383186 -1.1195576 7.4547199 7.4547199 8.9676407 8.9676407
2 12.378051 1.7479903 -0.15540818 -1.1195576 7.4547199 7.4547199 9.8466623 9.8466623
```
In many lines `x1` and `x2` stay at the same value. For `oversample_thin: false` this is expected, but for `oversample_thin: true` I would have expected these repetitions to be thrown away. Am I misunderstanding how the thinning works?
> (in your example, you probably don't have a significant parameter speed hierarchy)
Well, I did manually block it 2 to 1, for which I get the same output for `false` and `true`.

I tried again, this time changing the blocks to 5 to 1, and this did lead to longer outputs for `false` compared to `true` (about 9700 lines and 16000 summed weights for `false`, and 6700 lines and 7900 summed weights for `true`), still far from a 5:1 ratio, though, and still with repeating `x1` and `x2` values for `true`.
You need to thin by more than the mean weight to get a significant reduction in the number of rows.
If the unthinned data has weights 2, 4, 2, 2, 2, ..., then thinning by 2 would give the output you quote. Imagine expanding the unthinned data to many more rows where all rows have weight 1 (and some are duplicates), then take every other row.
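That expand-then-subsample picture can be sketched in a few lines (a standalone illustration of the mental model; `thin_weighted` is a hypothetical helper, not Cobaya's actual implementation, and integer weights are assumed):

```python
import numpy as np

def thin_weighted(weights, thin):
    """Expand each row into `weight` unit-weight copies, keep every
    `thin`-th copy, then re-compress into (row index, new weight)."""
    weights = np.asarray(weights, dtype=int)
    # which original row each unit-weight copy belongs to
    expanded = np.repeat(np.arange(len(weights)), weights)
    kept = expanded[::thin]  # take every thin-th unit-weight copy
    rows, new_weights = np.unique(kept, return_counts=True)
    return rows, new_weights

# Weights 2, 4, 2, 2, 2 thinned by 2: every original row survives,
# just with reduced weights, so repeated slow-parameter values remain.
rows, w = thin_weighted([2, 4, 2, 2, 2], 2)
print(rows.tolist(), w.tolist())  # [0, 1, 2, 3, 4] [1, 2, 1, 1, 1]
```

With a thinning factor at or below the typical weight, no rows disappear and only the weights shrink; rows only start to drop out once the factor exceeds the mean weight.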
> Imagine expanding unthinned data to many more rows where all rows have weight 1 (and some are duplicates), then take every other row.
Ok, that is actually exactly what I was imagining... which means I don't understand what is going on in the original example. The output is the exact same for `oversample_thin: true` and `oversample_thin: false`. Do you get that, too?
Are you saying the thinning factor is larger than 1 but the mean weights of the output are actually identical? If you thin by 2, the output weights are only unity if the original weight was 1, 2 or 3 (depending on the thinning phase). Thinning works on all lines, not just on the subset of slowest parameters.
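The dependence on the original weight and on the thinning phase can be checked directly (a hypothetical illustration, not Cobaya code): a row's new weight is the number of kept positions in the expanded chain that fall inside its run of unit-weight copies.

```python
def surviving_weight(weight, thin, phase):
    """New weight of a row after thinning: count the kept positions
    (every `thin`-th copy in the expanded chain) that land inside this
    row's run of `weight` unit-weight copies starting at offset `phase`."""
    first = (-phase) % thin  # offset of the first kept copy within the run
    return len(range(first, weight, thin))

# Thinning by 2: weights 1-3 can come out as 1 (or drop entirely),
# while a weight-4 row always keeps weight 2, whatever the phase.
for w in (1, 2, 3, 4):
    print(w, [surviving_weight(w, 2, phase) for phase in (0, 1)])
```

This reproduces the statement above: for a thinning factor of 2, only original weights of 1, 2 or 3 can yield an output weight of unity, and which of them do depends on the thinning phase.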
Closing; please reopen if you have a reproducible example that is clearly a bug.
How exactly is the `oversample_thin` parameter supposed to work? The reason that I am asking is that in the following reduced example I am getting almost exactly the same output (note that I am setting `seed: 0`) regardless of whether I am choosing `oversample_thin: true` or `oversample_thin: false`. I would have expected `oversample_thin: true` to lead to reduced output files (as it turns out, `true` ends up producing larger output files because it continues sampling a bit longer)...