gnu-octave / statistics

The Statistics package for GNU Octave
GNU General Public License v3.0

Duplicated PNG files in 1.6.4 tarball #138

Closed by NRJank 5 months ago

NRJank commented 5 months ago

As reported at https://savannah.gnu.org/bugs/?65419:

In the release tarball for statistics 1.6.4, there are some duplicated PNG files in docs/assets:

$ md5sum anovan_1201.png fitlm_101.png
a0cc360c46f667ec5ca711e3545c37f1  anovan_1201.png
a0cc360c46f667ec5ca711e3545c37f1  fitlm_101.png
$ md5sum bvncdf_101.png mvncdf_101.png
510e8e8a5631b917428d5ec65d0eca31  bvncdf_101.png
510e8e8a5631b917428d5ec65d0eca31  mvncdf_101.png
$ md5sum mhsample_201.png slicesample_201.png
f2f1954ea7c83a86717489cc6874ef71  mhsample_201.png
f2f1954ea7c83a86717489cc6874ef71  slicesample_201.png
$ md5sum vartestn_201.png vartestn_301.png
3cd7cb3e00c8e2e7e5e773d71b4308fb  vartestn_201.png
3cd7cb3e00c8e2e7e5e773d71b4308fb  vartestn_301.png

However, they should not be, because they are, in principle, generated from different %demo blocks in the corresponding *.m files.
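
The pairwise checks above can be generalized to a whole directory. The following is a minimal sketch, assuming GNU coreutils (`uniq -w` and `--all-repeated` are GNU extensions); the function name `find_dup_pngs` is hypothetical and not part of the statistics package or its tarball:

```shell
#!/bin/sh
# Hypothetical helper: list groups of PNG files that share an MD5 checksum.
# Assumes GNU coreutils (md5sum, and uniq with -w / --all-repeated).
# Example call (the path is illustrative): find_dup_pngs docs/assets
find_dup_pngs() {
    dir=${1:-.}
    (
        cd "$dir" || exit 1
        # md5sum prints "<32 hex digits>  <file name>" per file.
        # After sorting, uniq compares only the first 32 characters (the
        # checksum) and prints every line whose checksum occurs more than
        # once, separating the groups with a blank line.
        md5sum -- *.png | sort | uniq -w32 --all-repeated=separate
    )
}
```

Each group in the output corresponds to one set of byte-identical files; files with a unique checksum are not listed.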

pr0m1th3as commented 5 months ago

But they are identical images generated from the same code. This is related to the pkg-octave-doc package rather than to the statistics package. Automatically removing these duplicates would involve some quite tricky and elaborate code, and it isn't worth the hassle. Keeping them also makes it easier to keep track of the images and the functions' code blocks they were generated from. I think it is best to leave this as is and not consider it a bug or an issue.

rlaboiss commented 5 months ago

Fair enough. However, the images are not generated from the same code; they happen to be identical because they are produced from the same data. For instance, fitlm_101.png and anovan_1201.png are generated from the following demos (in inst/fitlm.m and inst/anovan.m, respectively):

%!demo
%! y =  [ 8.706 10.362 11.552  6.941 10.983 10.092  6.421 14.943 15.931 ...
%!        22.968 18.590 16.567 15.944 21.637 14.492 17.965 18.851 22.891 ...
%!        22.028 16.884 17.252 18.325 25.435 19.141 21.238 22.196 18.038 ...
%!        22.628 31.163 26.053 24.419 32.145 28.966 30.207 29.142 33.212 ...
%!        25.694 ]';
%! X = [1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5]';
%!
%! [TAB,STATS] = fitlm (X,y,"linear","CategoricalVars",1,"display","on");

%!demo
%!
%! # Unbalanced one-way ANOVA with custom, orthogonal contrasts. The statistics
%! # relating to the contrasts are shown in the table of model parameters, and
%! # can be retrieved from the STATS.coeffs output.
%!
%! dv =  [ 8.706 10.362 11.552  6.941 10.983 10.092  6.421 14.943 15.931 ...
%!        22.968 18.590 16.567 15.944 21.637 14.492 17.965 18.851 22.891 ...
%!        22.028 16.884 17.252 18.325 25.435 19.141 21.238 22.196 18.038 ...
%!        22.628 31.163 26.053 24.419 32.145 28.966 30.207 29.142 33.212 ...
%!        25.694 ]';
%! g = [1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 ...
%!      4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5]';
%! C = [ 0.4001601  0.3333333  0.5  0.0
%!       0.4001601  0.3333333 -0.5  0.0
%!       0.4001601 -0.6666667  0.0  0.0
%!      -0.6002401  0.0000000  0.0  0.5
%!      -0.6002401  0.0000000  0.0 -0.5];
%!
%! [P,ATAB, STATS] = anovan (dv, g, "contrasts", C, "varnames", "score", ...
%!                          "alpha", 0.05, "display", "on");

However, I agree that this is a non-issue. When I filed the bug report on Savannah, I had not looked closely into the problem and thought that something was wrong with the production of the figures for the documentation, since having those identical files seemed an extraordinary coincidence.

I think it is fine to leave this as is, although a simple “fix” would be to add different titles to the plots in the different demo blocks.

pr0m1th3as commented 5 months ago

By "generated from the same code" I meant that the underlying code inside the fitlm and anovan functions is the same. Changing the figures would require feeding the functions different data, but the point of these demos is to showcase the equivalence between the two functions, which can be used interchangeably to compute the same statistics.