benchmarking and reproducibility

parmentelat commented 5 months ago

the paper comes with a very nice figure comparing the performance of various setups and algorithms also the sources come with several files whose name suggest they are useful to reproduce these results

./time_pls.py
./plot_timings.py
as well as the whole ./timings folder

my question is, are there any plans to document how to use this material if one wanted to reproduce the results in the paper ? (or is that documented already and I missed it ?)

parmentelat commented 5 months ago

and oops I forgot to link the meta-issue https://github.com/openjournals/joss-reviews/issues/6533

Sm00thix commented 5 months ago

Hi @parmentelat. Thank you for your feedback 😊

The files were constructed ad hoc to generate the experimental results shown in the figure in the paper.

The figure is supposed to graphically give a set of general guidelines for choosing an algorithm based on a users specific dataset size and desired maximum number of PLS components. While a user can certainly use the files to execute the same experiments and generate a similar figure, a reproduction of the results is unlikely due to differences in hardware and other factors impacting the execution time.

Additionally, the files generate random data to fit the models on, so they do not serve any practical purpose for users. For this reason, the files are not part of the ikpls package and are not documented.

If, after considering these points, you believe that the files have some value and should be documented, then I will document them. However, I do not think they should be part of the ikpls package and therefore also not the ikpls package documentation.

parmentelat commented 5 months ago

Hi @Sm00thix

fair enough: I was not suggesting for these files to be described in the official doc !

I guess my point is twofold

First, on a purely cosmetic level, I was more implying that:

their position in the repo toplevel kind of suggests they are more crucial than they really are
so they probably better fit some deeper location in the file tree as I understand it, these 2 files, together with the timings/ package, are relevant only for the sake of producing the paper, so maybe they more belong under the paper/ folder

Second, and much more importantly, I am saying that these are rather valuable artefacts to the paper itself; because when you say

While a user can certainly use the files to execute the same experiments and generate a similar figure, a reproduction of the results is unlikely due to differences in hardware and other factors impacting the execution time.

I reply that it is the whole point of reproducibility !

so imho you have a choice:

either you claim for a reproducible paper, where the experiment can be run again in other contexts for comparison purposes, in which case you keep the files in the repo - although in a lower location maybe, as hinted above - and give an informal readme on how to put that together to produce the png
or you don't, in which case you can drop them off the repo altogether imho; it'd be a pity imho, but it's your choice

again I find it misleading that they sit as first-class citizens at the top of the repo like they do now

Sm00thix commented 5 months ago

Hi @parmentelat

Thank you for the elaboration. I appreciate your points. I will move the timing files to the paper directory and write an informal readme on how to use them to produce the png.

parmentelat commented 5 months ago

Hi @Sm00thix

thanks for having taken care of this while trying to run my own benchmarks, I just ran into a minor issue: after I completed my own first run, because timings.csv in the repo has no newline at end of file, my own results ended up on that same line, which breaks it

stepping back a little bit, it seems to me that it'd be better if time_pls.py was to store its result in another file (not the one provided in the repo), so that one could more freely delete/create new run results, and move the csv into place once one is satisfied with the contents

again this is quite minor, just writing it down on my way to trying and reproduce the paper results..

Sm00thix commented 5 months ago

Hi @parmentelat

Thanks for pointing that out. The missing newline is my fault. The last line in timings.csv was entered manually (inferred once time/iteration had stabilized) and I forgot to put in the newline. I will add it.

I agree with your points. Therefore, I will modify time_pls.py to write its results to another file and have plot_timings.py read this file instead of timings.csv. Is this in line with your suggestion?

I will make sure to update the paper/README.md as well.

parmentelat commented 5 months ago

hiya @Sm00thix yes that is fine, thanks

Sm00thix commented 5 months ago

Hi @parmentelat Thanks! I have updated the files in the dev branch.

I also updated paper/README.md to reflect the changes and added a short paragraph to address what will happen if a user wants to benchmark a cross-validation that is not a leave-one-out cross-validation. This is entirely possible to do, but the .png generated by plot_timings.py will then incorrectly interpret any cross-validation as a leave-one-out cross-validation in the plot that it generates.

Sm00thix commented 5 months ago

The changes have been merged to the main branch.

Sm00thix / IKPLS

benchmarking and reproducibility #13