HISKP-LQCD / hadron

R package implementing analysis tools for lattice QCD

external data package #292

Open: kostrzewa opened this issue 3 years ago

kostrzewa commented 3 years ago

In order to include more example data in hadron, especially loops and/or gradient flow files, we will need to externalise this data because of CRAN's package size limitations (https://thecoatlessprofessor.com/programming/r/size-and-limitations-of-packages-on-cran/).

We've already started something along these lines in https://github.com/HISKP-LQCD/hadron_example_data, but we clearly need a mechanism in hadron to load this data from GitHub (or elsewhere).
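A minimal sketch of such a loading mechanism, written in Python purely for illustration (hadron itself is R, where the same idea maps onto `download.file` plus a local cache directory). The function name `fetch_example_data`, the default cache location and the file layout are hypothetical; `base_url` would be something like the raw-file URL of the hadron_example_data repository, whose branch and paths are assumptions here. Each file is downloaded once and then served from the cache.

```python
import os
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def fetch_example_data(name, base_url, cache_dir=None):
    """Return a local path to the data file `name`, downloading it only once.

    `name` is a path relative to `base_url` (e.g. "flow/sample.dat").
    Later calls are served from `cache_dir` without touching the network.
    """
    if cache_dir is None:
        # Hypothetical default cache location under the system temp directory.
        cache_dir = Path(tempfile.gettempdir()) / "hadron_example_data_cache"
    target = Path(cache_dir) / name
    if target.exists():
        return target  # cache hit: no download
    target.parent.mkdir(parents=True, exist_ok=True)
    url = urllib.parse.urljoin(base_url.rstrip("/") + "/", name)
    # Write to a ".part" file first so an interrupted download never
    # leaves a truncated file that later looks like a cache hit.
    partial = target.parent / (target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, target)
    return target
```

A call such as `fetch_example_data("flow/sample.dat", base_url)` would fetch the (hypothetical) file on first use and return the cached copy afterwards. The same pattern covers a plain web server, since only `base_url` changes.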

My example data set for the gradient flow observables alone is 3.2 MB...

kostrzewa commented 3 years ago

https://thecoatlessprofessor.com/programming/r/creating-an-r-data-package/

kostrzewa commented 3 years ago

https://thecoatlessprofessor.com/programming/r/r-data-packages-in-external-data-repositories-using-the-additional_repositories-field/

kostrzewa commented 3 years ago

anyone feeling inspired to do this?

martin-ueding commented 3 years ago

I'll take a look into it.

urbach commented 3 years ago

> anyone feeling inspired to do this?

An alternative could be to put the data on a web server, which R can also access directly. This might be much less work...?

martin-ueding commented 3 years ago

As it is set up now, adding data incrementally is not much work.

How would we deal with lots of small files in the web server approach? Would we package them as ZIP files beforehand? The advantage would be that users could download just the data they want instead of having to install all of the sample data.
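If we went the ZIP route, the packing step is easy to script. A minimal sketch in Python, assuming each data set (e.g. loops or gradient flow files) lives in its own top-level subdirectory; `pack_datasets` and that layout are hypothetical. Each archive stores paths relative to its data set, so a user who downloads only one archive gets just that data, and on the R side `utils::unzip` can unpack it.

```python
import zipfile
from pathlib import Path


def pack_datasets(source_dir, out_dir):
    """Bundle every top-level subdirectory of `source_dir` into `<name>.zip` in `out_dir`.

    Hidden directories (e.g. `.git`) are skipped.
    Returns the created archive paths, sorted by data set name.
    """
    source_dir, out_dir = Path(source_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    datasets = sorted(p for p in source_dir.iterdir() if p.is_dir() and not p.name.startswith("."))
    for dataset in datasets:
        archive = out_dir / f"{dataset.name}.zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for f in sorted(dataset.rglob("*")):
                if f.is_file():
                    # Store paths relative to the data set directory so the archive unpacks cleanly.
                    zf.write(f, arcname=f.relative_to(dataset).as_posix())
        archives.append(archive)
    return archives
```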

In that case one could simply create a GitHub repository and publish its master branch directly via GitHub Pages. I am not sure about automatic directory listings; perhaps we would need to create a table of contents ourselves, which is easy to automate.
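Since GitHub Pages serves static files, the table of contents could be a generated `index.html` committed next to the data. A minimal sketch in Python, assuming the published files are exactly the repository checkout; `write_index` and the page layout are hypothetical. It could be run by hand or in an automated job whenever data is added.

```python
import html
import urllib.parse
from pathlib import Path


def write_index(root):
    """Write `root/index.html` linking every data file below `root`.

    Hidden paths (e.g. the `.git` directory) and the index itself are skipped.
    Returns the number of files listed.
    """
    root = Path(root)
    rows = []
    for f in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = f.relative_to(root)
        if rel.as_posix() == "index.html" or any(part.startswith(".") for part in rel.parts):
            continue
        href = urllib.parse.quote(rel.as_posix())  # percent-encode spaces etc., keep "/"
        size_kib = f.stat().st_size / 1024
        rows.append(f'<li><a href="{href}">{html.escape(rel.as_posix())}</a> ({size_kib:.1f} KiB)</li>')
    page = (
        "<!DOCTYPE html>\n<html><body>\n<h1>Example data</h1>\n<ul>\n"
        + "\n".join(rows)
        + "\n</ul>\n</body></html>\n"
    )
    (root / "index.html").write_text(page, encoding="utf-8")
    return len(rows)
```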