Closed PabRod closed 2 months ago
Dear @mdingemanse, after wrestling with the ifadv
problem for a while, I found the following solution:
.csv
is, oddly enough, significantly easier than from R's native format .rda
. This is particularly true when your data is located on a URL (if you feel curiosity, compare the (buggy) rda version with the csv one). Long story short, loading a .rda
was forcing me to pre-download it, and that shouldn't be necessary..csv
copy of the dataset to elpaco-escience/ifadv. It is redundant, I know, so I kind of hid it in its own branch.The tests and vignettes keep running. The snapshots didn't change. But I consider this a potentially critical change in the dataset at its very origin, so it will be great if you can check the results still make sense.
Last and perhaps least, a reflection:
The original IFADV dataset was published under a permissive license. Without getting into the details, this indicates a spirit of open science and data sharing. It is ironic that this triggered an incompatibility problem with our also permissive license and ultimately forced us to apply a suboptimal solution.
And a coda: Probably all our code, plus all our public conversations, are already used to train IA models anyway.
Yes I quite agree that it would be lovely to redistribute; I remember us spending too much time early 2023 discussing the licensing constraints, and I think the IFADV makers would be scratching their heads about it. I think the current solution works around it in the most CRAN-ready way, which is our proximate goal here so approving
I went ahead and merged, let me know if you prefer me to leave this to you as PR author next time
ifadv
is currently hosted only in GitHub. Due to license incompatibility, we cannot just add this dataset to our package. Moreover, CRAN doesn't allow dependencies that live outside of CRAN, so we have no choice but to access it in a less elegant way.