elpaco-escience / talkr

https://elpaco-escience.github.io/talkr/
Apache License 2.0
0 stars 0 forks source link

117 ensure fresh install works even if ifadv missing #121

Closed PabRod closed 2 months ago

PabRod commented 2 months ago

ifadv is currently hosted only in GitHub. Due to license incompatibility, we cannot just add this dataset to our package. Moreover, CRAN doesn't allow dependencies that live outside of CRAN, so we have no choice but to access it in a less elegant way.

PabRod commented 2 months ago

Context

Dear @mdingemanse, after wrestling with the ifadv problem for a while, I found the following solution:

  1. Loading data from a .csv is, oddly enough, significantly easier than from R's native format .rda. This is particularly true when your data is located on a URL (if you feel curiosity, compare the (buggy) rda version with the csv one). Long story short, loading a .rda was forcing me to pre-download it, and that shouldn't be necessary.
  2. So I pushed a .csv copy of the dataset to elpaco-escience/ifadv. It is redundant, I know, so I kind of hid it in its own branch.

Request

The tests and vignettes keep running. The snapshots didn't change. But I consider this a potentially critical change in the dataset at its very origin, so it will be great if you can check the results still make sense.

PabRod commented 2 months ago

Last and perhaps least, a reflection:

The original IFADV dataset was published under a permissive license. Without getting into the details, this indicates a spirit of open science and data sharing. It is ironic that this triggered an incompatibility problem with our also permissive license and ultimately forced us to apply a suboptimal solution.

And a coda: Probably all our code, plus all our public conversations, are already used to train IA models anyway.

mdingemanse commented 2 months ago

Yes I quite agree that it would be lovely to redistribute; I remember us spending too much time early 2023 discussing the licensing constraints, and I think the IFADV makers would be scratching their heads about it. I think the current solution works around it in the most CRAN-ready way, which is our proximate goal here so approving

mdingemanse commented 2 months ago

I went ahead and merged, let me know if you prefer me to leave this to you as PR author next time