Open shukryzablah opened 4 years ago
I've asked IT about how best to make the data available (as I agree that the current setup is insufficient).
The full data will be accessible via nhorton.people.amherst.edu/ValleyBikes
Should I just add in the files from Box?
Yes, but it would be better to host compressed versions of all the files.
On my computer, I downloaded the ~300MB zip file from Box and ran:
unzip ValleyBike.zip
cd ValleyBike
gzip *
Can you then move the whole folder to your server?
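One caveat on the commands above: `gzip *` skips directories, so if the unzipped folder turns out to be nested, some files would stay uncompressed. A minimal sketch that recurses and then checks each archive might look like the following. The `demo` directory and its dummy files are hypothetical stand-ins for the real ValleyBike folder.

```shell
#!/bin/sh
# Hypothetical stand-in for the unzipped ValleyBike folder (dummy contents).
demo=$(mktemp -d)
mkdir -p "$demo/2019"
printf 'x\n1\n' > "$demo/a.csv"
printf 'x\n2\n' > "$demo/2019/b.csv"

# Compress every regular file, including those in subdirectories.
# Each file is replaced by file.gz; files already ending in .gz are left alone.
find "$demo" -type f ! -name '*.gz' -exec gzip {} +

# Check the integrity of every archive before uploading the folder.
find "$demo" -type f -name '*.gz' -exec gzip -t {} +
```

For a flat folder this does the same thing as `gzip *`; the `gzip -t` pass is just a cheap check before moving 300MB to the server.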
I will make the files available at https://nhorton.people.amherst.edu/valleybikes
Done.
But I now wonder if we shouldn't just include all of the compressed daily files in extdata? Is that what you are thinking?
I was thinking that could bloat the package. All the compressed files amount to 333MB, with the largest being 4MB and the median being <1MB. I think it would couple the data and the package unnecessarily. By providing the files online, we can add more data without updating the package, and we can share the link with people who want to bypass R.
Including the files in extdata would be slightly simpler to implement, but the interface of the package would still be similar.
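For people who want to bypass R, pulling files from the hosted folder could be sketched roughly as below. This is only an illustration: `vb_url` and `vb_fetch` are hypothetical helper names, and the names of the files on the server are not given in this thread, so the caller has to supply them.

```shell
#!/bin/sh
# Base URL taken from this thread.
VB_BASE_URL="https://nhorton.people.amherst.edu/valleybikes"

# vb_url FILE: print the URL of one compressed file (hypothetical helper).
vb_url() {
    printf '%s/%s\n' "${VB_BASE_URL%/}" "$1"
}

# vb_fetch DEST FILE...: download each named file into DEST with curl.
# Files already present are skipped, so the call can be repeated safely.
vb_fetch() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        [ -f "$dest/$f" ] && continue
        # --fail turns HTTP errors into a non-zero exit status.
        # Download to a .part file first so a failed transfer is never
        # mistaken for a finished one on the next run.
        if curl --fail --silent --show-error --location \
                --output "$dest/$f.part" "$(vb_url "$f")"; then
            mv "$dest/$f.part" "$dest/$f"
        else
            rm -f "$dest/$f.part"
            return 1
        fi
    done
}
```

A call would look like `vb_fetch data FILE.csv.gz`, with `FILE.csv.gz` replaced by a real file name from the server. Skipping files that already exist keeps repeated runs cheap as more daily files are added.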
Great. Let's stick with the plan to have the files online and not bloat the package.
Trying to download a file from https://nhorton.people.amherst.edu/valleybikes/ gives a 403 (Forbidden) error, both in the browser and through R.
Apologies. I had set the wrong umask. I've changed the permissions, and this should now be working.
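For anyone hitting the same 403 on their own server, here is a rough sketch of this kind of fix. The exact commands run on the Amherst server are not stated in this thread, so the modes below are an assumption about a typical static-file setup, and the `site` directory is a hypothetical stand-in.

```shell
#!/bin/sh
# Hypothetical stand-in for the served directory.
site=$(mktemp -d)

# Simulate an upload made under a restrictive umask (077):
# the directory and file end up readable by the owner only.
( umask 077; mkdir "$site/valleybikes"; : > "$site/valleybikes/a.csv.gz" )

# Make directories traversable and files readable by everyone,
# without marking the data files as executable.
find "$site/valleybikes" -type d -exec chmod 755 {} +
find "$site/valleybikes" -type f -exec chmod 644 {} +

# A umask of 022 gives the same modes to files created from now on.
umask 022
```

Using separate modes for directories and files avoids a blanket `chmod -R 755`, which would also set the execute bit on every data file.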
The etl framework does seem like a good idea.
However, I think the first step is to have the files online and a function in the package that downloads them to a directory. This could end up being our etl_extract in the long run, and it is helpful to have by itself too.
The data should be public and the Box folder will not be the most friendly to pull data from. Should we host it ourselves?