International-Soil-Radiocarbon-Database / ISRaD

Repository for the development and release of ISRaD data and tools
https://international-soil-radiocarbon-database.github.io/ISRaD/

geospatial data folder #115

Closed greymonroe closed 5 years ago

greymonroe commented 5 years ago

We need a central location for the geospatial data. It would be nice if we could host a link on our site.

coreylawrence commented 5 years ago

This might be one of those circumstances where we don't want to serve up the actual data. It would be preferable to host scripts that pull data from other managed repositories. That said, maybe we hold geospatial data behind the curtain so that it can be used during the build function to fill in select variables that are then included in israd_extra? I think there may be some permission issues if we download geospatial datasets and then serve them up again from our own links. Or am I misunderstanding the issue?

greymonroe commented 5 years ago

That’s fine, we don’t need to share with everyone. We do need some central location for the data that we can use privately.

(In reply to coreylawrence's comment of Nov 30, 2018, 4:51 PM.)

ShaneStoner commented 5 years ago

In the spatial soil fill function, the script connects to the ISRIC FTP server and downloads files to a local directory. They are global 250 m resolution .tif files, so each one is at least 2.5 GB.
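To illustrate the "download to a local directory" step, here is a minimal sketch of a download-if-missing cache, so that multi-gigabyte files are fetched only once. It is written in Python purely for illustration (the actual fill function lives in the R package), and every name in it is an assumption: `BASE_URL`, `fetch_from_server`, and `ensure_local` are hypothetical, and the real ISRIC FTP path is not given in this thread.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve

# Hypothetical placeholder: the real ISRIC FTP path is not stated in the thread.
BASE_URL = "ftp://example.org/soilgrids/"


def fetch_from_server(name: str, dest: Path) -> None:
    """Download one file from BASE_URL to dest (assumed server layout)."""
    urlretrieve(BASE_URL + name, str(dest))


def ensure_local(
    name: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None] = fetch_from_server,
) -> Path:
    """Return the local path for `name`, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / name
    if not dest.exists():
        # Download to a temporary name first so that an interrupted
        # transfer never leaves a truncated file under the final name.
        tmp = dest.with_name(dest.name + ".part")
        fetch(name, tmp)
        tmp.replace(dest)
    return dest
```

Passing the download step in as `fetch` keeps the caching logic testable without touching the network.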

We should possibly designate one person/computer to build the israd.extra object that we serve up in the package. They would need to have all the files downloaded locally.

I've already mentioned this to Grey and some of the folks at MPI, but the Max Planck Digital Library provides a service called "Keeper". It's like Dropbox and is used for storing and archiving collaborative data for Max Planck projects. We have 2 TB of storage, and it can be configured to sync local and cloud versions just like Dropbox. I've actually already backed up the geospatial files that I used for the soil fill function in an ISRaD project repository on Keeper. Of course, these files still need to be available locally every time we build the object, but if files are collaboratively placed in the folder, the "designated builder" can have them download automatically. If we are interested in using Keeper for this purpose, I just need to send a list of emails of non-MPI folks to the Max Planck library and they can grant access.
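Since the build only works when every file is present locally, the designated builder could run a quick check on the synced folder before starting. The following is a rough Python sketch of that idea, not part of ISRaD: `missing_files` is a hypothetical helper, and the folder and file names in the usage note are placeholders.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(required: Iterable[str], sync_dir: Path) -> List[str]:
    """Return the names in `required` that are not yet present in `sync_dir`."""
    return sorted(name for name in required if not (Path(sync_dir) / name).is_file())
```

For example, `missing_files(["clay.tif", "ph.tif"], Path("Keeper/ISRaD"))` would list whichever of those placeholder files the sync has not delivered yet; an empty list means the build can proceed.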

jb388 commented 5 years ago

I like this plan, Shane. I say go for it.

Honestly I don't love having the geospatial data fill function bundled with ISRaD.extra. I think it's nice as a stand-alone function, but it's a distinct process from simply crunching numbers or filling values from the native ISRaD_data object. The mere fact that it downloads such huge files from the server makes it impractical for most users, and I think that users may want to run ISRaD.extra themselves, especially if they are working with local data.

Additionally, as was discussed during the last call, we should make sure that it's legally OK for us to serve these data in this way.

ShaneStoner commented 5 years ago

Yeah, that makes sense. Perhaps we could expose functions like this through an optional argument to the ISRaD.extra build function (unless I'm misunderstanding how ISRaD.extra works in R). The code is built in, but it doesn't run these steps by default.
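The opt-in idea could look roughly like the sketch below. It is written in Python only to show the pattern and does not reflect the actual ISRaD.extra signature in R; `build_extra`, the `geospatial` flag, the `fill_geospatial` callback, and the field names `layer_top` and `layer_bot` are all hypothetical.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, float]


def build_extra(
    data: Record,
    geospatial: bool = False,
    fill_geospatial: Optional[Callable[[Record], Record]] = None,
) -> Record:
    """Add derived fields to a record; the geospatial fill runs only on request."""
    extra = dict(data)
    # Cheap step that needs nothing beyond the record itself (placeholder example).
    if "layer_top" in extra and "layer_bot" in extra:
        extra["layer_thickness"] = extra["layer_bot"] - extra["layer_top"]
    # Expensive step: skipped unless the caller explicitly opts in.
    if geospatial:
        if fill_geospatial is None:
            raise ValueError("geospatial=True requires a fill_geospatial function")
        extra.update(fill_geospatial(extra))
    return extra
```

With this shape, the default call stays fast for users working with local data, and only the designated builder needs to pass `geospatial=True` along with a function that reads the large raster files.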

Of course I don't know for sure, but I think as long as we document the source of the data and reference the ISRIC site/papers then it shouldn't be a big problem to serve up data derived from open-access spatial data. I will try to find out more.

greymonroe commented 5 years ago

What is the status of this? We need to either (A) have the data available so that we can run ISRaD.build() or (B) remove the ISRaD.extra() calls from the build function.

greymonroe commented 5 years ago

This issue has been fixed by using Keeper.