gdcc / dataverse-kubernetes

Simple to use Dataverse container images and Kubernetes objects
http://k8s-docs.gdcc.io
Apache License 2.0

Rserve integration #120

Open poikilotherm opened 4 years ago

poikilotherm commented 4 years ago

Some ingest functionality does not work without an Rserve server.

Looks like https://github.com/ubc/r-docker is a trustworthy image, coming from the University of British Columbia.

Maybe open an issue over there asking what their plans are on supporting and pushing updated images to Docker Hub: https://hub.docker.com/r/ubcctlt/rserve

4tikhonov commented 4 years ago

We've integrated Rserve into the Dataverse Docker module, so I don't know if you want to host a separate Docker image for it: https://github.com/IQSS/dataverse-docker/commit/973cc9c633952e7600715c56f550985814dcf69e

poikilotherm commented 4 years ago

IMHO this should be kept apart. I do believe in the UNIX philosophy "do one thing, do it well". This gives more flexibility to people who might want to run their own services, use special flavors, install a particular set of packages, ...

4tikhonov commented 4 years ago

Ok, you should contact people from Rserve then.

pdurbin commented 4 years ago

If it helps, I've been happily using Rserve on Dataverse spun up by dataverse-ansible since @donsizemore implemented it over the summer: https://github.com/IQSS/dataverse-ansible/pull/87

Data Explorer didn't work properly without it. It takes time to compile all the R modules so I sometimes comment it out if I don't need the functionality.

poikilotherm commented 4 years ago

https://github.com/IQSS/dataverse-ansible/blob/e09ea347aed27a0e5253d94f3818e3381da8db1d/tasks/rserve.yml#L19-L23 definitely helps :smile:

donsizemore commented 4 years ago

> It takes time to compile all the R modules so I sometimes comment it out if I don't need the functionality.

@pdurbin you may also set rserve.install to false =) the role will still place rserve.host et al. in domain.xml to talk to an external R service.
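When pointing Dataverse at an external R service like this, it can help to confirm that the host is actually reachable before debugging ingest. The following is a minimal sketch, not part of dataverse-ansible: it assumes Rserve listens on its usual default port 6311 and greets new connections with an ID string that starts with `Rsrv`. The function names are hypothetical and chosen only for illustration.

```python
import socket

RSERVE_DEFAULT_PORT = 6311  # assumed default Rserve port; adjust if yours differs


def looks_like_rserve(banner: bytes) -> bool:
    """Return True if the greeting bytes start with the Rserve ID prefix."""
    return banner[:4] == b"Rsrv"


def probe_rserve(host: str, port: int = RSERVE_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Open a TCP connection and check whether the server greets like Rserve.

    Returns False on any connection problem (refused, timeout, DNS failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(32)  # the greeting is short; 32 bytes is enough
    except OSError:
        return False
    return looks_like_rserve(banner)
```

Calling `probe_rserve("rserve.example.org")` (a placeholder hostname) returns `True` only if something answering like Rserve is listening there; it does not check credentials or installed R packages.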

4tikhonov commented 4 years ago

@donsizemore, at the same time, it's not really sustainable if Dataverse relies on an external R service to do data processing.

pdurbin commented 4 years ago

On a related note, we've considered splitting the "ingest" service out of the Dataverse monolith and into its own microservice: https://github.com/IQSS/dataverse/issues/2331

Not all installations of Dataverse want ingest (I'm thinking of Pete's structural biology datasets) but I suspect most do. 😄

donsizemore commented 4 years ago

@4tikhonov note that Akio's TRSA branch https://github.com/OdumInstitute/trsa-web/tree/jee8line carves ingest out of Dataverse proper and at present makes it optional to the end user. What would you prefer Dataverse use in addition to or instead of R?

poikilotherm commented 4 years ago

I'd really love to discuss this matter in more depth, but I'm pretty sure this is beyond the scope of this issue.

Maybe some of you guys can open an issue at IQSS/dataverse, so it reaches even more people interested in ingest?

raprasad commented 4 years ago

@pdurbin : Regarding the R script that runs on Rserve and produces metadata summaries:


cc/ @tercer

4tikhonov commented 4 years ago

@raprasad, I really like this solution as a Python microservice. Not because we're "at home" with Python but because it can be more sustainable in the long term.

donsizemore commented 4 years ago

@raprasad wonderful news! Go @aaron-lebo go!

pdurbin commented 4 years ago

> a slightly different structure

@raprasad is the JSON emitted from your new Python code backward compatible with the JSON emitted from the old/current R code? If not, would it be possible to make it backward compatible? I don't want Data Explorer (my main reason for wanting this JSON) to break if we switch to backward-incompatible JSON produced by new code.

Now that we (finally) have API tests running automatically on "develop" and pull requests (on https://jenkins.dataverse.org thanks to the absolutely heroic efforts of @donsizemore !!! 🎉 🎉 🎉 ), we could start to make assertions on the old/current JSON format so that any backward incompatibilities would be detected. Writing those assertions might be a good first small chunk. If someone wants to create an issue about this at https://github.com/IQSS/dataverse/issues please go ahead! 😄
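One way such assertions could look, as a rough sketch only: collect every key path from the old/current JSON and check that each one still exists in the new JSON. The helper names and the sample documents below are hypothetical and do not reflect the real metadata format or the existing API test suite.

```python
import json


def key_paths(node, prefix=""):
    """Collect dotted key paths from nested JSON data (list items share a '[]' segment)."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= key_paths(item, f"{prefix}[]")
    return paths


def missing_paths(old_doc, new_doc):
    """Return key paths present in the old document but absent from the new one."""
    return sorted(key_paths(old_doc) - key_paths(new_doc))


# Made-up sample data, for illustration only.
old = json.loads('{"variables": {"x": {"mean": 1.0, "median": 1.0}}}')
new = json.loads('{"variables": {"x": {"mean": 1.0}}}')

print(missing_paths(old, new))  # prints ['variables.x.median']
```

An empty result means every key the old format exposed is still present. This only compares structure, so a check on value types or units would need to be added separately.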

raprasad commented 4 years ago

@pdurbin We will add backward compatibility to the library. Please add any relevant comments: https://github.com/TwoRavens/raven-metadata-service/issues/205