poikilotherm opened 4 years ago
We've integrated Rserve into the Dataverse Docker module. I don't know if you want to host a separate Docker image for that: https://github.com/IQSS/dataverse-docker/commit/973cc9c633952e7600715c56f550985814dcf69e
IMHO this should be kept separate. I believe in the UNIX philosophy: "do one thing, and do it well". Keeping it separate gives more flexibility to people who want to run their own services, use special flavors, install a particular set of packages, and so on.
OK, then you should contact the Rserve people.
If it helps, I've been happily using Rserve on Dataverse spun up by dataverse-ansible since @donsizemore implemented it over the summer: https://github.com/IQSS/dataverse-ansible/pull/87
Data Explorer didn't work properly without it. It takes time to compile all the R modules so I sometimes comment it out if I don't need the functionality.
@pdurbin you may also set rserve.install to false =) The role will still place rserve.host et al. in domain.xml so Dataverse can talk to an external R service.
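When rserve.host points at an external R service, one quick sanity check is to read the identification string that Rserve sends as soon as a client connects. The sketch below assumes the QAP1 layout described in the Rserve protocol documentation (a 32-byte greeting that begins with "Rsrv", a four-character protocol version such as "0103", and "QAP1") and Rserve's default port 6311. The function names are hypothetical, and the check only confirms reachability and protocol; it does not test authentication or whether R code actually evaluates.

```python
import socket
from typing import NamedTuple

RSERVE_DEFAULT_PORT = 6311  # Rserve's default TCP port (assumed unchanged)
ID_STRING_LENGTH = 32       # QAP1 servers send a 32-byte ID string on connect


class RserveGreeting(NamedTuple):
    signature: str         # expected to be "Rsrv"
    version: str           # protocol version, e.g. "0103"
    protocol: str          # e.g. "QAP1"
    attributes: list[str]  # non-padding 4-byte fields, e.g. auth hints


def parse_greeting(data: bytes) -> RserveGreeting:
    """Parse the ID string an Rserve server sends right after accepting."""
    if len(data) != ID_STRING_LENGTH:
        raise ValueError(f"expected {ID_STRING_LENGTH} bytes, got {len(data)}")
    text = data.decode("ascii", errors="replace")
    # The greeting is eight 4-byte fields.
    fields = [text[i:i + 4] for i in range(0, ID_STRING_LENGTH, 4)]
    signature, version, protocol = fields[0], fields[1], fields[2]
    if signature != "Rsrv":
        raise ValueError(f"not an Rserve server (signature {signature!r})")
    # Drop filler fields made only of dashes or CR/LF; keep anything else.
    attributes = [f for f in fields[3:] if f.strip("-\r\n")]
    return RserveGreeting(signature, version, protocol, attributes)


def read_greeting(host: str, port: int = RSERVE_DEFAULT_PORT,
                  timeout: float = 5.0) -> RserveGreeting:
    """Connect to host:port and parse the greeting (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        # recv() may return fewer bytes than requested, so loop until complete.
        while len(buf) < ID_STRING_LENGTH:
            chunk = sock.recv(ID_STRING_LENGTH - len(buf))
            if not chunk:
                raise ConnectionError("connection closed before greeting was complete")
            buf += chunk
    return parse_greeting(buf)
```

For example, read_greeting("rserve.example.org") (a placeholder host) should return a greeting whose protocol field is "QAP1" if the server speaks the protocol assumed here; a wrong port or a non-Rserve service raises an error instead.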
@donsizemore, at the same time, it's not really sustainable for Dataverse to rely on an external R service for data processing.
On a related note, we've considered splitting the "ingest" service out of the Dataverse monolith and into its own microservice: https://github.com/IQSS/dataverse/issues/2331
Not all installations of Dataverse want ingest (I'm thinking of Pete's structural biology datasets) but I suspect most do. 😄
@4tikhonov note that Akio's TRSA branch https://github.com/OdumInstitute/trsa-web/tree/jee8line carves ingest out of Dataverse proper and, at present, makes it optional for the end user. What would you prefer Dataverse use in addition to, or instead of, R?
I'd really love to discuss this matter in more depth, but I'm pretty sure this is beyond the scope of this issue.
Maybe some of you guys can open an issue at IQSS/dataverse, so it reaches even more people interested in ingest?
@pdurbin: Regarding the R script that runs on Rserve and produces metadata summaries:
cc/ @tercer
@raprasad, I really like this solution as a Python microservice. Not because we're "at home" with Python, but because it can be more sustainable in the long term.
@raprasad wonderful news! Go @aaron-lebo go!
a slightly different structure
@raprasad, is the JSON emitted from your new Python code backward compatible with the JSON emitted from the old/current R code? If not, would it be possible to make it backward compatible? I don't want Data Explorer (my main reason for wanting this JSON) to break if we switch to backward-incompatible JSON produced by new code.
Now that we (finally) have API tests running automatically on "develop" and pull requests (on https://jenkins.dataverse.org thanks to the absolutely heroic efforts of @donsizemore !!! 🎉 🎉 🎉 ), we could start to make assertions on the old/current JSON format so that any backward incompatibilities would be detected. Writing those assertions might be a good first small chunk. If someone wants to create an issue about this at https://github.com/IQSS/dataverse/issues please go ahead! 😄
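A first chunk of those assertions could be a structural comparison between the JSON produced by the old R code and by the new Python code for the same input file. The sketch below assumes a simple compatibility rule: every key present in the old output must still exist in the new output with the same JSON type, while extra keys are allowed. That rule is an assumption, not a documented contract of Data Explorer or raven-metadata-service, and the function names and example keys are hypothetical. Dataverse's actual API tests are written in Java, so this Python version only illustrates the idea.

```python
from typing import Any


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def compat_problems(old: Any, new: Any, path: str = "$") -> list[str]:
    """List places where `new` would break a consumer written against `old`.

    Assumed rule: old keys must survive with the same JSON type; new keys
    are fine. Arrays are compared element by element over the shared prefix.
    """
    old_t, new_t = json_type(old), json_type(new)
    if old_t != new_t:
        return [f"{path}: type changed from {old_t} to {new_t}"]
    problems: list[str] = []
    if old_t == "object":
        for key, old_value in old.items():
            if key not in new:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(compat_problems(old_value, new[key], f"{path}.{key}"))
    elif old_t == "array":
        for i, (o, n) in enumerate(zip(old, new)):
            problems.extend(compat_problems(o, n, f"{path}[{i}]"))
    return problems
```

For example, with old = {"variables": {"age": {"mean": 41.2, "label": "Age"}}} and new = {"variables": {"age": {"mean": 41.2, "median": 40}}, "dataset": {}}, compat_problems(old, new) returns ["$.variables.age.label: missing"]; the added "median" and "dataset" keys are not reported. A test would then assert that the list is empty.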
@pdurbin We will add backward compatibility to the library. Please add any relevant comments here: https://github.com/TwoRavens/raven-metadata-service/issues/205
Some ingest functionality does not work without an Rserve server.
Looks like https://github.com/ubc/r-docker is a trustworthy image, coming from the University of British Columbia.
Maybe open an issue over there asking about their plans for supporting the image and pushing updated versions to Docker Hub: https://hub.docker.com/r/ubcctlt/rserve