dataobservatory-eu / observatory_workflow

Testing our Datathlon workflow

Describe how to deploy R, API, bookdown and hugo (one server per observatory? Netlify? subdomains?) #6

Open antaldaniel opened 3 years ago

antaldaniel commented 3 years ago

Currently we can only handle top-level domains from Netlify.

Our observatories are on subdomains, and self-hosted.

Each observatory has five elements (four will be operational in the first weeks):

1. Hugo website
2. bookdown documentation
3. Dataset API [these have an example instance]
4. R collector code [currently collecting only from the Eurostat warehouse and harvesting manual Zenodo depositions]
5. (+1) Python collector code

Where should these be a) deployed and b) hosted, so that they can be arranged either as paths:

- greendeal.dataobservatory.eu
- greendeal.dataobservatory.eu/documentation
- greendeal.dataobservatory.eu/data

or as subdomains:

- greendeal.dataobservatory.eu
- data.greendeal.dataobservatory.eu
- documentation.greendeal.dataobservatory.eu

Daniel can do this manually for a few days, but a stable solution is needed.

The best approach would be to describe it here in an md file, and then configure the necessary instances.

Self-hosting has the advantage that I can work without restrictions on domains, subdomains and zones, but in that case the cloud server must upload the final HTML files via FTP, and we still need a place for the API (which can, of course, also be self-hosted).
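For the FTP step, something along these lines would be enough on the build side (just a sketch: the host, credentials and target directory are placeholders, and it assumes the site has already been built into a local public/ folder):

```python
# Sketch: push a locally built Hugo/bookdown site to a self-hosted server
# over FTP. Host, credentials and paths below are placeholders.
import os
from ftplib import FTP

LOCAL_ROOT = "public"               # Hugo's default output folder
REMOTE_ROOT = "/var/www/greendeal"  # hypothetical target directory

def upload_tree(ftp: FTP, local_root: str, remote_root: str) -> None:
    """Mirror the local build folder onto the remote web root."""
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel = os.path.relpath(dirpath, local_root).replace(os.sep, "/")
        remote_dir = remote_root if rel == "." else f"{remote_root}/{rel}"
        try:
            ftp.mkd(remote_dir)      # ignore "directory already exists"
        except Exception:
            pass
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                ftp.storbinary(f"STOR {remote_dir}/{name}", fh)

with FTP("ftp.example.org") as ftp:      # placeholder host
    ftp.login("deploy_user", "secret")   # placeholder credentials
    upload_tree(ftp, LOCAL_ROOT, REMOTE_ROOT)
```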

antaldaniel commented 3 years ago

@KKulma Assigned to Boti because he started working on it; he should describe the setup, and you should review it.

bvitos commented 3 years ago

In terms of self-hosting, I am not familiar with Hugo, but the Datasette API is already running as a service on an Ubuntu AWS server. We could install Hugo there as well, and in my understanding we could use git hooks to deploy the new Hugo content through git pushes.
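For illustration, a post-receive hook could be as simple as the sketch below (not our actual setup: the bare repository path, work tree, publish directory and branch name are all placeholders):

```python
#!/usr/bin/env python3
# Sketch of a git post-receive hook (e.g. hooks/post-receive in a bare repo
# on the server) that rebuilds the Hugo site after every push.
# All paths and the branch name are assumptions, not our actual setup.
import subprocess

GIT_DIR = "/home/ubuntu/site.git"    # hypothetical bare repository
WORK_TREE = "/home/ubuntu/site"      # checkout of the Hugo sources
PUBLISH_DIR = "/var/www/greendeal"   # where the built HTML should land

# Check out the pushed content into the work tree ...
subprocess.run(
    ["git", f"--git-dir={GIT_DIR}", f"--work-tree={WORK_TREE}",
     "checkout", "-f", "main"],
    check=True,
)
# ... and rebuild the static site into the web root.
subprocess.run(
    ["hugo", "--source", WORK_TREE, "--destination", PUBLISH_DIR],
    check=True,
)
```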

bvitos commented 3 years ago

I like the idea of subdomains too, assigned to different servers

bvitos commented 3 years ago

@antaldaniel Just checked your afternoon mail. Good to hear that Pyry can provide valuable advice on Hugo. I can certainly set up the AWS servers with the Datasette API running.

KKulma commented 3 years ago

Good idea, @bvitos! I don't want to be too biased towards GitHub Actions... ;) But I think they may take a lot of the pain away in this case; have a look at one example here


bvitos commented 3 years ago

Sounds good. :) It could be used for updating the dataset too. Our current test server is a t3 AWS instance with 8GB RAM. A Python script runs as a system service and checks for new .db files in the /home/ubuntu folder every 10 seconds. If it finds one, it appends the records to the dataset and moves the file, with a timestamp, into the /home/ubuntu/archives folder of the instance.
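For reference, the behaviour is roughly what this sketch does (not the actual service code; the main database path, and the assumption that the incoming .db files share table names and schemas with the main database, are simplifications):

```python
# Sketch of the watcher described above: poll /home/ubuntu for new .db files
# every 10 seconds, append their rows to the main dataset, then move each
# processed file into /home/ubuntu/archives with a timestamp prefix.
# The main database path and the matching-schema assumption are placeholders.
import shutil
import sqlite3
import time
from datetime import datetime
from pathlib import Path

WATCH_DIR = Path("/home/ubuntu")
ARCHIVE_DIR = Path("/home/ubuntu/archives")
MAIN_DB = "/home/ubuntu/datasette/observatory.db"  # hypothetical path

def append_records(incoming: Path) -> None:
    """ATTACH the incoming file and copy each of its tables into the main db."""
    conn = sqlite3.connect(MAIN_DB)
    try:
        conn.execute("ATTACH DATABASE ? AS incoming", (str(incoming),))
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM incoming.sqlite_master WHERE type = 'table'")]
        for table in tables:
            conn.execute(f'INSERT INTO "{table}" SELECT * FROM incoming."{table}"')
        conn.commit()
        conn.execute("DETACH DATABASE incoming")
    finally:
        conn.close()

if __name__ == "__main__":
    ARCHIVE_DIR.mkdir(exist_ok=True)
    while True:
        for db_file in WATCH_DIR.glob("*.db"):
            append_records(db_file)
            stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
            shutil.move(str(db_file), str(ARCHIVE_DIR / f"{stamp}-{db_file.name}"))
        time.sleep(10)
```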

bvitos commented 3 years ago

I added another server with the same dataset for now: 52.4.54.69. @antaldaniel, do we have a test version of the green dataset? The new instance is more performant: an r5.large with 16GB RAM, which should be better suited for database operations (price difference: $0.126 vs. $0.083/hr). I think I will just dockerise the .db collector, because as a system service it seems a bit unreliable. What is our strategy with Hugo? @pitkant, should we run it from the same server or separately?