PerfectFit-project / virtual-coach-issues

Central issues-only repository for tracking all issues for the PerfectFit Virtual Coach System.

Setup first deployment to Heroku #169

Closed — svenvanderburg closed this 2 years ago

svenvanderburg commented 2 years ago

Set up deployment to Heroku for the first time. Ask @svenvanderburg to add you to the Heroku team.

~Might be blocked by #159~

dsmits commented 2 years ago

I started doing this but it's not finished. I opened up a PR for it: https://github.com/PerfectFit-project/virtual-coach-rasa/pull/7

dsmits commented 2 years ago

The rasa server is deployed to https://perfectfit-rasa-server.herokuapp.com/. The other components haven't been deployed yet.

dsmits commented 2 years ago

Another important thing to mention: I think this tutorial works best for our use case: https://devcenter.heroku.com/articles/container-registry-and-runtime

It is also possible to describe the different components of an app in a heroku.yml file, but I think you can only have one "web" and one "worker" component. My guess is that this doesn't fit the current repo structure, since the virtual-coach-rasa repo looks like it contains more than two of these components.
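
For context, a heroku.yml would look roughly like the sketch below; the Dockerfile paths and run commands are illustrative guesses, not taken from the actual repo.

```yaml
# heroku.yml (illustrative sketch only, not from the repo)
build:
  docker:
    web: Dockerfile             # the single "web" process, built from a Dockerfile
    worker: actions/Dockerfile  # hypothetical path for a second, "worker" process
run:
  web: rasa run --enable-api --port $PORT
  worker: rasa run actions
```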

raar1 commented 2 years ago

One of the problems with using the heroku.yml is that it pulls the code from our git repo and builds it on the Heroku side. In principle this is a good thing, but given the issue with the private goalie-js lib etc., it complicates things. I have instead set up a GitHub Action that builds containers and pushes them to the Heroku container registry. These can then be 'released' when we want to deploy.
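
For reference, the build-and-push part of such a workflow looks roughly like the sketch below; the app name, branch, and process type are placeholders, and it assumes the heroku CLI is available on the runner.

```yaml
# .github/workflows/heroku-push.yml (sketch; names are placeholders)
name: Push containers to Heroku registry
on:
  push:
    branches: [main]
jobs:
  push:
    runs-on: ubuntu-latest
    env:
      HEROKU_API_KEY: ${{ secrets.HEROKU_API_KEY }}  # read by the heroku CLI for auth
    steps:
      - uses: actions/checkout@v3
      - name: Log in to Heroku's container registry
        run: heroku container:login
      - name: Build and push the image for the web process
        run: heroku container:push web -a perfectfit-rasa-server
      # Releasing stays a separate, deliberate step for when we want to deploy:
      #   heroku container:release web -a perfectfit-rasa-server
```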

raar1 commented 2 years ago

One strange behaviour, though: the rasa_server, which runs fine locally, crashes as follows when deployed on Heroku:

2022-04-02T09:48:16.829334+00:00 heroku[rasa_server.1]: Starting process with command `rasa run --enable-api`
2022-04-02T09:48:17.457016+00:00 heroku[rasa_server.1]: State changed from starting to up
2022-04-02T09:48:18.557269+00:00 heroku[rasa_server.1]: Process exited with status 2
2022-04-02T09:48:18.687683+00:00 heroku[rasa_server.1]: State changed from up to crashed
2022-04-02T09:48:18.342522+00:00 app[rasa_server.1]: usage: rasa run [-h] [-v] [-vv] [--quiet] [-m MODEL] [--log-file LOG_FILE]
2022-04-02T09:48:18.342534+00:00 app[rasa_server.1]:                 [--use-syslog] [--syslog-address SYSLOG_ADDRESS]
2022-04-02T09:48:18.342535+00:00 app[rasa_server.1]:                 [--syslog-port SYSLOG_PORT]
2022-04-02T09:48:18.342535+00:00 app[rasa_server.1]:                 [--syslog-protocol SYSLOG_PROTOCOL] [--endpoints ENDPOINTS]
2022-04-02T09:48:18.342535+00:00 app[rasa_server.1]:                 [-i INTERFACE] [-p PORT] [-t AUTH_TOKEN]
2022-04-02T09:48:18.342536+00:00 app[rasa_server.1]:                 [--cors [CORS [CORS ...]]] [--enable-api]
2022-04-02T09:48:18.342536+00:00 app[rasa_server.1]:                 [--response-timeout RESPONSE_TIMEOUT]
2022-04-02T09:48:18.342536+00:00 app[rasa_server.1]:                 [--remote-storage REMOTE_STORAGE]
2022-04-02T09:48:18.342536+00:00 app[rasa_server.1]:                 [--ssl-certificate SSL_CERTIFICATE]
2022-04-02T09:48:18.342536+00:00 app[rasa_server.1]:                 [--ssl-keyfile SSL_KEYFILE] [--ssl-ca-file SSL_CA_FILE]
2022-04-02T09:48:18.342537+00:00 app[rasa_server.1]:                 [--ssl-password SSL_PASSWORD] [--credentials CREDENTIALS]
2022-04-02T09:48:18.342537+00:00 app[rasa_server.1]:                 [--connector CONNECTOR] [--jwt-secret JWT_SECRET]
2022-04-02T09:48:18.342537+00:00 app[rasa_server.1]:                 [--jwt-method JWT_METHOD]
2022-04-02T09:48:18.342537+00:00 app[rasa_server.1]:                 {actions} ... [model-as-positional-argument]
2022-04-02T09:48:18.342537+00:00 app[rasa_server.1]: rasa run: error: invalid choice: 'rasa' (choose from 'actions')

Presumably the Heroku tool builds the container differently?

raar1 commented 2 years ago

Ah, sorry, it's because Heroku doesn't like ENTRYPOINT. Changing it to CMD, as Djura did, fixes that; a sketch of the change is below. With that fix, the rasa_actions and scheduler containers are up and running correctly, but rasa_server goes out of memory (see the logs after the sketch).
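
The change is roughly the following; this is a minimal sketch rather than the repo's actual Dockerfile.

```dockerfile
# Minimal sketch of the ENTRYPOINT -> CMD change (not the actual Dockerfile).
# With an ENTRYPOINT along the lines of
#   ENTRYPOINT ["rasa", "run"]
# Heroku's start command ("rasa run --enable-api") appears to get appended to it,
# which would produce "rasa run rasa run --enable-api" and the
# "invalid choice: 'rasa'" error in the logs above.
# Putting the full start command in CMD instead avoids this:
CMD ["rasa", "run", "--enable-api"]
```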

2022-04-02T10:24:00.413811+00:00 app[rasa_server.1]: 2022-04-02 10:24:00 INFO     root  - Starting Rasa server on http://0.0.0.0:5005
2022-04-02T10:24:01.521456+00:00 app[rasa_server.1]: 2022-04-02 10:24:01 INFO     rasa.core.processor  - Loading model models/20220330-111432-glad-skyway.tar.gz...
2022-04-02T10:24:02.015180+00:00 app[rasa_server.1]: 2022-04-02 10:24:02 INFO     rasa.nlu.utils.spacy_utils  - Trying to load SpaCy model with name 'nl_core_news_sm'.
2022-04-02T10:24:05.039816+00:00 app[rasa_server.1]: 2022-04-02 10:24:05 INFO     rasa.nlu.utils.spacy_utils  - Trying to load SpaCy model with name 'nl_core_news_sm'.
2022-04-02T10:24:20.084591+00:00 heroku[rasa_server.1]: Process running mem=670M(127.5%)
2022-04-02T10:24:20.103088+00:00 heroku[rasa_server.1]: Error R14 (Memory quota exceeded)

raar1 commented 2 years ago

Turns out the hobby dyno is not enough (512 MB memory); we need at least the "Standard 2X" dyno ($50 a month). It might be a good idea to combine rasa_actions with rasa_server so they run on the same dyno; they're very closely connected anyway.
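
One hypothetical way to do that would be to start the action server in the background from the same container's CMD, so both processes share one dyno. This is only a sketch, with port numbers assumed from Rasa's defaults:

```dockerfile
# Hypothetical combined container (sketch only): action server in the background,
# Rasa server in the foreground, bound to Heroku's $PORT (falling back to 5005 locally).
CMD ["sh", "-c", "rasa run actions --port 5055 & exec rasa run --enable-api --port ${PORT:-5005}"]
```

The action_endpoint in endpoints.yml would then point at http://localhost:5055/webhook inside the container.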

raar1 commented 2 years ago

If we need more than 1 GB of RAM, we're instantly up to a minimum of $250 a month...

dsmits commented 2 years ago

Since this is quite a big assignment, I've created a separate issue to tackle deploying the database: #231

raar1 commented 2 years ago

OK, there appears to be a fairly big problem here: the dynos are not able to communicate with each other, even within the same app, unless we switch to Private Spaces ($1000+ a month). At least, that's what the documentation seems to be saying; otherwise, dynos are completely isolated from each other. I'm thinking we need to look at a PaaS that docker-compose setups are easier to convert to. I'm hoping I've misunderstood something, though, so I'll look into it further.

raar1 commented 2 years ago

After some digging and communicating with SURF etc, there would appear to be a couple of possibilities open to us:

  1. Continue with Heroku, and pay for Private Spaces etc. Very expensive, and might still require more configuration.
  2. Continue with Heroku, and adapt the existing services to do all their communication via a message queue add-on. The queue would then be 'external' to the network (in the same way the managed Postgres and Redis are), so communication between services via the queue will work. Not difficult to do, but it will definitely take more dev time and it feels overengineered.
  3. Explore other AWS PaaS-like options. ECS Fargate is not the right tool here, even though you can essentially launch docker-compose configured services, because it's on-demand and the pricing reflects that. If you try to spin it up that way and then run your services for a month, you'll get a very large bill (see https://github.com/PerfectFit-project/virtual-coach-issues/issues/241). You could probably do something with Lambdas or whatever to only spin up when requests need to be processed, but this would take a fair amount of rejigging in the code base and seems overengineered for the problem.
  4. Drop the PaaS approaches, and go for the simplest IaaS possible. Provision a single VM and run the docker-compose there (a sketch of what that could look like is below the list). This could be done on AWS, with an AWS-managed db. Or Digital Ocean (which also appears to offer managed Postgres). Or fly.io, etc.
  5. Run on two VMs on SURF Cloud. One VM has the database, for which we manage snapshotting and patches, and the other VM runs the remaining containers. The database snapshots we take will need to be stored somewhere other than the same data centre, so we could maybe use an S3 bucket or similar. Auto-patching is not likely to be a serious consideration in this setup, since the db is purely accessed internally: no ports will be open to the web, and all connections come to the app via the broker, which starts the connection itself. Managing the db ourselves is not an insurmountable problem.
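
For options 4 and 5, the deployment is essentially the existing docker-compose setup run on a plain VM; the sketch below shows the general shape of such a compose file. Service names, images, and commands are illustrative, not the project's actual configuration.

```yaml
# docker-compose.yml sketch for a plain-VM deployment (illustrative only)
version: "3.8"
services:
  rasa_server:
    image: ghcr.io/perfectfit-project/rasa-server:latest    # hypothetical image name
    command: rasa run --enable-api --port 5005
    ports:
      - "5005:5005"
    depends_on:
      - rasa_actions
      - db
  rasa_actions:
    image: ghcr.io/perfectfit-project/rasa-actions:latest   # hypothetical image name
    command: rasa run actions --port 5055
  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - db_data:/var/lib/postgresql/data   # for option 5, snapshots of this volume go off-site (e.g. an S3 bucket)
volumes:
  db_data:
```
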
raar1 commented 2 years ago

The testing on Heroku is essentially finished, with outcomes as detailed above. More specific issues have been created to follow up on these points (#244, #245, #246) so I'm closing the current issue.