e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Heroku deployment procedure #714

Closed montaguegabe closed 2 years ago

montaguegabe commented 2 years ago

Hello,

I will be documenting my process of attempting to deploy e-mission on Heroku here.

What I have learned so far:

Heroku recommends Docker over its buildpack system for deploying Anaconda-based projects

I'll be using the Heroku container registry

First clone the docker repo: git clone https://github.com/e-mission/e-mission-docker.git

Then cd to the place with the web-app Dockerfile: examples/em-server-multi-tier-cronjob/webapp/.

Then follow the instructions at https://devcenter.heroku.com/articles/container-registry-and-runtime#getting-started to deploy the Dockerfile. The first time you run the commands, they will fail because the default dynos do not have enough memory. Installing all the conda packages requires a little over 1 GB of memory, so you will need a dedicated Performance M dyno with 2.5 GB of memory (at least for this step).

Another thing preventing the instructions from succeeding is the configuration of the MongoDB database. I will use a MongoDB database from mongodb.com for now, which makes it easy to provision a DB and get connection parameters for it. These connection parameters should be joined into a connection URL and set in the Heroku dashboard (App -> Settings -> Reveal Config Vars).
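For concreteness, the URL assembly can be sketched in a few lines of shell (every value below is a placeholder, not a real credential or cluster):

```shell
# Placeholder Atlas values -- substitute your own from the mongodb.com dashboard.
DB_USER="emission_user"
DB_PASS="s3cret"
DB_CLUSTER="cluster0.example.mongodb.net"
DB_NAME="myFirstDatabase"

# Join the parameters into a single connection URL.
DB_HOST="mongodb+srv://${DB_USER}:${DB_PASS}@${DB_CLUSTER}/${DB_NAME}?retryWrites=true&w=majority"
echo "${DB_HOST}"
```

The resulting string is what goes into the config var, either through the dashboard as above or with `heroku config:set DB_HOST="..."` from the CLI.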

Still to figure out

shankari commented 2 years ago

@montaguegabe wrt logging, you should be able to change the log config files (conf/log/webserver.conf, which overrides https://github.com/e-mission/e-mission-server/blob/master/conf/log/webserver.conf.sample)

Hope that helps.

shankari commented 2 years ago

wrt resource usage, I run multiple containers on one AWS instance. The instances have different user counts. A distribution of the resource utilization (using docker stats) is below.

CONTAINER ID        NAME                                             CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
8b2eab5b23b3        vail-stack_vail-web-server_1                     0.74%               221.9MiB / 31.01GiB   0.70%               2.52GB / 2.3GB      10.6GB / 138MB      23
acaf01b32416        pc-stack_pc-web-server_1                         0.84%               394.1MiB / 31.01GiB   1.24%               20.8GB / 18.5GB     44.4GB / 205MB      23
8914e02da2f4        cc-stack_cc-web-server_1                         0.90%               696.2MiB / 31.01GiB   2.19%               18.4GB / 20.2GB     39.1GB / 205MB      23
c606785f6eb1        4c-stack_4c-web-server_1                         0.97%               262.7MiB / 31.01GiB   0.83%               5.8GB / 3.65GB      17.1GB / 205MB      23
4d8020c60ce4        sc-stack_sc-web-server_1                         0.79%               179.3MiB / 31.01GiB   0.56%               3.24GB / 3.26GB     16.1GB / 205MB      25
3661242acc75        fc-stack_fc-web-server_1                         0.80%               193.4MiB / 31.01GiB   0.61%               7.52GB / 8.77GB     27.9GB / 205MB      23
674c919c4e5c        stage-stack_stage-web-server_1                   0.96%               400.4MiB / 31.01GiB   1.26%               9.16GB / 10.3GB     33GB / 173MB        23
d1a7d67c2822        prepilot-stack_prepilot-web-server_1             0.84%               532.2MiB / 31.01GiB   1.68%               7.94GB / 2.7GB      20.8GB / 94MB       23
montaguegabe commented 2 years ago

Thanks Shankari! I also need the web app to bind not to port 80 or 443, but to whatever the runtime environment variable $PORT is set to. Is that possible from within a config file right now?

montaguegabe commented 2 years ago

If sed is being used to set some of the environment variables, it may not be working for some reason. I get this:

sed: -e expression #1, char 46: unknown option to `s'

{
"paths" : {
"static_path" : "webapp/www/",
"python_path" : "main",
"log_base_dir" : ".",
"log_file" : "debug.log"
},
"__comment" : "Fill this in for the production server. port will almost certainly be 80 or 443. For iOS, using 172.17.116.118 allows you to test without an internet connection. For AWS and android, make sure that the host 0.0.0.0, localhost does not seem to work",
"server" : {
"host" : "0.0.0.0",
"port" : "8080",
"__comment": "1 hour = 60 min = 60 * 60 sec",
"timeout" : "3600",
"auth": "skip",
"__comment": "Options are no_auth, user_only, never",
"aggregate_call_auth": "no_auth"
}
}
Live reload disabled,
Traceback (most recent call last):
File "emission/net/api/cfc_webapp.py", line 35, in <module>
import emission.net.api.visualize as visualize
File "/usr/src/app/e-mission-server/emission/net/api/visualize.py", line 13, in <module>
import emission.analysis.plotting.geojson.geojson_feature_converter as gfc
File "/usr/src/app/e-mission-server/emission/analysis/plotting/geojson/geojson_feature_converter.py", line 20, in <module>
import emission.storage.decorations.trip_queries as esdt
File "/usr/src/app/e-mission-server/emission/storage/decorations/trip_queries.py", line 15, in <module>
import emission.core.get_database as edb
File "/usr/src/app/e-mission-server/emission/core/get_database.py", line 19, in <module>
config_data = json.load(config_file)
File "/usr/src/app/miniconda-4.8.3/envs/emission/lib/python3.7/json/__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/src/app/miniconda-4.8.3/envs/emission/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/src/app/miniconda-4.8.3/envs/emission/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/src/app/miniconda-4.8.3/envs/emission/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Process exited with status 1
State changed from starting to crashed

I updated my webserver.conf with different values than those displayed above, but the changes do not appear to have taken effect.

montaguegabe commented 2 years ago

It may have something to do with the MongoDB connection string, which is printed on the line directly before the sed error:

mongodb+srv://usern4me_of_lettersnumbers_underscore:passw0rdoflettersnumbers@cluster0.z56eq.mongodb.net/myFirstDatabase?retryWrites=true&w=majority
sed: -e expression #1, char 46: unknown option to `s'
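That error is consistent with the URL itself being the problem: the slashes in the replacement string terminate a plain s/old/new/ expression early, and a bare & in a sed replacement expands to the whole match. A sketch of an escaping workaround, using a dummy URL rather than the project's actual sed invocation:

```shell
URL='mongodb://user:pass@cluster0.example.net/db?retryWrites=true&w=majority'

# Escape the characters sed treats specially in a replacement:
# the chosen delimiter '|', '&' (which expands to the match), and '\'.
SAFE_URL=$(printf '%s\n' "$URL" | sed 's/[&\\|]/\\&/g')

# Using '|' as the delimiter keeps the URL's slashes from
# terminating the expression early.
echo '"url" : "localhost"' | sed "s|localhost|${SAFE_URL}|"
```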
montaguegabe commented 2 years ago

If I do not provide DB_HOST, and instead push to Docker with a db.conf file, then the error disappears, but the db.conf file is reported as "overwritten" and the server instead looks for a MongoDB instance at a hard-coded IP.

shankari commented 2 years ago

@montaguegabe we do in fact use sed to overwrite the DB host. I remember something about quoting the string properly for the sed to work...

montaguegabe commented 2 years ago

Maybe it is #595?

montaguegabe commented 2 years ago

^ It was that issue. Also, the mongodb+srv protocol is not supported, so I downgraded the connection string format. I am now onto an error that says pymongo.errors.InvalidURI: MongoDB URI options are key=value pairs.

shankari commented 2 years ago

@montaguegabe aha! That's why issues are a good idea. #595 was reported by our internal cloud services team while setting up with documentDB, and it looks like we fixed it by switching to jq. Is that what you did as well?

#set database URL using environment variable
echo ${DB_HOST}
if [ -z "${DB_HOST}" ] ; then
    local_host=`hostname -i`
    jq --arg db_host "$local_host" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
else
    jq --arg db_host "$DB_HOST" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
fi
cat conf/storage/db.conf
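
One reason jq is the safer tool here: the URL is passed as a data argument (--arg), so none of its characters need escaping. A standalone demo with a throwaway file that mimics the assumed shape of conf/storage/db.conf.sample:

```shell
# Throwaway stand-in for conf/storage/db.conf.sample (shape assumed).
printf '%s\n' '{"timeseries": {"url": "localhost"}}' > /tmp/db.conf.sample

# A URL full of sed-hostile characters passes through jq untouched.
DB_HOST='mongodb://user:pa/ss&wd@cluster0.example.net/db'
jq -c --arg db_host "$DB_HOST" '.timeseries.url = $db_host' /tmp/db.conf.sample
```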
shankari commented 2 years ago

wrt:

And am now onto an error that says pymongo.errors.InvalidURI: MongoDB URI options are key=value pairs.

That error is from: https://github.com/mongodb/mongo-python-driver/blob/a0fe7c03af08adde0c893071e1664b43570b9841/pymongo/uri_parser.py#L337

which is the part that parses the options part (e.g. retryWrites=true&w=majority). Can you share what the options part looks like after the rewrite?

montaguegabe commented 2 years ago

@montaguegabe aha! That's why issues are a good idea. #595 was reported by our internal cloud services team while setting up with documentDB, and it looks like we fixed it by switching to jq. Is that what you did as well?

#set database URL using environment variable
echo ${DB_HOST}
if [ -z "${DB_HOST}" ] ; then
    local_host=`hostname -i`
    jq --arg db_host "$local_host" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
else
    jq --arg db_host "$DB_HOST" '.timeseries.url = $db_host' conf/storage/db.conf.sample > conf/storage/db.conf
fi
cat conf/storage/db.conf

I just changed the DB username to not have an underscore, rather than switching to jq.

montaguegabe commented 2 years ago

wrt:

And am now onto an error that says pymongo.errors.InvalidURI: MongoDB URI options are key=value pairs.

That error is from: https://github.com/mongodb/mongo-python-driver/blob/a0fe7c03af08adde0c893071e1664b43570b9841/pymongo/uri_parser.py#L337

which is the part that parses the options part (e.g. retryWrites=true&w=majority). Can you share what the options part looks like after the rewrite?

So I saw that too but I'm still not sure why the URL fails to parse. MongoDB.com gives me two different formats. There is one for Python with MongoDB driver 3.4 and later: mongodb://<username>:<password>@cluster0-shard-00-00.z56eq.mongodb.net:27017,cluster0-shard-00-01.z56eq.mongodb.net:27017,cluster0-shard-00-02.z56eq.mongodb.net:27017/myFirstDatabase?ssl=true&replicaSet=atlas-a7u6ma-shard-0&authSource=admin&retryWrites=true&w=majority

And there is the invalid one that uses +srv for driver 3.6 or later: mongodb+srv://<username>:<password>@cluster0.z56eq.mongodb.net/myFirstDatabase?retryWrites=true&w=majority

The latter one tells me that +srv is not supported without an additional DNS-related library installed. The first one is what gives me the parsing error described above.

shankari commented 2 years ago

Do you print out the modified URL after sed has finished running?

Because the URL that you listed above (before being put into the DB conf using sed) parses correctly.

$ python3
Python 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12)
[Clang 11.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> pymongo.uri_parser.parse_uri("mongodb://<username>:<password>@cluster0-shard-00-00.z56eq.mongodb.net:27017,cluster0-shard-00-01.z56eq.mongodb.net:27017,cluster0-shard-00-02.z56eq.mongodb.net:27017/myFirstDatabase?ssl=true&replicaSet=atlas-a7u6ma-shard-0&authSource=admin&retryWrites=true&w=majority")
{'nodelist': [('cluster0-shard-00-00.z56eq.mongodb.net', 27017), ('cluster0-shard-00-01.z56eq.mongodb.net', 27017), ('cluster0-shard-00-02.z56eq.mongodb.net', 27017)], 'username': '<username>', 'password': '<password>', 'database': 'myFirstDatabase', 'collection': None, 'options': {'ssl': True, 'replicaSet': 'atlas-a7u6ma-shard-0', 'authSource': 'admin', 'retryWrites': True, 'w': 'majority'}, 'fqdn': None}
>>>

Something in the sed command is still breaking with your URL, I think.

montaguegabe commented 2 years ago

I think I was using the Docker CLI ENV=VALUE format, which splits on equals signs; if that is the issue, it is a silly mistake on my end. But waiting to see..

montaguegabe commented 2 years ago

@shankari I think the above wasn't the issue (wishful thinking); it is still broken. Any tips on how I can most easily change and debug the sed command? It is baked into FROM emission/e-mission-server.dev.server-only:2.9.1, is it not? How do I rebuild that container so I can change that line?

shankari commented 2 years ago

you can just create a new version of start_script.sh https://github.com/e-mission/e-mission-docker/blob/master/start_script.sh and then change the Dockerfile for the webapp directory to copy the modified version.

e.g. something like

ADD start_script.sh /start_script.sh
RUN chmod u+x /start_script.sh

COPY start_script.sh /usr/src/app/start_script.sh
CMD ["/bin/bash", "/usr/src/app/start_script.sh"]
montaguegabe commented 2 years ago

Thank you! I ended up rebuilding the Dockerfile that it draws from (fewer steps to go wrong) and removing the sed commands entirely for now. That makes it work. However, it still crashes due to No such file or directory: 'conf/net/ext_service/habitica.json'

shankari commented 2 years ago

That error is a warning, not a crash. https://github.com/e-mission/e-mission-docs/search?q=habitica.json&type=issues

montaguegabe commented 2 years ago

Will run again and see if that was really the error..

montaguegabe commented 2 years ago

I think Heroku may treat warnings as errors, but I just put a habitica.json in there to get rid of it. Now the error is that Heroku only waits so long for a port to be bound, and all the conda installing makes it time out.

Heroku allows longer build times with this link: https://tools.heroku.support/limits/boot_timeout

I have yet to see if this works..

montaguegabe commented 2 years ago

To anyone else looking at this issue thread, Heroku can actually do all the conda installs and set the server up in time if you set the slider to the maximum value of 180 seconds.

However, I must have disabled the sed for the port and host, because it is not binding correctly (the logs say localhost).

montaguegabe commented 2 years ago

I haven't set the environment variable WEB_SERVER_HOST - maybe that is the problem..

montaguegabe commented 2 years ago

I deleted the sed commands surrounding WEB_SERVER_HOST as well, and am now on to:

File "emission/net/api/cfc_webapp.py", line 70, in <module>
aggregate_call_auth = config_data["server"]["aggregate_call_auth"]
KeyError: 'aggregate_call_auth'
montaguegabe commented 2 years ago

I added that key to the conf file. Now I am onto the main problem I knew I would face: Heroku apps must bind to a port based on the $PORT environment variable, as per https://devcenter.heroku.com/articles/runtime-principles#web-servers
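One way to handle this without patching the server itself would be to rewrite the configured port in the start script before launching, in the same jq style used for db.conf (the webserver.conf shape below is a trimmed stand-in, not the real sample file):

```shell
# Trimmed stand-in for conf/net/api/webserver.conf.sample (shape assumed).
printf '%s\n' '{"server": {"host": "0.0.0.0", "port": "8080"}}' > /tmp/webserver.conf.sample

# Heroku injects $PORT at runtime; fall back to 8080 for local runs.
jq -c --arg port "${PORT:-8080}" '.server.port = $port' \
    /tmp/webserver.conf.sample > /tmp/webserver.conf
cat /tmp/webserver.conf
```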

montaguegabe commented 2 years ago

I have created https://github.com/montaguegabe/e-mission-server that overrides the config value and just uses the environment variable. Will update the docker repo to use this repository instead for cloning..

montaguegabe commented 2 years ago

I am now getting a message that says "Listening on http://mm-e-mission.herokuapp.com/:6691/" <- the colon looks like it ought not to be there. Subsequently it fails to create the socket with

2022-03-29T18:21:47.361284+00:00 app[web.1]: File "/usr/src/app/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/cheroot/server.py", line 1772, in prepare
2022-03-29T18:21:47.361284+00:00 app[web.1]: raise socket.error(msg)
2022-03-29T18:21:47.361284+00:00 app[web.1]: OSError: No socket could be created -- (('mm-e-mission.herokuapp.com/', 6691): [Errno -2] Name or service not known)
montaguegabe commented 2 years ago

Deleting the trailing slash from the hostname worked, but it appears that it still can't bind. I found this example https://github.com/mispy-archive/cherrypy-heroku-example/blob/master/app.py, which appears to be related to Cheroot; they use 0.0.0.0 as the host to bind to. Will try that out.

montaguegabe commented 2 years ago

It works! Thanks for the help Shankari!

To summarize my findings:

montaguegabe commented 2 years ago

Before going on to the analysis server I am going to try to relax the memory requirement

shankari commented 2 years ago

Thanks @montaguegabe. Looking at this, some changes to the startup script that would make this easier would be:

Is there anything else that the docker container could do better?

montaguegabe commented 2 years ago

The first two of those sound good! The third is actually not an issue at the moment, since they are separate parameters; the problem was me ending the hostname with a slash. The config files provide more structure, so some people may like them better, but from the Heroku perspective environment variables certainly simplify things.

The thing that would be super helpful is not having to wait 3-5 minutes for the Docker container to build while debugging this. This would also help with uptime in production settings (if a node fails, you have to wait 3-5 minutes until a new node is ready to replace it). I'm not an expert in Docker, so I don't know exactly how to implement this, but right now the conda installation happens when the containers are run; it should probably happen when the containers are built.

shankari commented 2 years ago

@montaguegabe wrt "not have to wait 3-5 minutes for the Docker container to build while debugging this.", there is such an image (https://hub.docker.com/repository/docker/emission/e-mission-server). Note that the image we are now using is "dev" (FROM emission/e-mission-server.dev.server-only).

It's just that the e-mission-server image is not kept up to date, since I don't have a CI pipeline set up for it yet, and it is not high enough priority to focus on compared to the other fit-and-finish changes.

montaguegabe commented 2 years ago

So if I understand you correctly, “dev” is the one that installs conda dependencies upon running, and the plain/vanilla Dockerfile installs them upon building?

montaguegabe commented 2 years ago

I'm basically looking for something where I can run the docker image and the only code that is triggered is code that runs the server (not installation code)

shankari commented 2 years ago

@montaguegabe yes, as you can see from the primary dockerfile https://github.com/e-mission/e-mission-docker/blob/master/Dockerfile, the e-mission-server image installs the dependencies at build instead of at runtime.

montaguegabe commented 2 years ago

Hi Shankari, unfortunately when I go to that directory with the Dockerfile and build the image, it not only installs the conda dependencies but also tries to run the server via start_script.sh. I commented out the line that runs the server and moved it to the start_script.sh of the webapp in multi-tier-cronjob, but then I get errors about conda not being found on the PATH. Any recommended solution for this situation? Thanks very much!

shankari commented 2 years ago

@montaguegabe your original requirement was:

I'm basically looking for something where I can run the docker image and the only code that is triggered is code that runs the server (not installation code)

That is exactly what the image does - it bundles all the dependencies into the image and when you run it, it only runs the server. Not sure what the problem is with

it not only tries to install the conda dependencies but also tries to run the server in start_script.sh

montaguegabe commented 2 years ago

@shankari I just tried on a fresh git clone in case I had messed something up. For me at least, I go to the root of the e-mission-docker repo and run docker build -t gabeistesting -f Dockerfile . (-f Dockerfile is just to make sure that the vanilla Dockerfile is used). The command then fails with the following:

 > [15/15] RUN ["/bin/bash", "/start_script.sh"]:
#20 0.638 cat: conf/net/api/webserver.conf: No such file or directory
#20 0.640 Live reload disabled, 
#20 33.21 storage not configured, falling back to sample, default configuration
#20 33.21 Connecting to database URL localhost
#20 33.21 analysis.debug.conf.json not configured, falling back to sample, default configuration
#20 33.71 Traceback (most recent call last):
#20 33.71   File "emission/net/api/cfc_webapp.py", line 35, in <module>
#20 33.71     import emission.net.api.visualize as visualize
#20 33.71   File "/usr/src/app/e-mission-server/emission/net/api/visualize.py", line 28, in <module>
#20 33.71     import emission.storage.timeseries.aggregate_timeseries as estag
#20 33.71   File "/usr/src/app/e-mission-server/emission/storage/timeseries/aggregate_timeseries.py", line 15, in <module>
#20 33.71     import emission.storage.timeseries.builtin_timeseries as bits
#20 33.71   File "/usr/src/app/e-mission-server/emission/storage/timeseries/builtin_timeseries.py", line 19, in <module>
#20 33.71     esta.EntryType.DATA_TYPE: edb.get_timeseries_db(),
#20 33.71   File "/usr/src/app/e-mission-server/emission/core/get_database.py", line 161, in get_timeseries_db
#20 33.71     TimeSeries.create_index([("user_id", pymongo.HASHED)])
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py", line 2059, in create_index
#20 33.71     return self.__create_indexes([index], session, **cmd_options)[0]
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py", line 1919, in __create_indexes
#20 33.71     with self._socket_for_writes(session) as sock_info:
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/collection.py", line 198, in _socket_for_writes
#20 33.71     return self.__database.client._socket_for_writes(session)
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1293, in _socket_for_writes
#20 33.71     server = self._select_server(writable_server_selector, session)
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1278, in _select_server
#20 33.71     server = topology.select_server(server_selector)
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/topology.py", line 243, in select_server
#20 33.71     address))
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/topology.py", line 200, in select_servers
#20 33.71     selector, server_timeout, address)
#20 33.71   File "/root/miniconda-4.8.3/envs/emission/lib/python3.7/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
#20 33.71     (self._error_message(selector), timeout, self.description))
#20 33.71 pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 6244a5016c2a6bdb415bb86f, topology_type: Single, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused')>]>
------
executor failed running [/bin/bash /start_script.sh]: exit code: 1

To me, this looks like the build step is trying to run the server, not just set up the image. This matters for us because on Heroku it is the difference between paying $500/month for the expensive instances that can handle the installation steps and the $0/month free tier (although of course we will scale up from that).

montaguegabe commented 2 years ago

Actually after updating my Docker it works fine.. sorry about that - am investigating. Thanks again for your help!

montaguegabe commented 2 years ago

Just an update:

I think I didn't really do a "fresh clone" like I said before. With that solved, I can now build the vanilla Dockerfile, then reference it from the Dockerfile in the multi-tier-cronjob example and build that one too. When I actually run the final result, it runs the command conda env update --name emission --file setup/environment36.yml, and I see "Requirement already up-to-date", meaning the image already has the dependencies installed (as you said). However, it seems that just calling conda env update --name emission --file setup/environment36.yml is enough to consume over 1 GB of memory. For our case, I will have to find a way to avoid that command being called when the image is run.

shankari commented 2 years ago

@montaguegabe Phew! I thought I was in for an intense bout of troubleshooting.

The dockerfiles in the multitier-cronjob use a custom start_script.sh which first clones the server. If you remove that (since the server is already installed), you will not redo the setup steps https://github.com/e-mission/e-mission-docker/blob/master/examples/em-server-multi-tier-cronjob/webapp/start_script.sh#L2

montaguegabe commented 2 years ago

Hi Shankari, as far as I can tell, the conda issue is still there. I get an error when running the image:

setup/activate_conda.sh: line 6: /usr/src/app/miniconda-4.8.3/etc/profile.d/conda.sh: No such file or directory

This is because the Dockerfile runs build steps as root by default, so I can see it installs miniconda to /root/miniconda-4.8.3/. However, when activate_conda.sh is called as part of running the image, $HOME is apparently set to /usr/src/app, so it looks for miniconda there and doesn't find it.

montaguegabe commented 2 years ago

May be related to this from Heroku's documentation: "We strongly recommend testing images locally as a non-root user, as containers are not run with root privileges on Heroku."

shankari commented 2 years ago

Yes that is probably it. There are some examples of building the docker container as a non-root user (https://stackoverflow.com/questions/67261873/building-docker-image-as-non-root-user).

Basically, you either need to:

I don't think you need to change users otherwise since we only use $HOME to configure the miniconda location.
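
A minimal sketch of that pattern (the user name and base image here are hypothetical; this is not the project's actual Dockerfile): create the unprivileged user before anything is installed under $HOME, so the miniconda path is the same at build time and at non-root run time.

```dockerfile
FROM ubuntu:20.04

# Create the unprivileged user up front so $HOME is stable.
RUN useradd --create-home emission
USER emission
WORKDIR /home/emission

# From here on, install miniconda etc. as this user under /home/emission,
# which is also where a non-root runtime (like Heroku's) will resolve $HOME.
```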

shankari commented 2 years ago

@montaguegabe did you get this to work?

montaguegabe commented 2 years ago

Hi Shankari! Yes it worked for the webapp! I just had to make sure the conf was being copied to where python was expecting it to be. I just have to set up analysis now..

montaguegabe commented 2 years ago

I'm assuming that to be able to fill out push.json, I will need to deploy the frontend first.

shankari commented 2 years ago

Well, you don't technically have to deploy the frontend, but you do need to create a Firebase account. And you can't test it until you build the frontend, so you might as well get that to work first :)

shankari commented 2 years ago

@montaguegabe any updates on this? want to make sure that there isn't anything pending that I don't know about.