freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Set up dockerized Nginx + gunicorn for py3 #1429

Closed mlissner closed 4 years ago

mlissner commented 4 years ago

In #1419 I went gray trying to get apache working. This issue is the follow-on, where I restore my sanity and get Nginx set up. Most of the requirements from that ticket still apply, but the way I'm thinking about this is that there are several parts:

  1. Get nginx working properly
  2. Make sure it performs sufficiently
  3. Figure out ansible and deployments
  4. Figure out monitoring via munin or whatever

I've read a lot of documentation and I'm starting with this guide that looks damned good: https://testdriven.io/blog/dockerizing-django-with-postgres-gunicorn-and-nginx/

mlissner commented 4 years ago

A few other useful resources:

A few tests to run after the fact:

Performance tests:

Optimizations:

  1. SSL Labs
  2. Google PageSpeed

mlissner commented 4 years ago

Just did a couple performance tests on robots.txt since it comes almost entirely from cache:

2000 requests, concurrency of 300:

↪ ab -n 2000 -c 300 https://www.courtlistener.com:4430/robots.txt
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.courtlistener.com (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests

Server Software:        nginx
Server Hostname:        www.courtlistener.com
Server Port:            4430
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        www.courtlistener.com

Document Path:          /robots.txt
Document Length:        316809 bytes

Concurrency Level:      300
Time taken for tests:   8.528 seconds
Complete requests:      2000
Failed requests:        0
Total transferred:      634586000 bytes
HTML transferred:       633618000 bytes
Requests per second:    234.53 [#/sec] (mean)
Time per request:       1279.177 [ms] (mean)
Time per request:       4.264 [ms] (mean, across all concurrent requests)
Transfer rate:          72669.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       24  362 440.9    145    3985
Processing:   129  830 388.7    752    6811
Waiting:       21  136 222.2     79    6656
Total:        376 1192 636.9   1013    8415

Percentage of the requests served within a certain time (ms)
  50%   1013
  66%   1240
  75%   1438
  80%   1576
  90%   1944
  95%   2354
  98%   2904
  99%   3475
 100%   8415 (longest request)

Result: 234.53/s. Apache was around 150 for that.

Concurrency of 1,000:

↪ ab -n 5000 -c 1000 https://www.courtlistener.com:4430/robots.txt
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.courtlistener.com (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
^C

Server Software:        nginx
Server Hostname:        www.courtlistener.com
Server Port:            4430
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        www.courtlistener.com

Document Path:          /robots.txt
Document Length:        316809 bytes

Concurrency Level:      1000
Time taken for tests:   67.435 seconds
Complete requests:      3484
Failed requests:        3
   (Connect: 0, Receive: 0, Length: 3, Exceptions: 0)
Total transferred:      1188153637 bytes
HTML transferred:       1186098089 bytes
Requests per second:    51.66 [#/sec] (mean)
Time per request:       19355.517 [ms] (mean)
Time per request:       19.356 [ms] (mean, across all concurrent requests)
Transfer rate:          17206.39 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0 1596 2797.0    880   61340
Processing:     6 15015 10667.3  14262   61352
Waiting:        0 1483 2946.5    573   45942
Total:        442 16610 11375.4  15539   66635

Result: 51.66/s, compared to about 30 for Apache (this run was cut short with ^C at 3,484 of 5,000 requests).

This is before cranking up the number of workers. I'll do more soon, but this is a very promising first test.
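
For a rough sense of scale, the speedups implied by the numbers above can be worked out like this. It's only a sketch: the `speedup` helper is made up for illustration, and the Apache figures are the approximate ones quoted above, not fresh measurements.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ratio of two requests-per-second figures.
speedup() {
    # $1 = new req/s, $2 = old req/s
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

speedup 234.53 150   # concurrency 300: nginx vs. the ~150 req/s quoted for Apache
speedup 51.66 30     # concurrency 1,000: nginx vs. the ~30 req/s quoted for Apache
```

That works out to roughly 1.56x at a concurrency of 300 and 1.72x at 1,000, with the caveat that the second run was interrupted.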

mlissner commented 4 years ago

I was able to get emails to send by adding docker's IP address to the mynetworks line in the /etc/postfix/main.cf file.

That opened it up to docker. I was able to get the correct IP address by trying to send an email and watching the host's logs at /var/log/syslog. This guide has more: http://satishgandham.com/2016/12/sending-email-from-docker-through-postfix-installed-on-the-host/

I don't love it. I'm concerned that the IP address will change, but there is at least one other guide that has the same directions.
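
One possible way to make this less fragile, as a sketch only: trust the whole subnet of the Docker network rather than a single IP, so a new address inside that subnet still works. This assumes Postfix runs on the host and that the current mynetworks value is space-separated. The `append_network` and `allow_docker_subnet` helpers are hypothetical names, and the network name is an assumption too (docker-compose projects usually create their own network rather than using the default `bridge`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a network to a space-separated mynetworks
# value, leaving the value unchanged if the network is already listed.
append_network() {
    local current="$1" network="$2"
    case " $current " in
        *" $network "*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "${current:+$current }$network" ;;
    esac
}

# Hypothetical wrapper; needs root, Docker, and Postfix on the host.
# $1 = Docker network name (assumed; defaults to "bridge").
allow_docker_subnet() {
    local network_name="${1:-bridge}" subnet current
    # Subnet of the given Docker network (often 172.17.0.0/16 for "bridge").
    subnet="$(docker network inspect "$network_name" \
        --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}')"
    # Current mynetworks value, without the "mynetworks = " prefix.
    current="$(postconf -h mynetworks)"
    postconf -e "mynetworks = $(append_network "$current" "$subnet")"
    postfix reload
}
```

Running `sudo docker network ls` first should show which network the containers are actually attached to. Nothing here runs until `allow_docker_subnet` is called explicitly.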

mlissner commented 4 years ago

More stuff:

mlissner commented 4 years ago

This is basically done. Here are some notes from deployment, both before and after.


Deployment thoughts:

  1. Stop cron jobs

    • [x] Tweak cron jobs to use cl-python container

    • [x] Stop supervisor daemons and load in new files from master.

    • [x] Webserver:

        • Apache goes down

        • Nginx goes up with correct ports, in daemon mode

    • [x] Celery

        • Old needs to complete, then stop it

        • Pull new code

        • Change to use cache of 4 and CELERY of 5.

        • Launch them again (be sure to launch judge-pics etc at the same time).

    • [x] Start supervisor daemons

        • Bugs in ia_uploader

    • [x] Start cron jobs

Notes

  1. Restart gunicorn with: cd /var/www/cl/docker/nginx/ && sudo docker-compose kill -s HUP cl-python.

  2. dbshell with: cd /var/www/cl/docker/nginx/ && sudo docker-compose exec cl-python python /opt/courtlistener/manage.py dbshell.

  3. To get the service updated, I removed it with docker service rm, and then did:

    sudo CELERY_PREFORK_CONCURRENCY=10 \
        CL_CODE_DIR=/opt/tasks/ \
        PYTHON_PACKAGES=/usr/local/lib/python3.8/site-packages/ \
        POSTGRESQL_SOCK=/var/run/postgresql \
        DJANGO_MEDIA_ROOT=/sata-old \
        CELERY_PREFORK_MEMORY=5 \
        CELERY_PREFORK_BULK_CONCURRENCY=10 \
        CELERY_PREFORK_BULK_MEMORY=5 \
        docker stack deploy --compose-file /opt/tasks/docker/task-server/docker-compose.yml task-server

  4. Used for running cron stuff: docker exec cl-python /opt/courtlistener/manage.py scrape_rss
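
The `docker exec` pattern from note 4 could be wrapped in a small helper so cron entries stay short. This is a sketch, not what's deployed: `cl_manage` and the `DRY_RUN` switch are hypothetical, and only the container name and manage.py path come from the notes above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the `docker exec` pattern from note 4.
# With DRY_RUN=1 it prints the command instead of running it.
cl_manage() {
    local cmd=(docker exec cl-python /opt/courtlistener/manage.py "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Show what would be run for the scrape_rss command without touching Docker.
DRY_RUN=1 cl_manage scrape_rss
```

Without `DRY_RUN=1`, the same call actually runs the management command inside the cl-python container.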

mlissner commented 4 years ago

I ran SSL Labs yesterday and got an A rating. We had a B before because that was the best we could get out of Apache without upgrading to a newer version of it (which would have been hard). To get an A+, I think we'll need to go to Mozilla's super secure crypto suite, but that breaks backwards compatibility, so we don't want that. A is good!