bookwyrm-social / bookwyrm

Social reading and reviewing, decentralized with ActivityPub
http://joinbookwyrm.com/
Other
2.24k stars 263 forks source link

Gunicorn causes signed GET requests (e.g. AUTHORIZED_FETCH) to fail in Production #2844

Open futzle opened 1 year ago

futzle commented 1 year ago

Describe the bug On production branch, dockerless, still getting Unauthorized: /user/bookwyrmuser/inbox on follow from a Mastodon instance.

To Reproduce Steps to reproduce the behavior:

  1. Install production branch on Debian 11, Dockerless instructions.
  2. Create a regular user on bookwyrm. Allow automatic follows without approval.
  3. Try to follow this user from a Mastodon 4.0.2 (Hometown 1.1.1) instance with AUTHORIZED_FETCH turned on.
  4. Error appears in bookwyrm log: Unauthorized: /user/bookwyrmuser/inbox
  5. On Mastodon instance, follow degrades to a request and becomes "Withdraw follow request".

Expected behavior The follow should complete, and it should be visible at both ends.

Instance https://outside.ofa.dog/ https://old.mermaid.town/

Additional context This issue looks just like closed issues #2734, #1964, #2626, which are all supposed to be fixed by #2613. Yet I'm pretty sure that my installation, new on 8 May 2023, includes #2613 and I'm still getting the exact same behaviour.

Users on other instances can follow me on Bookwyrm, even users on other Mastodon instances.

I can't seem to follow any Bookwyrm instances from old.mermaid.town.

I tried two users on Mastodon instance old.mermaid.town and had the same result.

Mastodon server has errors in sidekiq logs:

"{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/4fb96c8a-614f-478c-b4c4-24ad957527b3\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/Kymberly\",\"object\":\"https://outside.ofa.dog/user/futzle\"}",  109376236586682428, "https://outside.ofa.dog/user/futzle/inbox"

Mastodon::UnexpectedResponseError: https://outside.ofa.dog/user/futzle/inbox returned code 401
"{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/users/futzle#follows/5190/undo\",\"type\":\"Undo\",\"actor\":\"https://old.mermaid.town/users/futzle\",\"object\":{\"id\":\"https://old.mermaid.town/9d8361bd-83ca-4d42-b529-910fb6833575\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/futzle\",\"object\":\"https://outside.ofa.dog/user/futzle\"}}", 109283781929862046, "https://outside.ofa.dog/user/futzle/inbox"

Stoplight::Error::RedLight: https://outside.ofa.dog/user/futzle/inbox

I tried to follow my Bookwyrm account from a user on Mastodon 4.1 instance aus.social (which AFAIK has AUTHORIZED_FETCH turned off) and the follow succeeded at both ends.


Desktop (please complete the following information):

hughrun commented 1 year ago

I can confirm you have the bookwyrm.instance.actor user which means #2613 has been applied.

futzle commented 1 year ago

My Mastodon (Hometown) environment:

LOCAL_DOMAIN=old.mermaid.town
SINGLE_USER_MODE=false
SECRET_KEY_BASE=xxxxx
OTP_SECRET=xxxxx
VAPID_PRIVATE_KEY=xxxxx
VAPID_PUBLIC_KEY=xxxxx
DB_HOST=/var/run/postgresql
DB_PORT=5432
DB_NAME=mastodon_production
DB_USER=mastodon
DB_PASS=
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
S3_ENABLED=true
S3_PROTOCOL=https
S3_BUCKET=st0
S3_REGION=xxxxx
S3_HOSTNAME=xxxxx
S3_ENDPOINT=xxxxx
AWS_ACCESS_KEY_ID=xxxxx
AWS_SECRET_ACCESS_KEY=xxxxx
SMTP_SERVER=localhost
SMTP_PORT=25
SMTP_AUTH_METHOD=none
SMTP_OPENSSL_VERIFY_MODE=none
SMTP_FROM_ADDRESS='Old Mermaid Town <triton@old.mermaid.town>'
MAX_TOOT_CHARS=1500
AUTHORIZED_FETCH=true
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200

In particular I haven't messed with DISALLOW_UNAUTHENTICATED_API_ACCESS or LIMITED_FEDERATION_MODE.

My Bookwyrm environment:

SECRET_KEY="xxxxx"
DEBUG=false
USE_HTTPS=true
DOMAIN=outside.ofa.dog
EMAIL="editor@outside.ofa.dog"
LANGUAGE_CODE="en-au"
DEFAULT_LANGUAGE="English"
MEDIA_ROOT=images/
PGPORT=5432
POSTGRES_PASSWORD=xxxxx
POSTGRES_USER=bookwyrm
POSTGRES_DB=bookwyrm
POSTGRES_HOST=localhost
MAX_STREAM_LENGTH=200
REDIS_ACTIVITY_HOST=localhost
REDIS_ACTIVITY_PORT=6379
REDIS_ACTIVITY_PASSWORD=
REDIS_BROKER_HOST=localhost
REDIS_BROKER_PORT=6379
REDIS_BROKER_PASSWORD=
FLOWER_PORT=8888
FLOWER_USER=admin
FLOWER_PASSWORD=changeme
EMAIL_HOST=localhost
EMAIL_PORT=25
EMAIL_HOST_USER=""
EMAIL_HOST_PASSWORD=
EMAIL_USE_TLS=false
EMAIL_USE_SSL=false
EMAIL_SENDER_NAME=editor
EMAIL_SENDER_DOMAIN=outside.ofa.dog
SEARCH_TIMEOUT=5
QUERY_TIMEOUT=5
ENABLE_THUMBNAIL_GENERATION=true
USE_S3=false
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
ENABLE_PREVIEW_IMAGES=False
PREVIEW_BG_COLOR=use_dominant_color_light
PREVIEW_TEXT_COLOR=#363636
PREVIEW_IMG_WIDTH=1200
PREVIEW_IMG_HEIGHT=630
PREVIEW_DEFAULT_COVER_COLOR=#002549
OTEL_EXPORTER_OTLP_ENDPOINT=
OTEL_EXPORTER_OTLP_HEADERS=
OTEL_SERVICE_NAME=
HTTP_X_FORWARDED_PROTO=false
TWO_FACTOR_LOGIN_VALIDITY_WINDOW=2
TWO_FACTOR_LOGIN_MAX_SECONDS=60
CSP_ADDITIONAL_HOSTS=
hughrun commented 1 year ago

Thanks @futzle for all this detail. This is a confusing one. I'm trying to think through the possible causes here, and I may need to spin up a test mastodon server on the weekend to try to get to the bottom of it, but here's a list of possibilities I can think of:

  1. The fix in #2613 doesn't actually fix AUTHORIZED_FETCH/"secure mode" errors at all, but somehow we convinced ourselves that it did (unlikely, but we were working on a lot of unrelated federation problems at the time caused by a huge influx of new users, so it's possible);
  2. There is something specific to Hometown + secure mode that we didn't pick up (testing has been done on vanilla Mastodon with secure mode previously)
  3. There is something specific to non-Docker Bookwyrm that is causing this bug.
  4. There is something specific to your particular setup that is causing this bug.

Something that might be helpful to narrow things down is to test whether you can send DMs in both directions to/from non-secure Mastodon or other Bookwyrm instances. If that also throws a 401 then we have a more general problem with signed requests, rather than specifically GET requests signed by the generic server actor.

futzle commented 1 year ago

I caught some HTTP logs on the Mastodon Hometown instance (old.mermaid.town) and on the Bookwyrm instance (outside.ofa.dog).

Hometown instance old.mermaid.town IP: 2400:8907::f03c:93ff:fe78:43a2

Bookwyrm instance outside.ofa.dog IP: 2400:8907::f03c:93ff:fec5:bf6a

Search for https : //outside.ofa.dog/user/bookwyrm_user from old.mermaid.town:

outside.ofa.dog nginx log:

MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user HTTP/1.1" 200 1537 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.032 0.032
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /.well-known/webfinger?resource=acct:bookwyrm_user@outside.ofa.dog HTTP/1.1" 200 288 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.016 0.015
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user/outbox HTTP/1.1" 200 295 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.000, 0.016 0.017
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user/following HTTP/1.1" 200 1320 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.016 0.016
MISS - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:19:57 +0000] "GET /user/bookwyrm_user/followers HTTP/1.1" 200 1320 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.020 0.022

Other logs have no interesting traffic.

Attempt to follow bookwyrm_user from hometown_user at UTC 00:23:25:

outside.ofa.dog nginx log:

- - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:23:25 +0000] "POST /user/bookwyrm_user/inbox HTTP/1.1" 401 0 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 8.273 8.269
- - 2400:8907::f03c:93ff:fe78:43a2 [10/May/2023:00:23:47 +0000] "POST /user/bookwyrm_user/inbox HTTP/1.1" 401 0 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.156 0.156

outside.ofa.dog bookwyrm journal:

May 10 00:23:25 outside gunicorn[106467]: Unauthorized: /user/bookwyrm_user/inbox
May 10 00:23:47 outside gunicorn[106467]: Unauthorized: /user/bookwyrm_user/inbox

old.mermaid.town nginx_log:

2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:25 +1000] "GET /users/hometown_user HTTP/1.1" 401 29 "-" "python-requests/2.28.2 (BookWyrm/0.6.2; +https://outside.ofa.dog/)"
2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:25 +1000] "GET /users/hometown_user HTTP/1.1" 401 72 "-" "python-requests/2.28.2"
2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:47 +1000] "GET /users/hometown_user HTTP/1.1" 401 29 "-" "python-requests/2.28.2 (BookWyrm/0.6.2; +https://outside.ofa.dog/)"
2400:8907::f03c:93ff:fec5:bf6a - - [10/May/2023:10:23:47 +1000] "GET /users/hometown_user HTTP/1.1" 401 72 "-" "python-requests/2.28.2"

old.mermaid.town mastodon journal:

May 10 10:23:25 old bundle[1181]: [894a5c72-fcd6-4915-8a47-ddedc4b8ce3f] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=9.80 view=0.44 db=1.79
May 10 10:23:25 old bundle[1181]: [70f83732-716f-41d1-9538-3e33642a7a25] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=36.49 view=0.74 db=2.44 key=https://outside.ofa.dog/user/bookwyrm.instance.actor#main-key
May 10 10:23:26 old bundle[3339097]: 2023-05-10T00:23:26.140Z pid=3339097 tid=1q3ep5 WARN: {"context":"Job raised exception","job":{"retry":16,"queue":"push","dead":false,"args":["{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/bb119164-c536-4301-8785-225695ade557\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/hometown_user\",\"object\":\"https://outside.ofa.dog/user/bookwyrm_user\"}",110341296691175779,"https://outside.ofa.dog/user/bookwyrm_user/inbox"],"class":"ActivityPub::DeliveryWorker","jid":"77771c5b932916dfc1b9c8e6","created_at":1683678197.5960422,"enqueued_at":1683678197.5962195}}
May 10 10:23:26 old bundle[3339097]: 2023-05-10T00:23:26.141Z pid=3339097 tid=1q3ep5 WARN: Mastodon::UnexpectedResponseError: https://outside.ofa.dog/user/bookwyrm_user/inbox returned code 401
May 10 10:23:47 old bundle[1174]: [ba144ed9-21f3-4daa-bfc7-ad940c1b1dd8] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=8.99 view=0.73 db=1.03
May 10 10:23:47 old bundle[1174]: [6cffd5a3-ee63-464c-a8c0-7e4526e7439e] method=GET path=/users/hometown_user format=json controller=AccountsController action=show status=401 duration=24.88 view=0.50 db=3.28 key=https://outside.ofa.dog/user/bookwyrm.instance.actor#main-key
May 10 10:23:47 old bundle[3339097]: 2023-05-10T00:23:47.404Z pid=3339097 tid=1q47px WARN: {"context":"Job raised exception","job":{"retry":16,"queue":"push","dead":false,"args":["{\"@context\":\"https://www.w3.org/ns/activitystreams\",\"id\":\"https://old.mermaid.town/bb119164-c536-4301-8785-225695ade557\",\"type\":\"Follow\",\"actor\":\"https://old.mermaid.town/users/hometown_user\",\"object\":\"https://outside.ofa.dog/user/bookwyrm_user\"}",110341296691175779,"https://outside.ofa.dog/user/bookwyrm_user/inbox"],"class":"ActivityPub::DeliveryWorker","jid":"77771c5b932916dfc1b9c8e6","created_at":1683678197.5960422,"enqueued_at":1683678227.2260702,"error_message":"https://outside.ofa.dog/user/bookwyrm_user/inbox returned code 401","error_class":"Mastodon::UnexpectedResponseError","failed_at":1683678206.1401398,"retry_count":0}}
May 10 10:23:47 old bundle[3339097]: 2023-05-10T00:23:47.404Z pid=3339097 tid=1q47px WARN: Mastodon::UnexpectedResponseError: https://outside.ofa.dog/user/bookwyrm_user/inbox returned code 401

My wild guess as to the events:

  1. old.mermaid.town sends a POST to https : //outside.ofa.dog/user/bookwyrm_user/inbox, presumably the follow request.
  2. outside.ofa.dog counters with a GET to https : //old.mermaid.town/users/hometown_user which returns 401 because AUTHORIZED_FETCH
  3. outside.ofa.dog immediately follows with another GET to https : //old.mermaid.town/users/hometown_user but with key=https : //outside.ofa.dog/user/bookwyrm.instance.actor#main-key which also returns 401. I don't know a lot about the Fediverse's protocols but this feels like it shouldn't be happening.
  4. Having been rejected twice, outside.ofa.dog returns a 401 to old.mermaid.town.
  5. old.mermaid.town queues a retry of the follow at 00:23:26, which occurs at 00:23:47. The logs repeat (and do so again, but truncated in this post).

Is it possible that Hometown is rejecting the signature, or that it considers the signed context invalid?

futzle commented 1 year ago

I temporarily disabled AUTHORIZED_FETCH on Mastodon Hometown instance old.mermaid.town and the follow went straight through. Now that the follow is connected, both accounts can talk to each other just fine, even after re-enabling AUTHORIZED_FETCH.

Seems that it's just the following workflow that's being interfered with.

hughrun commented 1 year ago

Ahhh I think the problem is /user/bookwyrm_user - this is a nonexistent user. I wonder where that is coming from... 🤔

futzle commented 1 year ago

Ahhh I think the problem is /user/bookwyrm_user - this is a nonexistent user. I wonder where that is coming from... 🤔

Au contraire, they do exist! I created a user bookwyrm_user on outside.ofa.dog, and a user hometown_user on old.mermaid.town. For testing, and to make the logfiles more readable.

hughrun commented 1 year ago

Aha!

is it possible that Hometown is rejecting the signature, or that it considers the signed context invalid?

Yes. Sending a signed request with the instance actor after getting a 401 on an unsigned user request is intended behaviour to deal with authorized fetch. We should not be getting a 401 from this second request, however. That looks to be where we will find the source of the problem.

hughrun commented 1 year ago

I temporarily disabled AUTHORIZED_FETCH on Mastodon Hometown instance old.mermaid.town and the follow went straight through. Now that the follow is connected, both accounts can talk to each other just fine, even after re-enabling AUTHORIZED_FETCH.

Is it possible for you to check whether following now works for other users on old.mermaid.town without having to disable AUTHORIZED_FETCH? I'm curious whether this is a "first use" problem related to the generic instance user.

futzle commented 1 year ago

Is it possible for you to check whether following now works for other users on old.mermaid.town without having to disable AUTHORIZED_FETCH?

OMG it is a "first use" problem. Following from Old Mermaid Town to a different user on Outside of a Dog works perfectly, with AUTHORIZED_FETCH subsequently turned on.

Logs are much simpler now, I guess because the two instances trust each other.

old.mermaid.town nginx log:

2400:8907::f03c:93ff:fec5:bf6a - - [11/May/2023:14:10:18 +1000] "POST /users/hometown_user/inbox HTTP/1.1" 202 5 "-" "python-requests/2.28.2 (BookWyrm/0.6.2; +https://outside.ofa.dog/)"

Note that there is no GET on /users/hometown_user like the above log, which is the bit that was getting a 401 error. Is that because outside.ofa.dog doesn't need to verify this server's existence any more? So the suspected failure path has been bypassed?

old.mermaid.town Mastodon log:

May 11 14:10:18 old bundle[3578943]: [85c4d57c-c262-4ecc-85bf-a493daa0cb11] method=POST path=/users/hometown_user/inbox format=*/* controller=ActivityPub::InboxesController action=create status=202 duration=22.79 view=0.00 db=2.49 key=https://outside.ofa.dog/user/bookwyrm_user#main-key

Same request as the nginx log shows.

outside.ofa.dog nginx log:

- - 2400:8907::f03c:93ff:fe78:43a2 [11/May/2023:04:10:18 +0000] "POST /user/bookwyrm_user/inbox HTTP/1.1" 200 0 "-" "http.rb/5.1.0 (Mastodon/4.0.2+hometown-1.1.1; +https://old.mermaid.town/)" 0.060 0.061

If I attempt to follow from Hometown instance old.mermaid.town a user on a different Bookwyrm server (one which has a bookwyrm.instance.actor user, so has #2613 applied), I get the originally described hanging behaviour.

hughrun commented 1 year ago

Thanks this is very helpful.

I need to think about this more but from what you're saying yes this is something about when a whole new ActivityPub server is encountered, rather than simply a new-to-our-server user.

Note that there is no GET on /users/hometown_user like the above log, which is the bit that was getting a 401 error. Is that because outside.ofa.dog doesn't need to verify this server's existence any more?

The prior GET was to retrieve the user profile information and importantly, their public key. Once old.mermaid.town has that it makes a local copy and knows the user so there is no need to fetch it every time it receives a POST.

hughrun commented 1 year ago

This appears to be something to do with production. Testing with the Docker setup:

The code added in #2613 appears to be the same on both branches, so it's something else. My guess is something to do with the bookwyrm.instance.actor ID url or that user's keys, but I can't work out what.

The only other thing that is different other than the branches, is that I'm using ngrok to maintain real domain names when testing: usually on main that means forwarding port 1333, but when running in production we have an inbuilt nginx I think? In any case, for production branch I have to forward port 80 instead of 1333. I can't think of any reason this matters, but leave it here for completeness.

@mouse-reeve do you have any ideas here?

eldang commented 1 year ago

FWIW, I'm encountering the same issue. From my hometown account https://weirder.earth/web/@eldang I can't follow either of my Bookwyrm accounts https://outside.ofa.dog/user/eldang or https://books.theunseen.city/user/eldang

Because I have been able to follow my account on the flagship instance -- https://bookwyrm.social/user/eldang -- I had assumed that #2613 was working as intended and this had to be a config issue with books.theunseen.city. But when I joined outside.ofa.dog I found I had the same issue. The first use issue probably explains that discrepancy.

I'm afraid I have no insight into how this might be fixed, but if any permutation of my accounts are helpful for testing I'm happy to try stuff out.

WesleyAC commented 1 year ago

@hughrun what is your HTTPS setup like on each of those? that's something that's different on the prod and main branches, prod has certbot by default. I presume ngrok does HTTPS termination in both cases?

WesleyAC commented 1 year ago

I don't really understand how this is all supposed to work, but one question I have is — doesn't the Mastodon server need to get the key for bookwyrm.instance.actor? Why don't I see that request in the Bookwyrm logs?

hughrun commented 1 year ago

Yes that's right. Mastodon should reject the initial GET from the user, then we should send it again as a signed GET from the instance actor: Mastodon should then send a request back to collect the instance actor's key.

This all works correctly on main in Docker but there appears to be something not quite right about Dockerless setups and/or production in certain circumstances.

I think I didn't set it up properly when I was trying to test production to find the problem so that could be a red herring.

@futzle is also having an issue where statuses are published twice: I suspect this is related and we also have an Issue where someone else had the same problem.

OliverPifferi commented 1 year ago

Having the same issue here with a working Mastodon- and BookWyrm-instance. The latter was reverted from an offline snapshot and since then, the user feed disappeared (to be specific, the status updates so far did - new ones are displayed as they should) and Mastodon wasn't able to follow the BookWyrm-user anymore.

I can request a follow (same procedure: Mastodon tells me that I am following, after refreshing and looking at the profile it seems to be "just" a follow request) but the BookWyrm-instance tells me that I've got a new follower - which is somehow odd as the account isn't displayed as being followed from Mastodon.

Using the latest commits here and everything went fine BEFORE the snapshot.

hughrun commented 1 year ago

@OliverPifferi is your Bookwyrm instance using Docker, or the Dockerless setup?

OliverPifferi commented 1 year ago

@OliverPifferi is your Bookwyrm instance using Docker, or the Dockerless setup?

Sorry, forgot to mention that - it's deployed via Dockerless setup.

hughrun commented 1 year ago

I have done further testing on this with latest Bookwyrm, and latest Mastodon+Hometown.

  1. Works correctly with Bookwyrm running main branch on Docker behind ngrok
  2. Does NOT work correctly with Bookwyrm running production branch on Docker
  3. Does NOT work correctly with Bookwyrm running production branch on Dockerless

So problem definitely seems to be with production.

@mouse-reeve is it possible the issue here is something gunicorn does differently to Django's runserver?

mouse-reeve commented 1 year ago

I genuinely have no idea at this point 😬

WesleyAC commented 1 year ago

@hughrun it should be possible to test that by running something like:

docker-compose up -d
docker-compose stop web
docker-compose run web /app/manage.py runserver 0.0.0.0:8000

on the production branch. or changing the command line in docker-compose.yml

hughrun commented 1 year ago

@WesleyAC good thinking. Still getting the same behaviour.

OMG I just wasn't patient enough. IT WORKS!!!

So, this is caused by something in the way gunicorn is handling ...something.

hughrun commented 1 year ago

Thanks @WesleyAC I think I now have a solution for this 🎉

I don't profess to know enough about http requests and threading to understand exactly why this happens, but the answer looks to be "runserver is multi-threaded and gunicorn can be, but isn't by default".

See this Stack Overflow answer to a similar problem.

The answer refers to "workers" but they are actually talking about threads. See Gunicorn documentation on threads.

More on how to mix workers and threads in Gunicorn and how they differ.

I have successfully tested adding an extra thread in the production docker-compose.yml:

web:
  build: .
  env_file: .env
  command: gunicorn bookwyrm.wsgi:application --threads=2 --bind 0.0.0.0:8000

This fixes the problem reported here by @futzle.

If anything, adding one or more extra threads seems like it should result in improved performance in most instances, but I'd like more opinions from perf champ @WesleyAC and Fearless Leader @mouse-reeve before I log a simple PR for this, in case there are other ramifications.

futzle commented 1 year ago

!!!

I've edited /etc/systemd/system/bookwyrm.service (dockerless install) and added --threads=2 to the [Service] section:

ExecStart=/opt/bookwyrm/venv/bin/gunicorn bookwyrm.wsgi:application --threads=2 --bind 0.0.0.0:8000

Let's see how things go after that change.

Update: So it looks like this change has helped. The view from a Mastodon server now is:

  1. Mastodon user presses Follow. UI confirms that the Bookwyrm user is being followed.
  2. Mastodon user refreshes the page and sees that the follow has demoted to a follow request.
  3. Mastodon user waits 1-2 minutes and refreshes the page. The follow request has promoted back to a full follow again.

This dance seems to apply only when the first user on any given instance attempts to follow. Afterwards follow requests seem to be instant. (Needs further verification!)

I have seen the same behaviour when following a Takahē instance from a Mastodon server with AUTHORIZED_FETCH turn on, so I wonder if this is just normal for Mastodon.

WesleyAC commented 1 year ago

fascinating! it's really strange that this has the effect it does.

on bookwyrm.social we're using gunicorn bookwyrm.wsgi:application --workers 12 --threads 10 --preload --max-requests 100 --max-requests-jitter 200 --bind 0.0.0.0:8000 right now, so making it default seems fine, although it does seem kinda like it's papering over what must be a real bug somewhere…

WesleyAC commented 1 year ago

Ah, reading the stack overflow post you linked the deadlock makes sense, and the real fix is #2717, presumably the bookwyrm/views/follow.py ones.

hughrun commented 1 year ago

This is also making me wonder if we should always send outgoing GET requests to webfinger and external ids as signed requests from bookwyrm.instance.user. That would remove a whole unnecessary request loop from this process. Increasingly signed GET requests are becoming more common in the fediverse.

futzle commented 1 year ago

Agree with everyone that it looks like there's something being unexpectedly re-entrant, and having strictly one thread is leading to something timing out because there's no thread left to handle the inner request. #2717 looks like a place to spend some effort.

After a day, with the extra --threads on Outside of a Dog, everything is still behaving. This seems to be the smallest fix sufficient to fix my reported issue.

quisquous commented 1 year ago

For what it's worth, I had msub2 add this on books.theunseen.city (on 0.6.4) and I am still having the same issue of being unable to follow my account (weirder.earth -> books.theunseen.city, one example the same as eldang above).

futzle commented 9 months ago

A user on Mastodon 4.2.3 instance eigenmagic.net is encountering this issue following a user on Bookwyrm 0.6.3 instance outside.ofa.dog, even with the --threads patch. Still seems there's an issue.

Edit: I don't have access to the eigenmagic.net logs but the logs on outside.ofa.dog are just the same as above.