Podcastindex-org / web-ui

The public home page of podcastindex.org
MIT License
55 stars 123 forks source link

Bug: Comments not loading #244

Closed dellagustin closed 1 year ago

dellagustin commented 1 year ago

This is likely a regression of https://github.com/Podcastindex-org/web-ui/pull/236.

VM32:1 Uncaught (in promise) SyntaxError: Unexpected non-whitespace character after JSON at position 1574 at JSON.parse () at n (bundle.js:6:867129)

image

dellagustin commented 1 year ago

@daveajones you can assign this one to me.

dellagustin commented 1 year ago

I have a pretty good idea of what this is, but I need to do some research, in the meantime, here are some references:

dellagustin commented 1 year ago

So, the current implementation assumes that each res.write on the backend will correspond to exactly one response.body.getReader().read() promise resolution on the front-end. I could not find anything on the documentation related to the backend and frontend with respect to chunked transfer encoding.

From https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams:

Now you've got your reader attached, you can read data chunks out of the stream using the ReadableStreamDefaultReader.read() method. This reads one chunk out of the stream, which you can then do anything you like with.

So, yeah, it should be one chunk, so I suspected something else was at play here, then did some verbose curls to production and my local development for comparison.

Production:

❯ curl https://podcastindex.org/api/comments/byepisodeid?id=13886063569 --verbose
*   Trying 2606:4700:20::ac43:4b8f...
* TCP_NODELAY set
* Connected to podcastindex.org (2606:4700:20::ac43:4b8f) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=sni.cloudflaressl.com
*  start date: Jun  6 00:00:00 2022 GMT
*  expire date: Jun  5 23:59:59 2023 GMT
*  subjectAltName: host "podcastindex.org" matched cert's "podcastindex.org"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x27948d0)
> GET /api/comments/byepisodeid?id=13886063569 HTTP/2
> Host: podcastindex.org
> User-Agent: curl/7.63.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 200
< date: Thu, 16 Mar 2023 22:43:44 GMT
< x-powered-by: Express
< strict-transport-security: max-age=31536000; includeSubDomains
"https://podcastindex.social/users/ChadF"},"commentAsof":"2023-03-16T22:43:43.529Z","replies":[],"repliesAsof":"2023-03-16T22:43:43.529Z"}},"commenters":{"https://podcastindex.social/users/ChadF":{"icon":{"url":"https://cdn.masto.host/podcastindexsocial/accounts/avatars/106/557/219/416/316/803/original/c650f8421a9d4807.png","mediaType":"image/png"},"name":"Sir Chad Farrow :pci: 🎧✂️","url":"https://podcastindex.social/@ChadF","fqUsername":"@ChadF@podcastindex.social","asof":"2023-03-16T22:43:43.529Z"}}}* Connection #0 to host podcastindex.org left intact

Local:

❯ curl http://localhost:5001/api/comments/byepisodeid?id=13886063569 --verbose
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 5001 (#0)
> GET /api/comments/byepisodeid?id=13886063569 HTTP/1.1
> Host: localhost:5001
> User-Agent: curl/7.63.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Date: Thu, 16 Mar 2023 22:44:49 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
{"roots":["https://noagendasocial.com/users/adam/statuses/109968324221823701"],"nodes":{"https://noagendasocial.com/users/adam/statuses/109968324221823701":{"comment":{"url":"https://noagendasocial.com/@adam/109968324221823701","published":"2023-03-05T02:31:02Z","attachments":[{"mediaType":"image/jpeg","width":816,"height":771,"url":"https://static.noagendasocial.com/media_attachments/files/109/968/324/169/040/998/original/b56f6f27418b0dc3.jpeg"}],"content":{"en":"<p>Podcasting 2.0 Episode 124: &quot;Undeplatformable&quot; -- Adam &amp; Dave discuss the week&#39;s developments on podcastindex.org - A day later due to power and bandwidth, but with the same spunk as always! <a href=\"http://l.curry.com/fDx\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><span class=\"invisible\">http://</span><span class=\"\">l.curry.com/fDx</span><span class=\"invisible\"></span></a></p>"},"attributedTo":"https://noagendasocial.com/users/adam"},"commentAsof":"2023-03-16T22:44:49.295Z","replies":["https://podcastindex.social/users/ChadF/statuses/110015844766813195"],"repliesAsof":"2023-03-16T22:44:49.295Z"}},"commenters":{"https://noagendasocial.com/users/adam":{"icon":{"url":"https://static.noagendasocial.com/accounts/avatars/000/000/001/original/2433237d38d194bc.jpg","mediaType":"image/jpeg"},"name":"Adam Curry [Texas]","url":"https://noagendasocial.com/@adam","fqUsername":"@adam@noagendasocial.com","asof":"2023-03-16T22:44:49.295Z"}}}{"roots":[],"nodes":{"https://podcastindex.social/users/ChadF/statuses/110015844766813195":{"comment":{"url":"https://podcastindex.social/@ChadF/110015844766813195","published":"2023-03-13T11:56:08Z","attachments":[],"content":{"en":"<p><span class=\"h-card\"><a href=\"https://noagendasocial.com/@adam\" class=\"u-url mention\">@<span>adam</span></a></span> </p><p>Value 4 Value Report EP 124</p><p>LIT Boost = 206,778 sats</p><p>EP 123 Boost = 810,430 sats</p><p>One time Donations = $0</p><p>Monthly Donations = $276.55</p>"},"attributedTo":"https://podcastindex.social/users/ChadF"},"commentAsof":"2023-03-16T22:44:49.295Z","replies":[],"repliesAsof":"2023-03-16T22:44:49.295Z"}},"commenters":{"https://podcastindex.social/users/ChadF":{"icon":{"url":"https://cdn.masto.host/podcastindexsocial/accounts/avatars/106/557/219/416/316/803/original/c650f8421a9d4807.png","mediaType":"image/png"},"name":"Sir Chad Farrow :pci: 🎧✂️","url":"https://podcastindex.social/@ChadF","fqUsername":"@ChadF@podcastindex.social","asof":"2023-03-16T22:44:49.295Z"}}}* Connection #0 to host localhost left intact

Noted something? Production is using HTTP/2, which does not handle chunked data as in HTTP 1.1 (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding#directives):

Note: HTTP/2 disallows all uses of the Transfer-Encoding header other than the HTTP/2 specific: "trailers". HTTP 2 provides its own more efficient mechanisms for data streaming than chunked transfer and forbids the use of the header. Usage of the header in HTTP/2 may likely result in a specific protocol error as HTTP/2 Protocol prohibits the use.

I'm assuming that Cloudflare is doing http2 by default and converting it at its reverse proxy. Further investigation is needed here.

@daveajones , any details from the prod setup that you thing by be influencing here, besides the observations I did above?

ericpp commented 1 year ago

I noticed this too while poking around on my dev copy. My workaround was to add newlines to each chunk (e.g. ndjson) and then split out the newlines on the client side.

ericpp commented 1 year ago

@dellagustin I was able to recreate it by removing e.part === 'replies' from this statement: https://github.com/Podcastindex-org/web-ui/blob/e4ff8cf279053407d8b11e7aa0b798a4a04c8368/server/index.js#L146

Here is my workaround for the issue: https://github.com/ericpp/podcastindex-web-ui/commit/eaa8c87fca40cc95fb717f7c3b88b77797ef6ece

dellagustin commented 1 year ago

Hello @ericpp , thanks for jumping in.

I'm curious to understand the reason you removed e.part === 'replies'.

The solution I was considering is similar to yours, but we cannot assume that all parts of the received data frame (using the HTTP/2 term that is kind of equivalent to chunks, according to what I read), will contain a full JSON, take a look at the following testing screenshot:

image

You can see here 3 complete chunks and the last one that is incomplete.

We need to store the last incomplete part and wait for the next data frame.

Nevertheless I'm reasonably certain that this is impairing the intended performance improvement. I'm guessing, based on the test results, that Cloudflare's reverse proxy buffers the chunks in packages of a certain length and send the buffered data as a data frame once it reaches that size, but truncated by that size. The string in the screenshot above has 3978 chars, I'm guessing somewhere this package size is the round number of 4096 bytes.

We can workaround that, but if we could somehow ensure a 1:1 chunk to data frame ratio, that would likely give us the best performance.

I'll try to fix it this weekend. Testing in prod-like conditions is one of the challenges.

daveajones commented 1 year ago

Production is client to CF (http/2) then CF to origin (nginx reverse proxy - http/2). I can disable http/2 in nginx and see if it’s the origin buffering that is the problem?

dellagustin commented 1 year ago

Hi @daveajones, when I mentioned Cloudflare I was not aware there was nginx in the mix as another "moving part". That test could help. Also, could you share your nginx configuration so that I can check if I can reproduce it locally by putting an nginx instance in front of the app?

ericpp commented 1 year ago

@dellagustin I think I removed the replies part just to see how the code was working and to figure out more ways to speed up the comments loading.

daveajones commented 1 year ago

Ok, here is the nginx config:

server {
    if ($host = www.podcastindex.org) {
        return 302 https://podcastindex.org$request_uri;
    }

    root /var/www/html;
    index index.html;
    server_name www.podcastindex.org podcastindex.org;
    location / {
        proxy_set_header   X-Forwarded-For $remote_addr;
        proxy_set_header   Host $http_host;
        proxy_pass         http://127.0.0.1:8000;       # First attempt to serve request as file, then
    }
    listen 80 default_server;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}
dellagustin commented 1 year ago

Thanks you @daveajones . Seems quite straight forward. I guess that the HTTP/2 issue that is going on is the default behavior then. I'll try a minimal nginx setup locally and see if it happens as well. If I can reproduce it locally, I can send a PR that solves it from the back end, even if that is not optimal. It will make the back end more robust and we can see how to optimize further after that, if needed.

ericpp commented 1 year ago

@dellagustin I'm able to recreate the problem through Nginx/Node in Docker: https://github.com/ericpp/podcastindex-web-ui/commit/314519a8a8ee5bbe92b1188923abb9e30b39bbd0

It seems like Nginx is buffering and sending the response in roughly 4096 byte chunks: chunks.txt

ericpp commented 1 year ago

@dellagustin Looks like passing the X-Accel-Buffering: no header from Node to Nginx might turn off buffering: https://stackoverflow.com/a/25017347 https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/#x-accel-buffering

dellagustin commented 1 year ago

I tested @ericpp 's solution and it works, I have created a PR for that. We still should add a clear chunk delimiter to avoid other unknown edge cases.