PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

User session recordings are not playing after 1 hour #21391

Closed madridist88 closed 4 months ago

madridist88 commented 5 months ago

Hello!

How to reproduce: I'm using a self-hosted instance with session recording activated. I don't see any errors in the Docker logs. Sessions are recorded on my server and I can view them, but only for about 1 hour. After that, playback stops working: I see "Buffering..", and if I click, "We're unable to play this recording" appears.

Expectations: Session recordings can be viewed for as long as they are stored on the server (7 days by default).

What I tried:

  1. Updating to the latest version.
  2. Storing session recordings on AWS S3. The recordings are saved in my bucket, but an hour after saving I see the same "Buffering..".

Can you tell me how to solve this problem?

(Screenshot: 2024-04-08 at 11:30:56)

pauldambra commented 5 months ago

Hey @madridist88,

We don't offer detailed support for self-hosted instances because there can be so many differences between people's deployments.

When loading a recording for playback, we make several calls to the same snapshots API endpoint:

(Screenshot: 2024-04-10 at 12:18:34)

The first call returns the locations that the replay data might be in. Replay data is loaded from both blob storage and realtime storage (depending on its age).

Once you have the list of sources, we then load data from each of those sources. If you're never exiting buffering and never showing an error state, I'd expect that one of those calls returns an empty response... is that true?
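
If it's easier to check this outside the browser, here's a rough sketch of those two phases against the snapshots endpoint. The host, project id, session id and personal API key below are placeholders, and the exact query parameters (e.g. version, blob_key) can differ between PostHog versions, so mirror whatever your network tab sends:

```python
# Rough sketch (not an official tool) of the two-phase snapshot loading.
import requests

HOST = "https://posthog.example.com"          # your instance (placeholder)
PROJECT_ID = 1
SESSION_ID = "018f6577-2dd6-7989-ac01-6f4dcbed8269"
HEADERS = {"Authorization": "Bearer <personal_api_key>"}

base = f"{HOST}/api/projects/{PROJECT_ID}/session_recordings/{SESSION_ID}/snapshots"

# Phase 1: ask where the replay data might live (blob storage and/or realtime).
listing = requests.get(base, headers=HEADERS)
listing.raise_for_status()
sources = listing.json().get("sources", [])  # response shape may vary by version
print("sources:", sources)

# Phase 2: fetch the data from each source and check that it isn't empty.
for src in sources:
    params = {"source": src["source"]}
    if src.get("blob_key"):
        params["blob_key"] = src["blob_key"]
    resp = requests.get(base, headers=HEADERS, params=params)
    print(src["source"], resp.status_code, "bytes:", len(resp.content))
```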

madridist88 commented 4 months ago

Hey, pauldambra,

In my case, all calls to "snapshots*" have status "200 OK".

Also, I noticed a couple of errors in the interface (see screenshots). This seems to be related to my problem. Could you tell me how to fix this?

(Screenshots 1 and 2)

pauldambra commented 4 months ago

> In my case, all calls to "snapshots*" have status "200 OK".

We'd need to see the responses to the snapshot calls.


The errors are not common and would suggest your installation isn't completely healthy. For example, if I visit the replay page on PostHog Cloud, I don't see those errors.


> Could you tell me how to fix this?

As I mentioned, we don't offer support for self-hosted instances.

madridist88 commented 4 months ago

Hi, pauldambra

(Screenshot: 2024-04-19 at 11:49:05)

pauldambra commented 4 months ago

Hey,

Screenshots aren't the most useful way to share browser network requests 😅

I'm going to assume from this that the original call to snapshots returns only realtime as a source, and that the subsequent calls to the realtime source come back empty.

If the recordings play initially, i.e. while they're still realtime, this suggests that either the recordings aren't being stored in object storage, or the PostHog web server isn't able to read from object storage and so only returns realtime as a source.

There will (possibly) be more detail in the application logs.

This is most likely a config issue in the MinIO / object storage setup or in the environment variables provided to the application, and we don't offer support for self-hosted instances.
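
For anyone debugging this on their own deployment, a small sketch along these lines can confirm whether the bucket is reachable and readable with the credentials passed to the containers. The OBJECT_STORAGE_* variable names and the session_recordings/ prefix are assumptions based on the hobby docker-compose setup; check your own compose file for the real names and values:

```python
# Minimal sketch, not an official check: verify that the object storage endpoint
# and credentials the application is configured with are actually usable.
# OBJECT_STORAGE_* names and the "session_recordings/" prefix are assumptions
# taken from the hobby docker-compose setup; substitute your own values.
import os
import boto3
from botocore.client import Config

endpoint = os.environ.get("OBJECT_STORAGE_ENDPOINT", "http://localhost:19000")
bucket = os.environ.get("OBJECT_STORAGE_BUCKET", "posthog")

s3 = boto3.client(
    "s3",
    endpoint_url=endpoint,
    aws_access_key_id=os.environ.get("OBJECT_STORAGE_ACCESS_KEY_ID", ""),
    aws_secret_access_key=os.environ.get("OBJECT_STORAGE_SECRET_ACCESS_KEY", ""),
    config=Config(signature_version="s3v4"),
)

# Any exception here (connection refused, access denied, missing bucket) means
# replay blobs can't be written or read, and playback falls back to realtime only.
s3.head_bucket(Bucket=bucket)
resp = s3.list_objects_v2(Bucket=bucket, Prefix="session_recordings/", MaxKeys=5)
print("reachable:", endpoint, "| recording objects found:", resp.get("KeyCount", 0))
```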

feedanal commented 4 months ago

Hitting this problem as well. While the responses to the recording requests in the browser are 200, they are in fact empty:

Request URL: /api/projects/1/session_recordings/018f6577-2dd6-7989-ac01-6f4dcbed8269/snapshots?source=realtime&version=2024-04-30
Request Method: GET
Status Code: 200 OK
Remote Address: 37.27.67.81:443
Referrer Policy: same-origin

Allow: GET, HEAD, OPTIONS
Alt-Svc: h3=":443"; ma=2592000
Cache-Control: no-store
Content-Length: 0  <--------- EMPTY
Content-Type: application/json
Cross-Origin-Opener-Policy: same-origin
Date: Sat, 11 May 2024 02:48:07 GMT
Referrer-Policy: same-origin
Server: Caddy
Server: Unit/1.31.1
Vary: Origin
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN

Further investigation shows that session recordings are failing to save to MinIO (which I assume is the default option for self-hosted):

posthog-web-1                     | 2024-05-11T02:41:30.513303Z [error    ] kafka_session_recording_produce_failure [posthog.api.capture] ip=123.123.123.123 pid=884 request_id=f3268cfb-cd79-4c0f-9e0d-4601c88000f3 tid=127571497027904 token=phc_jCb41EhqVnkH2pQyHmlpIb4xZU50ROHDtrBmqd9atC6 x_forwarded_for=90.167.174.171
posthog-plugins-1                 | {"level":"error","time":1715395290940,"pid":144,"hostname":"894dc1fc37c2","errorMessage":"Error: connect ECONNREFUSED 127.0.0.1:19000","error":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"sessionId":"018f6579-e654-76d7-b440-7d80b0b0afc0","partition":0,"teamId":1,"topic":"session_recording_snapshot_item_events","oldestKafkaTimestamp":null,"bufferCount":0,"reason":"buffer_age","msg":"[MAIN] 🧨 [session-manager] failed writing session recording blob to S3"}
posthog-plugins-1                 | {"level":"error","time":1715395290941,"pid":144,"hostname":"894dc1fc37c2","err":{"type":"Error","message":"connect ECONNREFUSED 127.0.0.1:19000","stack":"Error: connect ECONNREFUSED 127.0.0.1:19000\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:128:17)","errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"session_id":"018f6579-e654-76d7-b440-7d80b0b0afc0","msg":"[MAIN]  session-replay-ingestion - failed trying to flush on idle session: 018f6579-e654-76d7-b440-7d80b0b0afc0"}
posthog-web-1                     | 2024-05-11T02:41:32.089282Z [info     ] request_started                [django_structlog.middlewares.request] ip=123.123.123.123 pid=950 request=<WSGIRequest: GET '/api/projects/1/session_recordings/018f6577-2dd6-7989-ac01-6f4dcbed8269/snapshots?source=realtime&version=2024-04-30'> request_id=b6f71972-c005-4369-81fe-665ac075d5a2 tid=127571497027904 token=phc_jCb41EhqVnkH2pQyHmlpIb4xZU50ROHDtrBmqd9atC6 user_agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 x_forwarded_for=52.74.62.22
posthog-web-1                     | 2024-05-11T02:41:33.710351Z [info     ] request_finished               [django_structlog.middlewares.request] code=200 ip=123.123.123.123 pid=950 request=<WSGIRequest: GET '/api/projects/1/session_recordings/018f6577-2dd6-7989-ac01-6f4dcbed8269/snapshots?source=realtime&version=2024-04-30'> request_id=b6f71972-c005-4369-81fe-665ac075d5a2 team_id=1 tid=127571497027904 token=phc_jCb41EhqVnkH2pQyHmlpIb4xZU50ROHDtrBmqd9atC6 user_id=1 x_forwarded_for=52.74.62.22
posthog-web-1                     | 2024-05-11T02:41:40.282076Z [error    ] kafka_session_recording_produce_failure [posthog.api.capture] ip=123.123.123.123 pid=950 request_id=0d21c317-9626-43e4-8196-ac56a24a6398 tid=127571497027904 token=phc_jCb41EhqVnkH2pQyHmlpIb4xZU50ROHDtrBmqd9atC6 x_forwarded_for=213.87.159.244
posthog-plugins-1                 | {"level":"error","time":1715395301518,"pid":144,"hostname":"894dc1fc37c2","errorMessage":"Error: connect ECONNREFUSED 127.0.0.1:19000","error":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"sessionId":"018f657f-d8a2-7552-9825-c9d5e9f75c52","partition":0,"teamId":1,"topic":"session_recording_snapshot_item_events","oldestKafkaTimestamp":null,"bufferCount":0,"reason":"buffer_age","msg":"[MAIN] 🧨 [session-manager] failed writing session recording blob to S3"}
posthog-plugins-1                 | {"level":"error","time":1715395301521,"pid":144,"hostname":"894dc1fc37c2","err":{"type":"Error","message":"connect ECONNREFUSED 127.0.0.1:19000","stack":"Error: connect ECONNREFUSED 127.0.0.1:19000\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:128:17)","errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"session_id":"018f657f-d8a2-7552-9825-c9d5e9f75c52","msg":"[MAIN]  session-replay-ingestion - failed trying to flush on idle session: 018f657f-d8a2-7552-9825-c9d5e9f75c52"}
posthog-plugins-1                 | {"level":"error","time":1715395304933,"pid":144,"hostname":"894dc1fc37c2","errorMessage":"Error: connect ECONNREFUSED 127.0.0.1:19000","error":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"sessionId":"018f6565-ce29-7c4b-821d-236028fc525d","partition":0,"teamId":1,"topic":"session_recording_snapshot_item_events","oldestKafkaTimestamp":null,"bufferCount":0,"reason":"buffer_age","msg":"[MAIN] 🧨 [session-manager] failed writing session recording blob to S3"}
posthog-plugins-1                 | {"level":"error","time":1715395304933,"pid":144,"hostname":"894dc1fc37c2","err":{"type":"Error","message":"connect ECONNREFUSED 127.0.0.1:19000","stack":"Error: connect ECONNREFUSED 127.0.0.1:19000\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:128:17)","errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"session_id":"018f6565-ce29-7c4b-821d-236028fc525d","msg":"[MAIN]  session-replay-ingestion - failed trying to flush on idle session: 018f6565-ce29-7c4b-821d-236028fc525d"}
posthog-plugins-1                 | {"level":"error","time":1715395305233,"pid":144,"hostname":"894dc1fc37c2","errorMessage":"Error: connect ECONNREFUSED 127.0.0.1:19000","error":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"sessionId":"018f6567-7d0f-75b5-85a5-373079e593cb","partition":0,"teamId":1,"topic":"session_recording_snapshot_item_events","oldestKafkaTimestamp":null,"bufferCount":0,"reason":"buffer_age","msg":"[MAIN] 🧨 [session-manager] failed writing session recording blob to S3"}
posthog-plugins-1                 | {"level":"error","time":1715395305234,"pid":144,"hostname":"894dc1fc37c2","err":{"type":"Error","message":"connect ECONNREFUSED 127.0.0.1:19000","stack":"Error: connect ECONNREFUSED 127.0.0.1:19000\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:128:17)","errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"session_id":"018f6567-7d0f-75b5-85a5-373079e593cb","msg":"[MAIN]  session-replay-ingestion - failed trying to flush on idle session: 018f6567-7d0f-75b5-85a5-373079e593cb"}
posthog-web-1                     | 2024-05-11T02:41:45.409395Z [error    ] kafka_session_recording_produce_failure [posthog.api.capture] ip=123.123.123.123 pid=884 request_id=c2386b7c-7631-47a0-8224-ec0a37d3a6e2 tid=127571497027904 token=phc_jCb41EhqVnkH2pQyHmlpIb4xZU50ROHDtrBmqd9atC6 x_forwarded_for=89.37.175.110
posthog-web-1                     | 2024-05-11T02:41:46.689403Z [error    ] kafka_session_recording_produce_failure [posthog.api.capture] ip=123.123.123.123 pid=1082 request_id=0da90e6b-4eed-475f-9d36-e23c509fce8e tid=127571497027904 token=phc_jCb41EhqVnkH2pQyHmlpIb4xZU50ROHDtrBmqd9atC6 x_forwarded_for=107.127.32.53
posthog-plugins-1                 | {"level":"error","time":1715395307735,"pid":144,"hostname":"894dc1fc37c2","errorMessage":"Error: connect ECONNREFUSED 127.0.0.1:19000","error":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"sessionId":"018f657b-5afe-7cee-b88f-2a6d7028ef37","partition":0,"teamId":1,"topic":"session_recording_snapshot_item_events","oldestKafkaTimestamp":null,"bufferCount":0,"reason":"buffer_age","msg":"[MAIN] 🧨 [session-manager] failed writing session recording blob to S3"}
posthog-plugins-1                 | {"level":"error","time":1715395307736,"pid":144,"hostname":"894dc1fc37c2","err":{"type":"Error","message":"connect ECONNREFUSED 127.0.0.1:19000","stack":"Error: connect ECONNREFUSED 127.0.0.1:19000\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)\n    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:128:17)","errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":19000,"$metadata":{"attempts":1,"totalRetryDelay":0}},"session_id":"018f657b-5afe-7cee-b88f-2a6d7028ef37","msg":"[MAIN]  session-replay-ingestion - failed trying to flush on idle session: 018f657b-5afe-7cee-b88f-2a6d7028ef37"}
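
Worth noting that every failure above is ECONNREFUSED against 127.0.0.1:19000 from inside the plugins container, i.e. the container's own localhost rather than the MinIO container. A quick, hypothetical check like the one below, run inside the plugins container, shows whether the endpoint the container is actually configured with is reachable; the OBJECT_STORAGE_ENDPOINT name and the objectstorage hostname are assumptions based on the hobby docker-compose setup:

```python
# Hypothetical connectivity check, run inside the plugins container: is the
# object storage endpoint it is configured with reachable at all?
# OBJECT_STORAGE_ENDPOINT and the "objectstorage" hostname are assumptions
# based on the hobby docker-compose setup.
import os
import socket
from urllib.parse import urlparse

configured = os.environ.get("OBJECT_STORAGE_ENDPOINT", "http://localhost:19000")

for label, url in [("configured endpoint", configured),
                   ("compose hostname", "http://objectstorage:19000")]:
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, parsed.port or 80), timeout=3):
            print(f"{label}: {url} -> reachable")
    except OSError as exc:
        print(f"{label}: {url} -> {exc}")
```

If only the compose hostname is reachable while the configured endpoint still points at localhost, blob writes would fail exactly as in the logs above.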

So I guess the problem lies in misconfigured MinIO, yet there is nothing suspicious in its logs:

➜  posthog git:(master) ✗ docker logs e0f6f5981c1e
Automatically configured API requests per node based on available memory on the system: 1385
Finished loading IAM sub-system (took 0.0s of 0.0s to load data).
Status:         1 Online, 0 Offline.
API: http://172.19.0.6:19000  http://127.0.0.1:19000

Console: http://172.19.0.6:19001 http://127.0.0.1:19001

Documentation: https://docs.min.io

 You are running an older version of MinIO released 2 years ago
 Update: Run `mc admin update`

What's really puzzling is that, as this issue states, session recordings worked during the first 1-2 hours after install, but without any extra intervention they just stopped working altogether (while new recordings still keep appearing in the PostHog interface).

This is a default installation, and we have hit this issue several times already while toying with self-hosted PostHog for different projects, so my guess is that it affects all self-hosted instances.

Also, this may explain the "buffer has no oldestKafkaTimestamp yet" log messages: https://github.com/PostHog/posthog/issues/22250