StevenWeathers / thunderdome-planning-poker

⚡ Thunderdome is an open source agile planning poker, sprint retro, and story mapping tool
https://thunderdome.dev
Apache License 2.0
396 stars 103 forks source link

Bug: My Games overview has an error #533

Closed Adrian-Grimm closed 3 months ago

Adrian-Grimm commented 3 months ago

List of /games is and stays empty.

ERR poker/client.go:87 > | ts=1711586159.74078 msg=close error version=3.8.3 error=close tcp 172.17.0.6:8080->172.17.0.1:42525: use of closed network connection poker_id=22fc1095-527c-429d-823b-caa99150cfc1 session_user_id=d3ce7a1d-9bfa-422c-b05d-225380e7cc0e stacktrace=github.com/StevenWeathers/thunderdome-planning-poker/internal/http/poker.subscription.readPump.func1
    /home/runner/work/thunderdome-planning-poker/thunderdome-planning-poker/internal/http/poker/client.go:87
github.com/StevenWeathers/thunderdome-planning-poker/internal/http/poker.subscription.readPump
    /home/runner/work/thunderdome-planning-poker/thunderdome-planning-poker/internal/http/poker/client.go:153 
StevenWeathers commented 3 months ago

The error you mentioned is for the Poker game itself not the list of games, do you have any other errors?

Adrian-Grimm commented 3 months ago

with version 3.9 the error message in docker stdout log is gone 👍🏻.

But still no games in my users list. The list for teams instead is working fine...

I see some error messages in JS console:

image
Adrian-Grimm commented 3 months ago

Sorry, the page you are looking for is not found. HTTP 502 from reverse proxy as response.

https://****.com/api/users/d3ce7a1d-9bfa-422c-b05d-225380e7cc0e/battles?limit=10&offset=0

no further logs...

StevenWeathers commented 3 months ago

Is your reverse proxy forwarding /api endpoints correctly?

No further logs makes me suspect the call isn't making it to Thunderdome.

Adrian-Grimm commented 3 months ago

Could be but this would then be api endpoint specific... may an exception in the endpoint? Other API endpoints are working fine e.g. retros, teams..:

image
Adrian-Grimm commented 3 months ago

But I see it's somthing specific to my user's data... Tested with a new user - there it's working.

StevenWeathers commented 3 months ago

I try to be fairly verbose with logs when errors occur, however a silent error could still be happening. Let me know if you figure out what data caused the issue.

Adrian-Grimm commented 3 months ago

With disabled custom error pages on reverse proxy I get only this:

image

not further logs ...

I'll take a look in DB if I could see some strage data or something.

StevenWeathers commented 3 months ago

Is the Thunderdome server restarting at that time? 502 could mean the server crashed during that request which would leave me to believe a possible panic happened.

Adrian-Grimm commented 3 months ago

No it's not crashing I think. Docker container runs normal when I trigger this.

Adrian-Grimm commented 3 months ago

... I'm not familiar with GO code... But I think the breaking code is in GetGamesByUser() in /internal/db/poker /poker.go

In my case the code should get following data:

  1. my user has 50 battles

  2. data of all games should also get loaded ...

  3. Easy test case "/api/users/d3ce7a1d-9bfa-422c-b05d-225380e7cc0e/battles?limit=1&offset=1" --> HTTP-502

  4. DB Query should look like:

    SELECT b.id, b.name, b.voting_locked, COALESCE(b.active_story_id::text, ''), b.point_values_allowed, b.auto_finish_voting,
          b.point_average_rounding, b.created_date, b.updated_date,
          CASE WHEN COUNT(p) = 0 THEN '[]'::json ELSE array_to_json(array_agg(row_to_json(p))) END AS stories,
          CASE WHEN COUNT(bl) = 0 THEN '[]'::json ELSE array_to_json(array_agg(bl.user_id)) END AS facilitators,
          min(COALESCE(t.name, '')) as team_name
        FROM thunderdome.poker b
        LEFT JOIN thunderdome.poker_story p ON b.id = p.poker_id
        LEFT JOIN thunderdome.poker_facilitator bl ON b.id = bl.poker_id
        LEFT JOIN thunderdome.poker_user bw ON b.id = bw.poker_id
        LEFT JOIN thunderdome.team t ON t.id = b.team_id
        LEFT JOIN thunderdome.team_user tu ON tu.team_id = t.id
        WHERE (bw.user_id = 'd3ce7a1d-9bfa-422c-b05d-225380e7cc0e' AND bw.abandoned = false) OR (tu.user_id = 'd3ce7a1d-9bfa-422c-b05d-225380e7cc0e')
        GROUP BY b.id ORDER BY b.created_date DESC
        LIMIT 1 OFFSET 1
  5. Result got respond after 5secs - quite slow DB server XD .. Timeouts?:

id                                  |name|voting_locked|coalesce                            |point_values_allowed                                                                |auto_finish_voting|point_average_rounding|created_date                 |updated_date                 |stories                                                                                                                                                                                                                                                        |facilitators                            |team_name|
------------------------------------+----+-------------+------------------------------------+------------------------------------------------------------------------------------+------------------+----------------------+-----------------------------+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+
2b63612b-83c3-4dc2-9551-7ec0192d4252|test|true         |73521815-5154-4159-adea-4ee9ac2e17b6|["0", " 1", " 2", " 3", " 5", " 8", " 13", " 21", " 34", " 55", " ?", " WTF", " ☕️"]|true              |ceil                  |2024-03-28 01:44:31.288 +0100|2024-03-28 01:45:18.197 +0100|[{"id":"73521815-5154-4159-adea-4ee9ac2e17b6","name":"test","points":"","active":false,"poker_id":"2b63612b-83c3-4dc2-9551-7ec0192d4252","votes":[{"vote": " 55", "warriorId": "d3ce7a1d-9bfa-422c-b05d-225380e7cc0e"}],"created_date":"2024-03-28T01:44:58.328|["d3ce7a1d-9bfa-422c-b05d-225380e7cc0e"]|         |
  1. may I find time to debug this further.. but I need to dig into GO first XP..
Adrian-Grimm commented 3 months ago

Success Case:

  1. Test user with just 1 game
  2. /api/users/7c01fde2-08bb-432c-941b-4490b0c4ac47/battles?limit=1&offset=0 --> HTTP200 with data
  3. DB query runs way faster... quite similar data

So may realy a timeout issue?

I try to tune my PostgreSQL DB in the meanwhile.. 🙈😅 ... 5seconds for 50 records is quite bad. But may it would be nice to check if this fit's to your timeout's in the GO code.. I think up to min. 10 sec's should be acceptable before failing...

Adrian-Grimm commented 3 months ago

without the LEFT JOIN thunderdome.team_user tu ON tu.team_id = t.id it's also fast in my case:

SELECT b.id, b.name, b.voting_locked, COALESCE(b.active_story_id::text, ''), b.point_values_allowed, b.auto_finish_voting,
          b.point_average_rounding, b.created_date, b.updated_date,
          CASE WHEN COUNT(p) = 0 THEN '[]'::json ELSE array_to_json(array_agg(row_to_json(p))) END AS stories,
          CASE WHEN COUNT(bl) = 0 THEN '[]'::json ELSE array_to_json(array_agg(bl.user_id)) END AS facilitators,
          min(COALESCE(t.name, '')) as team_name
        FROM thunderdome.poker b
        LEFT JOIN thunderdome.poker_story p ON b.id = p.poker_id
        LEFT JOIN thunderdome.poker_facilitator bl ON b.id = bl.poker_id
        LEFT JOIN thunderdome.poker_user bw ON b.id = bw.poker_id
        LEFT JOIN thunderdome.team t ON t.id = b.team_id

        WHERE (bw.user_id = 'd3ce7a1d-9bfa-422c-b05d-225380e7cc0e' AND bw.abandoned = false)
        GROUP BY b.id ORDER BY b.created_date DESC
        LIMIT 10 OFFSET 1

was executed in 0,012 sec 🧐

even also with the join and tu.user_id = null it's same fast...

StevenWeathers commented 3 months ago

Ok so that the team_user table does not have an index on the team_id column, that could improve performance for a larger table of users.

StevenWeathers commented 3 months ago

Good news is you can increase the http timeouts in the short term, currently the default is 5 seconds.

viper.SetDefault("http.write_timeout", 5)
viper.SetDefault("http.read_timeout", 5)
Adrian-Grimm commented 3 months ago

Could you may expose those timeout values as a DB-Timeout env-var in docker? But also an index on this sounds like a good improvement 👍🏻

StevenWeathers commented 3 months ago

So there is no enforced DB timeout on the application, the http handler timeout is what your usecase is running into. Those are configurable already so I would suggest increasing them from 5 seconds to 10.

As far as index goes, at least testing with over 1000 team users locally (28ms query time) I'm not seeing any noticeable performance gains.

Adrian-Grimm commented 3 months ago

I'm just digging deeper before I start tweaking DB server performance ..:

image

some strange numbers:

image

I've 1031 stories in poker_story table

StevenWeathers commented 3 months ago

Those are some great details to dig into, I'm always open to improving query performance. One trade-off we could make right away is to remove the count of stories pointed vs not pointed in the games listing page.

Adrian-Grimm commented 3 months ago

there is a bottlenec right:

image
Adrian-Grimm commented 3 months ago

I think I definetly need to give my database server more ressources for this DB.

But also this is quite a lot of data just for the List of games of a single user - over 50mb for just 50 games where the paging by default shows only 10 I think... 🧐

StevenWeathers commented 3 months ago

I think there's a few things going on with that query, looking at it I'm confused why this scan is going on

image
StevenWeathers commented 3 months ago

Ok I see what's going on, its the poker_user join thats actually increasing the query scans.

StevenWeathers commented 3 months ago

I'm playing with the Query and have the number of rows way down, still trying to optimize further.

Adrian-Grimm commented 3 months ago

I also had success with improving this DB server instance Helped a lot to increase the shared_buffers and set the work_mem to 60MB. With that in place the list loads - but query is still takes 3 secs in my case.

But now I get this when trying to get on the second page XP

image

.. so I think this requires still more ressources to get into the "green zone":

image

(network connection get's cut off - but data was getting through... I think this is the 5sec timeout you mentioned...)

StevenWeathers commented 3 months ago
image

I think I've got the query to a super optimal state...

StevenWeathers commented 3 months ago

If you could give this a shot replace USER_ID_HERE and let me know your performance.

WITH user_teams AS (
    SELECT t.id, t.name FROM thunderdome.team_user tu
    LEFT JOIN thunderdome.team t ON t.id = tu.team_id
    WHERE tu.user_id = 'USER_ID_HERE'
),
team_games AS (
    SELECT id FROM thunderdome.poker WHERE team_id IN (SELECT id FROM user_teams)
),
user_games AS (
    SELECT u.poker_id AS id FROM thunderdome.poker_user u
    WHERE u.user_id = 'USER_ID_HERE' AND u.abandoned = false
),
games AS (
    SELECT id from user_games UNION ALL SELECT id FROM team_games
),
stories AS (
    SELECT poker_id, points FROM thunderdome.poker_story WHERE poker_id IN (SELECT poker_id FROM games)
)
SELECT p.id, p.name, p.voting_locked, COALESCE(p.active_story_id::text, ''), p.point_values_allowed, p.auto_finish_voting,
          p.point_average_rounding, p.created_date, p.updated_date,
          CASE WHEN COUNT(s) = 0 THEN '[]'::json ELSE array_to_json(array_agg(row_to_json(s))) END AS stories,
          CASE WHEN COUNT(bl) = 0 THEN '[]'::json ELSE array_to_json(array_agg(bl.user_id)) END AS facilitators,
          min(COALESCE(t.name, '')) as team_name
        FROM thunderdome.poker p
        LEFT JOIN stories AS s ON s.poker_id = p.id
        LEFT JOIN (
            SELECT poker_id, user_id FROM thunderdome.poker_facilitator WHERE poker_id IN (SELECT poker_id FROM games)
        ) AS bl ON bl.poker_id = p.id
        LEFT JOIN user_teams t ON t.id = p.team_id
        WHERE p.id IN (SELECT id FROM games)
        GROUP BY p.id ORDER BY p.created_date DESC;
Adrian-Grimm commented 3 months ago

Quite impressive result! 🎉👍🏻

image

looks also great in detail:

image

... this will also safe a lot on scale 😅👍🏻

StevenWeathers commented 3 months ago

@Adrian-Grimm Thank you for bringing this to my attention and aiding in hunting this pesky rabbit down, I will have the fix out shortly!

Adrian-Grimm commented 3 months ago

Thx to you, for the great support and maintaining all of this with so much 💪🏻