Closed vegetabill closed 3 years ago
Hi there, I'd like to tackle this issue.
Thanks @Hausekey ! Please go through our contributing.md, then you can begin.
Hi Alina and Bill,
I've followed through the contributing.md, and have been familiarizing myself with this project; I had a question about how to test the code with Postgres? I believe I add the indexes code to orm.js, and I'd like to run some EXPLAIN queries, and report my findings. I see a dev-db.sqlite database, but haven't found any postgres ones. It's my first time working with Postgres 😅 In the meantime, I can play around with the sqlite database.
@Hausekey if you set DB_URL=postgres://localhost/your-db-name
in the .env
file or as an environment variable the app should connect to postgres and then you can proceed. HTH!
Thanks @vegetabill , I got it to work! I found out how to generate dummy data for both users and user_answers, so I can run some EXPLAIN and do some benchmarks.
When you wrote "so it likely needs an ordered index on created_at along with user id and question number", were you thinking of creating multi-column indexes or would single-column suffice?
Yes @Hausekey I was thinking of a multi-column index. Intuitively, you'd think the DB would just combined indexes but IIRC it needs one for all of the WHERE or ORDER BY columns in order to make sure it uses an index fully. Obviously, our data isn't so big but it's more so we get snappy performance on the free DB tier. I think each important, commonly run query needs its own index in our case since the where clauses are different.
I think Sequelize should facilitate multi-column indexes but if you have any questions, lmk!
I see, to get the best performance on obtaining most recent progress, multi-column indexes are the way to go.
Below are some findings from the Postgres database that I wanted to report. I currently have two indexes on user_answers: created_at,user_id,question_number and created_at,user_id
To ensure that the indexes will be be used, what I've done was generate about 116 random users, and 198,000 rows of user answers, varying in different times and question numbers, thanks to the open source generatedata script. I picked out a random user_id and question number (e.g scudrural and 14)
EXPLAIN ANALYZE SELECT * FROM "user_answers" WHERE
user_id = 'scudrural' AND question_number='14' ORDER BY created_at DESC limit 3;
Output:
Limit (cost=0.42..371.52 rows=3 width=34) (actual time=0.072..0.131 rows=3 loops=1) -> Index Scan Backward using user_answers_created_at_user_id_question_number on user_answers (cost=0.42..6803.88 rows=55 width=34) (actual time=0.071..0.129 rows=3 loops=1) " Index Cond: (((user_id)::text = 'scudrural'::text) AND (question_number = 14))" Planning Time: 0.095 ms Execution Time: 0.144 ms
EXPLAIN ANALYZE SELECT * FROM "user_answers" WHERE
user_id = 'scudrural' ORDER BY created_at DESC limit 1;
Output:
-> Index Scan Backward using user_answers_created_at_user_id on user_answers (cost=0.42..10396.70 rows=1713 width=34) (actual time=0.029..0.029 rows=1 loops=1) Limit (cost=0.42..6.49 rows=1 width=34) (actual time=0.029..0.030 rows=1 loops=1) " Index Cond: ((user_id)::text = 'scudrural'::text)" Planning Time: 0.128 ms Execution Time: 0.047 ms
For comparison, output without the indexes, rerunning the 2nd query:
EXPLAIN ANALYZE SELECT * FROM "user_answers" WHERE
user_id = 'scudrural' ORDER BY created_at DESC limit 1;
Execution Time: 27.244 ms Planning Time: 0.072 ms Rows Removed by Filter: 198291 " Filter: ((user_id)::text = 'scudrural'::text)" -> Seq Scan on user_answers (cost=0.00..4496.64 rows=1713 width=34) (actual time=0.029..26.625 rows=1601 loops=1) Sort Method: top-N heapsort Memory: 25kB Sort Key: created_at DESC -> Sort (cost=4505.20..4509.49 rows=1713 width=34) (actual time=27.226..27.227 rows=1 loops=1) Limit (cost=4505.20..4505.20 rows=1 width=34) (actual time=27.227..27.229 rows=1 loops=1)
I'm still learning how to read the output of the EXPLAIN ANALYZE query, but it looks like it's using the indexes! Please let me know if there are questions or comments. When it looks good, I'll go ahead and make a pull request.
That looks great @Hausekey !
Please go ahead and make a PR when you get a chance.
This mostly only applies to postgres since we don't really care about this for local dev using sqlite.
For two commonly run queries, we probably want to add indexes to the DB to make them more performant:
/user/answers/question/:questionNumber
looks up the most recent 3 answers by user and question number, so it likely needs an ordered index on created_at along with user id and question number/user/progress
will need to know the most recent answer, regardless of question number which would be similar to the above index but without question numberUse EXPLAIN to make sure those queries are optimized and will use those indexes.
Sequelize indexes doc