Purpose
Original PR: #329 #330 (#329 includes the Public Graphs and My Graphs speedup code, while #333 includes all of the code, including the Shared Graphs speedup code)
Fixes issues (which describe the problem in more detail): #328 and #331
Approach
This pull request makes looking up My Graphs and Public Graphs much faster by moving owner_email, is_public, and updated_at from Postgres to Elasticsearch. As a result, these lookups no longer hit Postgres at all: all of the paginated graph information is retrieved directly from Elasticsearch.
This pull request also keeps Elasticsearch up to date when someone adds, updates, or removes a graph.
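As a rough illustration of why Postgres is no longer needed, a single Elasticsearch search body can carry the filter, the sort, and the pagination. The sketch below is not the code from this PR: the function name `build_graph_search_body`, the descending sort on updated_at, and the use of `term` filters (which assumes owner_email is mapped as a keyword field) are all assumptions made for the example.

```python
from typing import Any, Dict, Optional


def build_graph_search_body(
    owner_email: Optional[str] = None,
    is_public: Optional[bool] = None,
    offset: int = 0,
    limit: int = 20,
) -> Dict[str, Any]:
    """Build a search body for one page of graphs (hypothetical helper)."""
    filters = []
    if owner_email is not None:
        # Assumes owner_email is indexed as an exact-match (keyword) field.
        filters.append({"term": {"owner_email": owner_email}})
    if is_public is not None:
        filters.append({"term": {"is_public": is_public}})
    return {
        "query": {"bool": {"filter": filters}},
        # Assumed ordering: most recently updated graphs first.
        "sort": [{"updated_at": {"order": "desc"}}],
        # Pagination is handled by Elasticsearch itself.
        "from": offset,
        "size": limit,
    }


# My Graphs: first page of 10 graphs owned by one user.
my_graphs = build_graph_search_body(owner_email="user@example.com", limit=10)
# Public Graphs: third page of 10 public graphs.
public_graphs = build_graph_search_body(is_public=True, offset=20, limit=10)
```

Because the three fields live in the index, the filter, ordering, and page boundaries are all resolved in one request.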
This branch also speeds up the Shared Graphs query:
We no longer need to retrieve information from Postgres when getting shared graphs.
Every graph in Elasticsearch now has a list of the users it is shared with.
This list is updated whenever share access changes in any form.
Note that this branch carefully handles the following 6 cases:
User shares Graph with a group
User unshares Graph with a group
User added to a group
User removed from a group
Group is deleted
Group is added
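One simple way to reason about these cases is to recompute a graph's shared-with list from its groups and their members whenever any of them changes. The sketch below uses plain in-memory sets to show that idea; it is an assumption made for illustration, not necessarily how this branch updates the list, and `shared_with_users` is a hypothetical helper.

```python
from typing import Dict, List, Set


def shared_with_users(
    graph_groups: Set[str],
    group_members: Dict[str, Set[str]],
) -> List[str]:
    """Return the sorted emails of all users who reach a graph via its groups."""
    users: Set[str] = set()
    for group in graph_groups:
        # A deleted group is absent from group_members and contributes no users.
        users |= group_members.get(group, set())
    return sorted(users)


group_members = {"lab": {"a@example.com", "b@example.com"}}
graph_groups = {"lab"}  # user shares the graph with group "lab"
after_share = shared_with_users(graph_groups, group_members)

group_members["lab"].add("c@example.com")  # user added to the group
after_user_added = shared_with_users(graph_groups, group_members)

group_members["lab"].discard("a@example.com")  # user removed from the group
after_user_removed = shared_with_users(graph_groups, group_members)

del group_members["lab"]  # group is deleted
after_group_deleted = shared_with_users(graph_groups, group_members)
```

Under this model, a newly added group has no shared graphs yet, so it does not change any list until a graph is shared with it.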
Elasticsearch settings change to accommodate large-scale migration
To index large quantities of graph data in Elasticsearch, we had to raise the total fields limit (index.mapping.total_fields.limit) above its default of 1,000. The current value can be checked at http://localhost:9200/graphs/_settings
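For reference, the limit can be changed through the index settings endpoint. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the value 5000 is a placeholder chosen for the example, since the limit actually used by this PR is not stated here.

```python
import json
import urllib.request
from typing import Any, Dict


def total_fields_limit_body(new_limit: int) -> Dict[str, Any]:
    """Settings body that raises the per-index field limit."""
    return {"index.mapping.total_fields.limit": new_limit}


def build_settings_request(
    base_url: str, index: str, new_limit: int
) -> urllib.request.Request:
    """Build (but do not send) a PUT request to <base_url>/<index>/_settings."""
    body = json.dumps(total_fields_limit_body(new_limit)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# 5000 is a placeholder value, not the limit chosen in this PR.
request = build_settings_request("http://localhost:9200", "graphs", 5000)
# urllib.request.urlopen(request)  # uncomment to apply against a running cluster
```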
Open Questions and Pre-Merge TODOs
After pulling the change, run alembic current and confirm that the reported revision matches the down revision (down_revision) of the new migration script, bb9a45e2ee5e.
Running alembic upgrade head executes the upgrade method of the migration script; a successful migration leaves the head at '80a910b918d6'.
Learning
Alembic versions
Elasticsearch limit of total fields [1000]
Speedup Benchmarks
Benchmarks were performed on the entire database as of 4/11/21 for specific Elasticsearch queries, measured using network times.
Other searches show similar results. The speedup is larger for the queries that were slower before this change.
Resources that Helped
Another codebase with a similar issue: https://github.com/archivematica/Issues/issues/608