Murali-group / GraphSpace

The interactive graph sharing website.
http://graphspace.org
GNU General Public License v2.0

Speedup My, Shared, and Public Graphs Query e#331 #448

Closed jddanna closed 3 years ago

jddanna commented 3 years ago

Purpose

Original PRs: #329 and #330 (#329 includes the Public Graphs and My Graphs speedup code, while #333 includes all the code, including the Shared Graphs speedup code).

Fixes issues (which describe the problem in more detail): #328 and #331.

Approach

This pull request makes looking up My Graphs and Public Graphs much faster by moving owner_email, is_public, and updated_at from Postgres to Elasticsearch. As a result, these lookups no longer need to hit Postgres at all: all the paginated graph information is retrieved directly from Elasticsearch.
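As a rough illustration of what such a lookup could look like, here is a minimal sketch of a search body that filters on the moved fields and pages through results. The function name, the default page size, the newest-first sort order, and the assumption that owner_email and is_public are mapped as exact-match fields are all hypothetical and not taken from this PR.

```python
from typing import Any, Dict, List, Optional


def build_graph_list_query(
    owner_email: Optional[str] = None,
    is_public: Optional[bool] = None,
    limit: int = 20,
    offset: int = 0,
) -> Dict[str, Any]:
    """Build an Elasticsearch search body for one page of graphs.

    Hypothetical helper: filters only on fields stored in Elasticsearch,
    so no Postgres query is needed to produce the page.
    """
    filters: List[Dict[str, Any]] = []
    if owner_email is not None:
        # "My Graphs": restrict to graphs owned by this user.
        filters.append({"term": {"owner_email": owner_email}})
    if is_public is not None:
        # "Public Graphs": restrict by the public flag.
        filters.append({"term": {"is_public": is_public}})
    return {
        "query": {"bool": {"filter": filters}},
        # Assumed ordering: most recently updated graphs first.
        "sort": [{"updated_at": {"order": "desc"}}],
        "from": offset,
        "size": limit,
    }


my_graphs_body = build_graph_list_query(owner_email="user@example.com", limit=10, offset=20)
public_graphs_body = build_graph_list_query(is_public=True)
```

The resulting dictionaries would be sent as the body of a search request against the graphs index; the search text itself (for example p53) would be added as a further clause.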

This pull request also keeps Elasticsearch up to date when someone adds, updates, or removes a graph.
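The synchronization idea can be sketched as follows. This is not the code from this PR: InMemoryIndex is a hypothetical stand-in for the Elasticsearch graphs index, and on_graph_saved, on_graph_deleted, and graph_to_doc are illustrative names. The point is only that every add, update, or remove is mirrored into the index so that list queries never see stale owner_email, is_public, or updated_at values.

```python
from typing import Any, Dict


class InMemoryIndex:
    """Hypothetical stand-in for the Elasticsearch "graphs" index."""

    def __init__(self) -> None:
        self.docs: Dict[int, Dict[str, Any]] = {}

    def index(self, doc_id: int, doc: Dict[str, Any]) -> None:
        # Indexing with an existing id overwrites the document (an upsert).
        self.docs[doc_id] = dict(doc)

    def delete(self, doc_id: int) -> None:
        # Deleting a missing document is treated as a no-op here.
        self.docs.pop(doc_id, None)


def graph_to_doc(graph: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields that the list queries filter and sort on."""
    return {key: graph[key] for key in ("owner_email", "is_public", "updated_at")}


def on_graph_saved(index: InMemoryIndex, graph: Dict[str, Any]) -> None:
    """Call after a graph is added or updated in the database."""
    index.index(graph["id"], graph_to_doc(graph))


def on_graph_deleted(index: InMemoryIndex, graph_id: int) -> None:
    """Call after a graph is removed from the database."""
    index.delete(graph_id)


idx = InMemoryIndex()
graph = {"id": 1, "owner_email": "user@example.com", "is_public": False,
         "updated_at": "2021-04-11T00:00:00", "name": "demo"}
on_graph_saved(idx, graph)                       # add
on_graph_saved(idx, {**graph, "is_public": True})  # update
is_public_after_update = idx.docs[1]["is_public"]
on_graph_deleted(idx, 1)                         # remove
```

With a real Elasticsearch client, the two stand-in methods would be replaced by the client's index and delete calls.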

This branch also speeds up the Shared Graphs query.

Note that this branch carefully handles the following 6 cases:

Elasticsearch settings change to accommodate large scale migration

In order to index large quantities of graph data in Elasticsearch, we had to raise the index's total fields limit (index.mapping.total_fields.limit) above its default of 1,000. The current value can be checked at http://localhost:9200/graphs/_settings
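For reference, a minimal sketch of how such a settings change could be prepared with the Python standard library is shown below. The helper name is hypothetical, and the new limit of 5000 is an arbitrary example value, since the value actually used is not stated here. The function only builds the request; sending it with urllib.request.urlopen is left to the caller.

```python
import json
import urllib.request


def build_total_fields_request(base_url: str, index: str, limit: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that raises the total fields limit."""
    body = json.dumps({"index.mapping.total_fields.limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example value only: 5000 is an assumption, not the limit chosen in this PR.
settings_request = build_total_fields_request("http://localhost:9200", "graphs", 5000)
# To apply it against a running cluster:
#   urllib.request.urlopen(settings_request)
```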

Open Questions and Pre-Merge TODOs

Learning

Speedup Benchmarks

Benchmarks were run against the entire database as of 4/11/21 for specific Elasticsearch queries, using network times. Other searches show similar results. The speedup is larger for queries that were slower before the change.

| Search Parameter | Old Run Time | New Run Time | Speedup |
| --- | --- | --- | --- |
| p53 | 16.54 | 0.19 | 98.85 % |
| path | 18.90 | 0.22 | 98.84 % |
| kegg | 4.22 | 0.23 | 84.55 % |
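The Speedup column appears to be the percentage reduction in run time, (old - new) / old × 100. That reading is an assumption, and the small sketch below recomputes it from the listed run times; the function name is illustrative.

```python
def speedup_percent(old: float, new: float) -> float:
    """Percentage reduction in run time, rounded to two decimals."""
    return round((old - new) / old * 100, 2)


# (old run time, new run time) per search parameter, copied from the table.
timings = {"p53": (16.54, 0.19), "path": (18.90, 0.22), "kegg": (4.22, 0.23)}
speedups = {term: speedup_percent(old, new) for term, (old, new) in timings.items()}
```

Under this formula the p53 and path rows come out to 98.85 and 98.84, matching the table. The kegg row comes out to 94.55 rather than the listed 84.55, so one of the kegg figures may contain a typo.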

Resources that Helped

Another codebase with a similar issue: https://github.com/archivematica/Issues/issues/608