ebmdatalab / openpath-dash

Experimental Dash version of openpathology browser

Performance issues #159

Closed sebbacon closed 4 years ago

sebbacon commented 4 years ago

It recently (since 2020?) started taking > 12s to load any analyse result. It used to take < 4s.

Inspecting the dokku server shows that the process is CPU-bound.

In a dev sandbox it's slow on the first load but fast once the cache warms up, which suggests the production slowness is a caching problem.

The cache is served from /tmp/, but in the currently running Docker container the only hashed cache file there was last updated a couple of weeks ago, and its size shows it is not the main data cache:

-rw------- 1 herokuishuser herokuishuser    18 Jan 15 15:46 2029240f6d1128be89ddc32729463129
sebbacon commented 4 years ago

Following a deploy (including the commit referenced above) it works. Mysterious.

root@2515bb7edc2a:/tmp/dash_cache# ls -l
total 112108
-rw------- 1 herokuishuser herokuishuser       27 Jan 29 13:19 09378a67830610320f79781f177e58b9
-rw------- 1 herokuishuser herokuishuser       27 Jan 29 13:13 1026330477017a7a89957fbaa4219d8d
-rw------- 1 herokuishuser herokuishuser       17 Jan 29 13:19 2029240f6d1128be89ddc32729463129
-rw------- 1 herokuishuser herokuishuser   480934 Jan 29 13:19 3ee1471d434a18c5bcb06edd943eaa1b
-rw------- 1 herokuishuser herokuishuser 56759318 Jan 29 13:14 42358545bbbe31279d0982cd3dba7df2
-rw------- 1 herokuishuser herokuishuser       82 Jan 29 13:14 61c559924d6ad56d07e2a9477b2dbbfa
-rw------- 1 herokuishuser herokuishuser      130 Jan 29 13:14 6da59175d5aa7fbf315492c05626fef3
-rw------- 1 herokuishuser herokuishuser       27 Jan 29 13:19 7be967d052e8b238b5fe5c53687354ea
-rw------- 1 herokuishuser herokuishuser       27 Jan 29 13:19 7d6da0d818367f1a99fdd99c26e85a02
-rw------- 1 herokuishuser herokuishuser     2723 Jan 29 13:19 862dd1a294a0fae6abf85031847a40f9
-rw------- 1 herokuishuser herokuishuser   747448 Jan 29 13:19 909e9fcde4b1c0ffa1b39809cfb352dc
-rw------- 1 herokuishuser herokuishuser       27 Jan 29 13:14 a018d6ed920ae346757cb1ae32ddffb5
-rw------- 1 herokuishuser herokuishuser 56759318 Jan 29 13:19 b011c3c1b058f5858ce017b0fed995ef
-rw------- 1 herokuishuser herokuishuser       27 Jan 29 13:19 cc7198987b884a9304b193fad76f6ea0
evansd commented 4 years ago

I think the issue is that the lack of caching really hurts this application because it's designed around the principle that memoized functions are cheap to call repeatedly. For instance, every time we get the list of CCGs, the mapping from entity IDs to human-readable names, or the list sizes of practices, we end up calling the get_data function. If the cache is working then this is fine, but if it isn't then we re-read the entire dataset off disk every time.
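
To illustrate, a minimal sketch of that pattern, assuming Flask-Caching provides the memoization (as is common for Dash apps); the file path and column name are hypothetical, and get_data is the function named above:

```python
import pandas as pd
from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(app, config={
    "CACHE_TYPE": "filesystem",      # the backend currently in use
    "CACHE_DIR": "/tmp/dash_cache",  # the directory listed above
})

@cache.memoize()
def get_data():
    # Expensive: reads the full dataset off disk (hypothetical path).
    return pd.read_csv("data/processed.csv")

def get_ccg_list():
    # Cheap only when the memoized get_data() call hits the cache; with a
    # broken cache, every helper like this re-reads the whole dataset.
    return get_data()["ccg_id"].unique()  # hypothetical column
```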

I think we might be best off using the simple backend, which stores everything in local memory and thus avoids constantly reading the data back off disk (though those reads no doubt hit the OS page cache anyway). This means each process gets its own copy of the cache, but I don't think that matters, and it avoids any issues with stale cache data.
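
Concretely, the change would be something like this, assuming the cache is set up via Flask-Caching (the app's actual config keys may differ):

```python
from flask_caching import Cache

# Per-process, in-memory cache: no disk reads and no shared state to go stale.
cache = Cache(config={"CACHE_TYPE": "simple"})
# cache.init_app(app) is still needed wherever the Flask/Dash app is created.
```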

In fact, looking at the simple backend code, it pickles the values, which means we'd still be constantly unpickling that big dataframe, and that will add to the slowness. I think it would be pretty easy to make our own backend based on it that just stores the values in a dictionary as-is.
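
Roughly what I have in mind, as an illustrative sketch only: the method names mirror the small get/set/delete/clear interface the built-in backends expose, but the exact base class and registration mechanism depend on the Flask-Caching version, so this isn't a drop-in.

```python
import time


class DictCache:
    """In-memory cache that stores Python objects directly, no pickling."""

    def __init__(self, default_timeout=300):
        self.default_timeout = default_timeout
        self._store = {}  # key -> (expiry timestamp or None, value)

    def get(self, key):
        expires, value = self._store.get(key, (None, None))
        if expires is not None and expires < time.time():
            self._store.pop(key, None)
            return None
        return value

    def set(self, key, value, timeout=None):
        timeout = self.default_timeout if timeout is None else timeout
        expires = time.time() + timeout if timeout else None  # 0 = never expire
        self._store[key] = (expires, value)  # no pickle round-trip
        return True

    def delete(self, key):
        return self._store.pop(key, None) is not None

    def clear(self):
        self._store.clear()
        return True
```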

Thoughts?

sebbacon commented 4 years ago

Makes sense, but as with all caching, I guess we'll only know whether it makes things better by actually trying it. As you say, given the page cache and so on, it's unlikely to bring much performance benefit without also moving to a non-pickled representation. That said, I find it hard to imagine that unpickling 50MB is particularly expensive; perhaps it's worth testing that quickly?
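
Something like this would be a quick way to test it (the DataFrame is synthetic, just sized to roughly match the ~50MB cache files):

```python
import pickle
import timeit

import numpy as np
import pandas as pd

# ~48MB of float64 values, roughly the size of one cached dataframe
df = pd.DataFrame(np.random.random((1_000_000, 6)))
blob = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)
print(f"pickled size: {len(blob) / 1e6:.1f} MB")
print("mean unpickle time:",
      timeit.timeit(lambda: pickle.loads(blob), number=10) / 10, "seconds")
```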

The full-dataframe caches are about 50MB each on disk, so I don't suppose it'd be a problem to have a per-process cache.

evansd commented 4 years ago

I think I'd prefer in-memory caching more for the simplicity than for the performance. We definitely need some caching or performance falls off a cliff as we've seen. And we've already had the filesystem one break for reasons we don't fully understand. So I think I'd be inclined to just change the backend to simple and leave it at that.

sebbacon commented 4 years ago

Agreed