glencoesoftware / omero-ms-core

OMERO Vert.x microservice core
GNU General Public License v2.0

Jython memory usage and possible interpreter churn #18

Closed by chris-allan 3 years ago

chris-allan commented 4 years ago

Recently I have been looking at image region microservice heap dumps from a few customers where the single largest consumer of heap was Python classes. This really surprised me since our use is so limited: basically just unpickling an object to get the OMERO session key and some limited metadata. The objects also go out of scope nearly immediately. Even after forcing a garbage collection cycle they didn't disappear from the heap.

This led me to investigate PySystemState issues documented in the wild, where I found this one:

Further investigation led me to realize that much of the Jython infrastructure is per-classloader, and interactions with Vert.x were likely causing a lot of junk to remain around indefinitely. The aforementioned issue is fixed in Jython 2.7.1 (we're depending on 2.7.0), so we could upgrade to that and see what happens. However, fundamentally we just don't need the entire runtime to be available, as all we want to do is retrieve some fields from the pickled object.

I investigated this a little further and found that the Kaitai Struct project has a format specification for Python pickle serialization that should allow us to get the metadata we need, with no interpreter and no Jython dependency on our CLASSPATH. For reference:

I'd propose we:

  1. Get a few byte-perfect dumps of Redis keys for OMERO.web sessions
  2. Implement a concrete implementation of IConnector that deserializes via Kaitai Struct
  3. Set up some unit tests to ensure we can get access to all the data correctly
  4. Remove reliance on Jython entirely
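
To give a rough idea of what interpreter-free deserialization could look like, here is a minimal sketch. It is not the Kaitai Struct approach and not the code that ended up in the project: `MiniPickleReader` is a hypothetical, hand-written reader that walks pickle protocol 2 opcodes and only understands flat dicts of strings, small integers, booleans and `None`. A real OMERO.web session payload likely contains class instances as well, which would need more opcodes (for example `GLOBAL`, `NEWOBJ` or `REDUCE`, and `BUILD`) than are handled here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decodes a small subset of pickle protocol 2. */
public class MiniPickleReader {

    /** Sentinel object pushed onto the stack by the MARK opcode. */
    private static final Object MARK = new Object();

    /** Runs the opcode stream and returns the object left on the stack at STOP. */
    public static Object load(byte[] data) {
        // Multi-byte pickle integers and lengths are little-endian
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<Object> stack = new ArrayList<>();  // a List so that null (None) is allowed
        Map<Integer, Object> memo = new HashMap<>();
        while (buf.hasRemaining()) {
            int op = buf.get() & 0xff;
            switch (op) {
                case 0x80: buf.get(); break;                               // PROTO: skip version
                case '}': stack.add(new LinkedHashMap<Object, Object>()); break; // EMPTY_DICT
                case '(': stack.add(MARK); break;                          // MARK
                case 'q': memo.put(buf.get() & 0xff, peek(stack)); break;  // BINPUT
                case 'r': memo.put(buf.getInt(), peek(stack)); break;      // LONG_BINPUT
                case 'h': stack.add(memo.get(buf.get() & 0xff)); break;    // BINGET
                case 'j': stack.add(memo.get(buf.getInt())); break;        // LONG_BINGET
                case 'X':                                                  // BINUNICODE
                    stack.add(readString(buf, buf.getInt(), StandardCharsets.UTF_8));
                    break;
                case 'U':                                                  // SHORT_BINSTRING
                    stack.add(readString(buf, buf.get() & 0xff, StandardCharsets.ISO_8859_1));
                    break;
                case 'K': stack.add(buf.get() & 0xff); break;              // BININT1
                case 'M': stack.add(buf.getShort() & 0xffff); break;       // BININT2
                case 'J': stack.add(buf.getInt()); break;                  // BININT
                case 0x88: stack.add(Boolean.TRUE); break;                 // NEWTRUE
                case 0x89: stack.add(Boolean.FALSE); break;                // NEWFALSE
                case 'N': stack.add(null); break;                          // NONE
                case 's': {                                                // SETITEM
                    Object value = pop(stack);
                    Object key = pop(stack);
                    asDict(peek(stack)).put(key, value);
                    break;
                }
                case 'u': {                                                // SETITEMS
                    int mark = stack.lastIndexOf(MARK);
                    if (mark < 0) {
                        throw new IllegalArgumentException("SETITEMS without MARK");
                    }
                    // Everything above the mark is an alternating key/value run
                    List<Object> items = new ArrayList<>(stack.subList(mark + 1, stack.size()));
                    stack.subList(mark, stack.size()).clear();
                    Map<Object, Object> dict = asDict(peek(stack));
                    for (int i = 0; i + 1 < items.size(); i += 2) {
                        dict.put(items.get(i), items.get(i + 1));
                    }
                    break;
                }
                case '.': return pop(stack);                               // STOP
                default:
                    throw new UnsupportedOperationException(
                        String.format("Unsupported opcode 0x%02x", op));
            }
        }
        throw new IllegalArgumentException("No STOP opcode found");
    }

    private static Object peek(List<Object> stack) {
        return stack.get(stack.size() - 1);
    }

    private static Object pop(List<Object> stack) {
        return stack.remove(stack.size() - 1);
    }

    @SuppressWarnings("unchecked")
    private static Map<Object, Object> asDict(Object o) {
        return (Map<Object, Object>) o;
    }

    private static String readString(ByteBuffer buf, int length, Charset charset) {
        byte[] raw = new byte[length];
        buf.get(raw);
        return new String(raw, charset);
    }
}
```

Fed the bytes of a protocol 2 pickle of a flat dict, `MiniPickleReader.load(bytes)` returns a `Map` whose entries can be read directly, which is the kind of check the unit tests in step 3 could run against the byte-perfect Redis dumps from step 1. Unknown opcodes fail loudly rather than being skipped, so an unexpected payload shape shows up as an exception instead of silently missing metadata.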

chris-allan commented 3 years ago

Resolved by #19.