elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Storing TorchScript models #70827

Closed · davidkyle closed this issue 3 years ago

davidkyle commented 3 years ago

TorchScript models are compressed binary files whose size depends on their purpose, but for neural network models sizes larger than 1GB are not unusual. Storing such a large model as a single document in Elasticsearch would be sub-optimal, as it requires contiguous memory. Additionally, uploading a single document larger than the HTTP max content length limit is infeasible.

Better performance will be found by splitting the model into chunks and streaming those chunks to the native process, where they are reassembled on use. ML already uses this pattern for Anomaly Detection job state; AD uses 16MB chunks, but it is worth benchmarking smaller chunk sizes. Because the model is binary data it must be base64 encoded; it can then be stored in a binary field or in an index with mappings disabled.
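For illustration, a minimal Python sketch of the chunk-and-encode step described above; the 16MB chunk size matches the AD precedent, but the function name is mine, not the implementation's:

```python
import base64

CHUNK_SIZE = 16 * 1024 * 1024  # 16MB, matching the AD state chunk size


def base64_chunks(path: str, chunk_size: int = CHUNK_SIZE):
    """Yield base64-encoded chunks of a TorchScript model file."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_size)
            if not raw:
                break
            yield base64.b64encode(raw).decode("ascii")
```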

Once the model is split into its constituent chunks, a meta-document will track the number of chunks, and the IDs of the documents containing the chunks will follow a predictable naming convention. One vital piece of information required by the native process is the size of the model in bytes, which must be sent before any chunks are streamed. This value should be stored in the meta-document, because reading all the chunks into a buffer just to calculate the decoded (pre-base64) size would defeat the purpose of streaming.
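A sketch of what the meta-document and the predictable chunk document IDs could look like; all field names and the ID convention here are assumptions for illustration, not the actual schema:

```python
import os

CHUNK_SIZE = 16 * 1024 * 1024  # same chunk size as in the previous sketch


def meta_and_chunk_ids(model_id: str, path: str):
    """Build an illustrative meta-document and predictable chunk IDs."""
    total_size = os.path.getsize(path)  # raw byte size, known without reading any chunks
    num_chunks = -(-total_size // CHUNK_SIZE)  # ceiling division
    meta_doc = {
        "model_id": model_id,
        "model_size_bytes": total_size,  # sent to the native process before streaming
        "chunk_count": num_chunks,
    }
    # Chunk document IDs derive from the model ID and the chunk ordinal,
    # so they can be fetched in order without listing the index.
    chunk_ids = [f"{model_id}_chunk_{n}" for n in range(num_chunks)]
    return meta_doc, chunk_ids
```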

TODO

elasticmachine commented 3 years ago

Pinging @elastic/ml-core (Team:ML)

davidkyle commented 3 years ago

Implementation Considerations

Code review of the first implementation in #71323 has raised a number of questions, many of them relating to index management.

Where should the models be stored?

1. A user-designated index?

Perhaps the most flexible solution from an end user's perspective, and it would allow direct upload to the index, as none of the access restrictions that come with system indices apply. For obvious performance reasons the binary blobs that make up the models should not be analysed or indexed for search; since ML does not own the index, the burden of defining the correct index mappings falls on the user, who must also be aware that the requirement exists.
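To make that requirement concrete, below is roughly the mapping a user would have to supply, sketched with the 8.x Python client; the index and field names are hypothetical. A `binary` field is stored but neither indexed nor analysed:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Binary blobs are stored only: the `binary` field type is never
# analysed or indexed for search. Index and field names are illustrative.
es.indices.create(
    index="my-torchscript-models",
    mappings={
        "properties": {
            "model_id": {"type": "keyword"},
            "definition": {"type": "binary"},  # base64-encoded chunk
        }
    },
)
```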

2. A hidden index?

Offers some protection against accidental deletion or corruption, as the index will not appear in wildcard expansion, but the user still has direct access to PUT documents into the index. The index and mappings would be managed by ML. The precedent is the .ml-state-* index, which has an ILM policy for size-based rollover at 50GB.
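A hedged sketch of such a size-based rollover policy, again with the 8.x Python client; the policy name is hypothetical:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Roll the index over at 50GB, mirroring the .ml-state-* precedent.
es.ilm.put_lifecycle(
    name="ml-models-rollover",
    policy={
        "phases": {
            "hot": {"actions": {"rollover": {"max_size": "50gb"}}}
        }
    },
)
```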

3. A system index?

Write access is limited to ML APIs, which would preclude uploading the model state directly to the index. A new ML REST endpoint would be required for uploading models, which is actually a desirable outcome, as validation can be performed.

4. .ml-inference-*?

This system index does not currently have an ILM policy for size-based rollover and is mainly used for config. Is mixing config and large binary blobs wise?

dimitris-athanasiou commented 3 years ago

> If rollover is used, can we ensure that a model is not spread across multiple indices, as that would complicate model management?

How does it complicate model management?

I imagine the model index is a single-shard index with rollover when it reaches a certain size. So this already pushes docs of the same model into the same index. It is, of course, possible to have multiple models being stored concurrently, which could result in individual shards hosting docs for multiple models.

The complication seems to be deleting models. We have had trouble doing this in the past as well: the delete-by-query request becomes too expensive. The only way I know to make this fast is to keep each model in its own index (or index pattern).

We could consider a hybrid strategy where we have a shared index pattern for models that are small (we need to define some threshold) and we force bigger models to go into their own index pattern.
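To make the cost difference behind this discussion concrete, the two deletion strategies look roughly like this (8.x Python client; index names and the `model_id` field are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Shared index: every chunk document belonging to the model must be
# matched and deleted individually, which is expensive for large models.
es.delete_by_query(
    index="ml-models-shared",
    query={"term": {"model_id": "my-model"}},
)

# Dedicated index (pattern) per model: deletion is a cheap metadata
# operation that drops whole shards at once.
es.indices.delete(index="ml-models-my-model-*")
```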

davidkyle commented 3 years ago

> How does it complicate model management?

In retrospect, not significantly. The design relies on model IDs being unique, which would normally be enforced by using unique document IDs, but that does not work across multiple indices, so an up-front check is required before the PUT. There are various bits of housekeeping we would have to implement, but that is not a huge amount of code.
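A sketch of that up-front uniqueness check, assuming a hypothetical `ml-models-*` pattern and `model_id` field (8.x Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")


def model_id_is_free(model_id: str, index_pattern: str = "ml-models-*") -> bool:
    """Return True if no document across the pattern claims this model ID.

    Unique document IDs only prevent duplicates within a single index,
    so a count across the whole pattern is needed before the PUT.
    """
    resp = es.count(index=index_pattern, query={"term": {"model_id": model_id}})
    return resp["count"] == 0
```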

The current implementation for boosted tree models does not work with multiple indices; amending that will be the bulk of the work.

> We could consider a hybrid strategy where we have a shared index pattern for models that are small (we need to define some threshold) and we force bigger models to go into their own index pattern.

Interesting 🤔

joshdevins commented 3 years ago

We have a strategy that we are using for now. Shall we close this, or are there still open discussions?

davidkyle commented 3 years ago

Closed by #71323