This PR reduces the chunk size of the model stored in Elasticsearch from 4 MB to 1 MB. We have seen less memory pressure when using 1 MB chunks.
Part of issue: https://github.com/elastic/elasticsearch/issues/99409
Related PR: https://github.com/elastic/elasticsearch/pull/99677
Testing
I tested this by importing a ~300 MB model and extracting the binary_definition field from one of the chunk documents, which yields a file containing the base64 contents. The file is around 1 MB.
Result:
2023-09-20 14:56:34,232 INFO : Creating model with id 'sentence-transformers__all-distilroberta-v1'
2023-09-20 14:56:34,854 INFO : Uploading model definition
100%|███████████████████████████████████████████████████████████████████████████| 312/312 [00:12<00:00, 24.15 parts/s]
2023-09-20 14:56:47,776 INFO : Uploading model vocabulary
2023-09-20 14:56:47,957 INFO : Model successfully imported with id 'sentence-transformers__all-distilroberta-v1'
When searching for the chunks there were ~300 documents, which means we are correctly storing the model in 1 MB chunks.
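For illustration, the chunking behavior being verified here can be sketched in Python. This is a minimal sketch, not the actual import code: the helper name `chunk_model` and the standalone round trip are assumptions for the example; the real chunking happens inside the model-import tooling.

```python
import base64
import math

CHUNK_SIZE = 1 * 1024 * 1024  # 1 MB, the new chunk size introduced by this PR


def chunk_model(model_bytes: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a model blob into base64-encoded chunks of at most chunk_size bytes.

    Illustrative only; the hypothetical helper mirrors how each chunk document's
    binary_definition field holds the base64 contents of one 1 MB slice.
    """
    return [
        base64.b64encode(model_bytes[i:i + chunk_size]).decode("ascii")
        for i in range(0, len(model_bytes), chunk_size)
    ]


# Simulate a "model" of ~5 MB instead of ~300 MB to keep the example fast.
model = bytes(5 * 1024 * 1024 + 123)
chunks = chunk_model(model)

# The number of chunk documents equals ceil(model size / 1 MB) -- the same
# arithmetic behind the ~300 documents observed for a ~300 MB model.
assert len(chunks) == math.ceil(len(model) / CHUNK_SIZE)

# Decoding and concatenating the chunks reconstructs the original blob.
assert b"".join(base64.b64decode(c) for c in chunks) == model
```

Under this arithmetic, a ~300 MB model produces roughly 300 one-megabyte chunk documents, which matches what the search over the chunk documents showed.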