AlexAltea / milli-py

Python bindings for Milli, the embeddable Rust-based search engine powering Meilisearch
MIT License
123 stars 2 forks

Upgrade to latest milli #8

Closed write3371 closed 4 months ago

write3371 commented 6 months ago

Milli has been updated since the last commit of milli-py

https://github.com/meilisearch/meilisearch/blob/main/milli/examples/index.rs

afbarbaro commented 4 months ago

Hey @AlexAltea, thanks for creating this repo. Curious: how difficult would it be to update these bindings to the latest version of Milli? I've never done this kind of Rust -> Python work myself, so I've no idea whether this would take an hour, a day, or a month.

AlexAltea commented 4 months ago

I've just bumped to v1.5.1 which is the latest stable release I can build.

Note that I cannot build the actual latest stable release, v1.8.1, or any earlier release back to v1.6.0, since they all reference the heed dependency at 0.20.0-alpha.9: https://github.com/meilisearch/meilisearch/blob/v1.8.1/milli/Cargo.toml#L33.

I'm not a Rust developer, but I suspect Cargo is instead pulling the latest semver-compatible build 0.20.2 which no longer defines the required HeedError::InvalidDatabaseTyping: https://github.com/meilisearch/meilisearch/blob/v1.8.1/milli/src/error.rs#L49

This definition was present in all the 0.20.0-alpha.N tags. Compare:

  1. https://github.com/meilisearch/heed/blob/v0.20.0/heed/src/lib.rs#L143
  2. https://github.com/meilisearch/heed/blob/v0.20.0-alpha.8/heed/src/lib.rs#L145
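
To illustrate why a requirement written as 0.20.0-alpha.9 could resolve to 0.20.2, here is a minimal Python sketch of semver ordering and a caret-style match. It is a simplified model, not Cargo's actual resolver: it only handles 0.y.z versions with y > 0, the helper names `parse` and `caret_matches` are made up for this example, and the candidate list is illustrative rather than the real registry contents.

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH[-pre.release]' into a tuple that sorts in semver order."""
    core, _, pre = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if pre:
        # Numeric identifiers sort before alphanumeric ones and compare as integers.
        idents = tuple(
            (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
            for ident in pre.split(".")
        )
        pre_key = (0, idents)  # a pre-release sorts before the plain release
    else:
        pre_key = (1, ())
    return (major, minor, patch, pre_key)


def caret_matches(requirement, candidate):
    """Simplified caret match for 0.y.z (y > 0): same 0.y series and >= the requirement."""
    req, cand = parse(requirement), parse(candidate)
    cand_is_pre = cand[3][0] == 0
    req_is_pre = req[3][0] == 0
    # A pre-release candidate is only considered when the requirement itself
    # names a pre-release of the same MAJOR.MINOR.PATCH.
    if cand_is_pre and not (req_is_pre and cand[:3] == req[:3]):
        return False
    return cand[:2] == req[:2] and cand >= req


# Illustrative candidates only (assumption: not the actual list of published versions).
candidates = ["0.20.0-alpha.8", "0.20.0-alpha.9", "0.20.0", "0.20.2", "0.21.0"]
matching = [v for v in candidates if caret_matches("0.20.0-alpha.9", v)]
resolved = max(matching, key=parse)  # the resolver prefers the highest match
print(matching, "->", resolved)
```

Under this model the plain releases 0.20.0 and 0.20.2 both satisfy the requirement and the highest one wins, which would be consistent with the suspicion above. An exact requirement (Cargo's `=` operator) would accept only the alpha.9 entry.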

AlexAltea commented 4 months ago

As a temporary workaround, I've pinned 0.20.0-alpha.9 in this project's Cargo.toml.

With that, I was able to bump Meilisearch/Milli to the latest release: v1.8.1.

@write3371 @afbarbaro Note that there have been quite a few changes since v1.1.1. Most notably, all dedicated document deletion code is gone, and both additions and deletions are now done through the same IndexDocuments builder.

As a side effect, deletions now require specifying an external ID (not the internal auto-generated 32-bit integer that Milli assigns by default to every document). Since this external ID can be a string, Milli now expects external IDs to be passed as strings even if the underlying type is an integer.

These are the required changes:

        index = milli.Index(tmp)
        index.add_documents([
            { "id": 0, "title": "Hello world", "content": "This is a sample" },
            { "id": 1, "title": "Hello moon", "content": "This is another sample" },
            { "id": 2, "title": "Hello sun", "content": "This is yet another sample" },
        ])
-       result = index.delete_documents([2, 0])
+       result = index.delete_documents(["2", "0"])
        assert result == 2  # i.e. 2 documents were removed. The document with external ID ("id") == 1 still exists.
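
For callers that still hold integer IDs, a small helper can do the conversion before deleting. This is only a sketch: `to_external_ids` is a hypothetical name, not part of milli-py, and it assumes the plain decimal string is the right external ID form, as in the diff above.

```python
def to_external_ids(ids):
    """Convert document IDs (integers or strings) to the string form expected for deletion."""
    return [str(doc_id) for doc_id in ids]


# Hypothetical usage with the index from the snippet above:
#   result = index.delete_documents(to_external_ids([2, 0]))
print(to_external_ids([2, 0]))
```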

AlexAltea commented 4 months ago

Curious: how difficult would it be to update these bindings to the latest version of Milli? I've never done this kind of Rust -> Python work myself, so I've no idea whether this would take an hour, a day, or a month.

@afbarbaro Milli is not particularly stable between versions, so it depends: moving from v1.1.1 to v1.5.1 took me 5 minutes, but dealing with the changes between v1.5.1 and v1.8.1 took me 2 hours. Also note that I'm not a Rust developer; somebody more experienced could probably do it much faster.