bleroy / lunr-core

A port of LUNR.js to .NET Core
MIT License

[BUG] Loading index from serialized JSON fails if 'position' metadata is enabled and there are multiple positions for single entry #41

Closed: Huk256 closed 2 years ago

Huk256 commented 3 years ago

To Reproduce

  1. Create an index with a single field named 'body' and 'position' metadata enabled (builder.AllowMetadata("position"))
  2. Add a new document to the index whose body contains the same word twice, e.g. { "body", "test test" }
  3. Serialize the index to a file or string using the ToJson() method
  4. Create a new index by loading the serialized data with the LoadFromJson() method

Result: An 'Unexpected token Number.' exception is thrown.

Expected behavior: The index deserializes properly.

Additional data: Here is a minimal example:

var index = await Lunr.Index.Build(async builder =>
{
    builder.AllowMetadata("position");
    builder.AddField("body");

    await builder.Add(new Document
    {
        { "body", "test test" },
        { "id", "1" },
    });
});

var json = index.ToJson();
var index2 = Lunr.Index.LoadFromJson(json);

If we change { "body", "test test" } to, for example, { "body", "test test2" }, it deserializes fine. From what I can tell, it fails because the metadata deserializer doesn't expect more than one position pair for the same term in a document. So it breaks when it encounters something like this:

"invertedIndex": [
    [
        "test",
        {
            "_index": 0,
            "body": {
                "1": {
                    "position": [
                        [
                            0,
                            4
                        ],
                        [
                            5,
                            4
                        ]
                    ]
                }
            }
        }
    ]
],

It loads the first position pair but is unable to deserialize the second one.
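
To illustrate what a reader for this shape would have to do, here is a rough sketch that keeps consuming [start, length] pairs until the outer "position" array closes. It assumes System.Text.Json's Utf8JsonReader; PositionListSketch, ReadPositions and Parse are hypothetical names made up for this example and are not lunr-core's actual converter code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical helper, for illustration only (not part of lunr-core).
public static class PositionListSketch
{
    // Expects the reader to be positioned on the '[' that opens the position list,
    // i.e. the value of "position": [[0, 4], [5, 4]].
    public static List<(int Start, int Length)> ReadPositions(ref Utf8JsonReader reader)
    {
        if (reader.TokenType != JsonTokenType.StartArray)
            throw new JsonException("Expected the start of the position list.");

        var positions = new List<(int Start, int Length)>();

        // Loop until the outer array closes, so any number of pairs is accepted.
        while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
        {
            if (reader.TokenType != JsonTokenType.StartArray)
                throw new JsonException($"Unexpected token {reader.TokenType}.");

            reader.Read();
            int start = reader.GetInt32();   // throws if the token is not a number
            reader.Read();
            int length = reader.GetInt32();

            reader.Read();
            if (reader.TokenType != JsonTokenType.EndArray)
                throw new JsonException("Expected the end of a position pair.");

            positions.Add((start, length));
        }

        return positions;
    }

    // Convenience wrapper that parses a standalone JSON position list.
    public static List<(int Start, int Length)> Parse(string json)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        reader.Read(); // advance to the first token
        return ReadPositions(ref reader);
    }
}
```

Calling PositionListSketch.Parse("[[0,4],[5,4]]") should yield both pairs from the JSON above, while a reader that stops after the first pair would hit the next '[' or number unexpectedly.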

bleroy commented 3 years ago

Thanks for the report and the research.

bleroy commented 2 years ago

Fixed in main. Will push new package.