doctrine / mongodb-odm

The Official PHP MongoDB ORM/ODM
https://www.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/
MIT License
1.09k stars 504 forks source link

SchemaManager.updateIndexes() rebuilds text indexes unnecessarily #1821

Closed kperry42 closed 6 years ago

kperry42 commented 6 years ago

Bug Report

Q A
BC Break no
Version 1.2.4

Summary

SchemaManager.updateIndexes() rebuilds all text indexes every time it is called, because it is not recognizing them as being equivalent to their definitions.

Current behavior

SchemaManager.isMongoIndexEquivalentToDocumentIndex() does not recognize text indexes as being equivalent to their definition. So, all text indexes are rebuilt every time updateIndexes() (or updateDocumentIndexes()) is called.

It appears that text indexes are failing the equivalence test because their "key" property looks like { "_fts": "text", "_ftsx": 1 }, rather than the actual list of keys.

How to reproduce

Use MongoDB\ODM to define a text index on a collection. Call SchemaManager->updateIndexes() to create the index. For instance, I have a YAML collection def'n that looks like this:

My\Documents\Group:
  type: document
  collection: groups
  indexes:
    group_text:
      language: en
      keys:
        name: text
        description: text
      options:
        name: group_text
        weights:
          name: 3
          description: 1
  fields:
    id:            { type: string, id: true }
    name:          { type: string }
    description:   { type: string }

Then, given a DocumentManager $dm, and the name of the document class for the collection, $documentName, the following code will print out a line saying that the text index is mis-matched, showing that it has failed the isMongoIndexEquivalentToDocumentIndex() test. (the code here is mostly borrowed from SchemaManager.updateDocumentIndexes(), with an echo statement added.)

    $sm = $dm->getSchemaManager();
    $class = $dm->getClassMetadata($documentName);
    $documentIndexes = $sm->getDocumentIndexes($documentName);
    $collection = $dm->getDocumentCollection($documentName);
    $mongoIndexes = $collection->getIndexInfo();
    $mongoIndexes = array_filter($mongoIndexes, function ($mongoIndex) use ($documentIndexes, $sm) {
        if ('_id_' === $mongoIndex['name']) {
            return false;
        }
        foreach ($documentIndexes as $documentIndex) {
            if ($sm->isMongoIndexEquivalentToDocumentIndex($mongoIndex, $documentIndex)) {
                return false;
            }
        }
        return true;
    });

    foreach ($mongoIndexes as $mongoIndex) {
        if (isset($mongoIndex['name'])) {
            echo "mismatched index: $documentName $mongoIndex[name]\n";
        }
    }

Expected behavior

The index should be recognized as equivalent, and not get deleted and recreated, unless something about its definition has actually changed.

alcaeus commented 6 years ago

Thanks for the report - I'll take a look and see how we can fix this 👍

alcaeus commented 6 years ago

Looking at the issue I realized that the listIndexes command in MongoDB doesn't expose the keys for a text index at all:

    {
        "v" : 2,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "group_text",
        "ns" : "test.textIndex",
        "weights" : {
            "description" : 1,
            "name" : 3
        },
        "default_language" : "en",
        "language_override" : "language",
        "textIndexVersion" : 3
    }

Without those, we can't accurately compare whether the index specification changed. While the weights option gives a hint to the fields in question, it is optional (with a weight of 1 being assumed if none given), so that's very unreliable at best. Apart from that, I'm not sure how we could accurately detect changes to the keys that would force a recreation of the text index.

We could of course ignore keys for text indexes and either assume that an option changed (thus forcing recreation) or tell people to drop/recreate those indexes manually, but I'm not sure about the best course of action here.

@jmikola are you aware of any other methods to retrieve the keys within a text index? Does the server even know them? I'm not very hopeful on this since they're not exposed in listIndexes, but that might have been an oversight.

alcaeus commented 6 years ago

I did some more digging and it looks like we can use the weights option for text indexes:

> db.test.createIndex({foo: 'text', bar: 'text'})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.test.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.test"
    },
    {
        "v" : 2,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "foo_text_bar_text",
        "ns" : "test.test",
        "weights" : {
            "bar" : 1,
            "foo" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 3
    }
]

With that, I'd say we recreate a text index if any of the following is true:

For text indexes, we no longer compare keys. I'll prepare a bugfix for 1.2 in the coming days. Thanks @kperry42 for your initial analysis and good write-up of the issue 👍

kperry42 commented 6 years ago

Sounds good! thanks.

jmikola commented 6 years ago

@kperry42: Please try out the patch in https://github.com/doctrine/mongodb-odm/pull/1828 if you get a chance. I added a fair number of text cases but it'd be good to confirm that it addresses your use case.

jmikola commented 6 years ago

For text indexes, we no longer compare keys

Not entirely correct, as a compound index may include some number of text-indexed fields in combination with standard ascending/descending index keys. I made sure to handle this in the patch above :)

kperry42 commented 6 years ago

Yes, it looks like this works perfectly for me. Thanks again.