Open thehabes opened 6 years ago
FYI: after conversation, we'll hold off on implementing indexing until poor performance requires it.
Candidates for indexing...
@type: especially around Agent, Collection, Manifest, Canvas, Thing, Event, Annotation
@context: IIIF Presentations, Web Annotation, GeoJSON-LD, specific projects
forCollection: recalling dynamic collection creation faster
target/on: common query simple values
body.value: common query with repeated values and logical operators
__rerum.history.*: maybe useful for leaf filtering if we do not find a better way
creator: the User or Collaborator in many apps
__rerum.generatedBy: creating App, a popular filter
We are creating versioning scenarios that tend to require a lot of lookups. This got us thinking about indexing. For the purposes below, assume we have a Collection named 'Player' whose documents consist of 'Name' and 'Age' data. The system around this data must consistently look up a 'Player' by 'Name'.
What is Indexing?
A database index is a data structure that lives alongside your database (and takes up space of its own) and can be searched very efficiently. The index stores the value of a specific field or set of fields, ordered by the value of the field. Indexing should only be applied to document fields that are consistently searched on.
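To make "ordered by the value of the field" concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB stores indexes internally (MongoDB uses B-tree structures); the `players` array and the function names are made up for illustration, and array positions stand in for on-disk document locations.

```javascript
// Hypothetical Player documents; array positions stand in for
// on-disk document locations.
const players = [
  { name: "Ramel", age: 24 },
  { name: "Ana", age: 31 },
  { name: "Bo", age: 19 },
  { name: "Cy", age: 24 },
];

// Build an index on one field: [value, position] entries sorted by value.
function buildSortedIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// Binary search for the first entry whose value is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality lookup: jump to the first match, then read while values match.
function lookup(index, value) {
  const positions = [];
  for (let i = lowerBound(index, value); i < index.length && index[i][0] === value; i += 1) {
    positions.push(index[i][1]);
  }
  return positions;
}

const ageIndex = buildSortedIndex(players, "age");
console.log(lookup(ageIndex, 24)); // positions of the documents with age 24
```

Because the entries are sorted, the lookup touches only a handful of index entries instead of every document. The index is also a second copy of the field values, which is the extra space mentioned above.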
What is Indexing in MongoDB?
Without indexes, MongoDB must perform a Collection scan, i.e. scan every document in a Collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect. For example, db.Players.find({"name":"Ramel"}) could use an index on name instead of loading every Players document into memory and checking each name. It would instead load the index and know to look only at documents 4, 6, 8 and 1000, vastly improving search time and reducing the amount of memory needed to perform the search.
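The difference between a Collection scan and an index lookup can be sketched by counting how many documents each approach examines. This is a toy model in plain JavaScript, not MongoDB code: the sample data and the `findByScan`, `buildIndex` and `findByIndex` helpers are hypothetical, and a `Map` stands in for the real ordered index structure.

```javascript
// Hypothetical in-memory stand-in for the Player Collection.
const players = [
  { _id: 1, name: "Ramel", age: 24 },
  { _id: 2, name: "Ana", age: 31 },
  { _id: 3, name: "Ramel", age: 40 },
  { _id: 4, name: "Bo", age: 19 },
];

// Collection scan: every document is examined, matching or not.
function findByScan(docs, name) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.name === name) matches.push(doc);
  }
  return { matches, examined };
}

// Build an index on one field: field value -> positions of matching docs.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

// Index lookup: only the documents the index points at are examined.
function findByIndex(docs, index, value) {
  const positions = index.get(value) || [];
  return { matches: positions.map((p) => docs[p]), examined: positions.length };
}

const scanResult = findByScan(players, "Ramel");
const nameIndex = buildIndex(players, "name");
const indexResult = findByIndex(players, nameIndex, "Ramel");
console.log(scanResult.examined, indexResult.examined); // 4 2
```

Both approaches return the same documents. The scan's cost grows with the size of the Collection, while the indexed lookup's cost grows with the number of matches.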
How is indexing implemented in MongoDB as opposed to other DBs?
For our purposes, Mongo indexes can be made on single fields, on compound fields, or on fields with multikey properties. This means that, from our example, we could index searches like 'Find players with name Ramel under age 25'. If Name and Age are indexed together in a compound index, we avoid a Collection scan for both values. This also means we can index fields whose values are arrays (multikey) and by extension subarrays of arrays.
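As a rough sketch of why a compound index helps the 'name Ramel under age 25' search: if entries are sorted by name first and age second, all matching entries sit next to each other in one contiguous run. The code below is an in-memory illustration in plain JavaScript with made-up data and helper names, not MongoDB's implementation.

```javascript
// Hypothetical Player documents.
const players = [
  { name: "Ramel", age: 31 },
  { name: "Ana", age: 22 },
  { name: "Ramel", age: 19 },
  { name: "Ramel", age: 24 },
  { name: "Zed", age: 20 },
];

// Compare two [name, age] keys: name first, then age.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// Compound index entries: { key: [name, age], position }, sorted by key.
function buildCompoundIndex(docs) {
  return docs
    .map((doc, position) => ({ key: [doc.name, doc.age], position }))
    .sort((a, b) => compareKeys(a.key, b.key));
}

// Binary search for the first entry whose key is >= the given key.
function lowerBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(index[mid].key, key) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// "name is X and age is under maxAge": one contiguous run of the index.
function findByNameUnderAge(docs, index, name, maxAge) {
  const matches = [];
  let examined = 0;
  for (let i = lowerBound(index, [name, -Infinity]); i < index.length; i += 1) {
    const [entryName, entryAge] = index[i].key;
    if (entryName !== name || entryAge >= maxAge) break;
    examined += 1;
    matches.push(docs[index[i].position]);
  }
  return { matches, examined };
}

const nameAgeIndex = buildCompoundIndex(players);
const result = findByNameUnderAge(players, nameAgeIndex, "Ramel", 25);
console.log(result.matches.map((d) => d.age), result.examined);
```

Field order matters in this sketch: sorting by name and then age serves 'name equals X, age in a range' well, but the same structure would not help a query on age alone.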
Why wouldn't Mongo implement some indexing schema by default?
Well, it does on the _id field because of how often people look for things by _id. Other than that, there are things to consider. Like everything else in the NoSQL world, there is no silver bullet to performance issues. Indexes are no exception.
When it comes to indexes, it’s likely you are primarily concerned with query performance. However, remember that indexes are data structures that take up disk space. They also have to be maintained. Each time a document is inserted or updated, the associated index entries must also be updated. Similarly, index entries have to be found and deleted when documents are removed. This index maintenance can impact write performance. If documents are frequently inserted, deleted and updated, the indexes have to change just as often, which is pure overhead for the Collection. You’re best served creating indexes only for your most frequently used queries, because indexes tend to hurt performance when overused or used incorrectly.
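The maintenance cost can be illustrated by counting index writes in a toy model. The `ToyCollection` class below is hypothetical and written in plain JavaScript; it only counts index entries added or removed and says nothing about how expensive each write is in a real MongoDB deployment.

```javascript
// A toy collection that keeps one Map-based index per indexed field
// and counts every index entry it adds or removes.
class ToyCollection {
  constructor(indexedFields) {
    this.docs = new Map(); // _id -> document
    this.indexes = new Map(indexedFields.map((field) => [field, new Map()]));
    this.indexWrites = 0;
  }

  addEntry(field, doc) {
    const index = this.indexes.get(field);
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(doc._id);
    this.indexWrites += 1;
  }

  removeEntry(field, doc) {
    const index = this.indexes.get(field);
    const ids = index.get(doc[field]);
    ids.delete(doc._id);
    if (ids.size === 0) index.delete(doc[field]);
    this.indexWrites += 1;
  }

  insert(doc) {
    this.docs.set(doc._id, doc);
    for (const field of this.indexes.keys()) this.addEntry(field, doc);
  }

  update(id, changes) {
    const old = this.docs.get(id);
    const next = { ...old, ...changes };
    this.docs.set(id, next);
    // Only indexes whose field value changed need their entries moved.
    for (const field of this.indexes.keys()) {
      if (old[field] !== next[field]) {
        this.removeEntry(field, old);
        this.addEntry(field, next);
      }
    }
  }

  remove(id) {
    const doc = this.docs.get(id);
    for (const field of this.indexes.keys()) this.removeEntry(field, doc);
    this.docs.delete(id);
  }
}

const unindexed = new ToyCollection([]);
const indexed = new ToyCollection(["name", "age"]);
for (const collection of [unindexed, indexed]) {
  collection.insert({ _id: 1, name: "Ramel", age: 24 });
  collection.update(1, { age: 25 });
  collection.remove(1);
}
console.log(unindexed.indexWrites, indexed.indexWrites); // 0 6
```

The same three operations cost nothing extra without indexes and six index writes with two indexes, so every additional index adds work to every write that touches its field.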
So then Indexing is actually bad?
Let's give an example to ease your mind. I pull in 50,000+ tweets from Twitter and put them into a Collection. Each tweet document looks like
With all of this data, I now want data only from a specific user. So I run the following
The results of explain() describe the details of how MongoDB executes the query. Some of the relevant fields are:
The point here should be clear: a simple query results in the database having to scan every document in the Collection. The query only took 40ms, but we’re also only dealing with about 50,000 documents. This duration will increase as the Collection size increases.
What happens when an index is added to the from_user field?
The query still matches 35 results, but only 35 objects were scanned and the query took 3ms. This is quite an improvement over the 40ms it took to scan 50,000.
In this case, indexing is great. However, this Collection is full of objects that will not be updated or deleted in the near future, and the Collection itself will only take on new documents infrequently. Updating the index as new documents arrive will therefore be rare, and the cost negligible.
How do I know if Indexing is right for me?
https://stackoverflow.com/questions/24266837/what-is-the-impact-of-adding-indexes-on-a-mongodb-database-containing-a-large-vo Surely you have a dedicated database architect or back-end employee. It will be up to them to create indexes and test against them to see if indexing is right for you. It is a Very Bad Idea to jump into indexing without knowing the effect it will have on your systems and software, and without knowing the limits or peculiarities of indexing in MongoDB (like this one: https://stackoverflow.com/questions/27286908/mongodb-indexing-and-projection).