doableware / djongo

Django and MongoDB database connector
https://www.djongomapper.com
GNU Affero General Public License v3.0

Proposal to add estimated_document_count implementation #678

Open iYasha opened 10 months ago

iYasha commented 10 months ago

I work with a lot of data in MongoDB, and I ran into a performance problem with count. I wrote some tests to demonstrate the difference between count, which uses an aggregation, and estimated_document_count, which uses the collection's metadata to return an estimated count. My proposal is to add an estimated_document_count implementation to the library. If you think this is a good idea, let me know and I will send a PR as soon as possible.
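
To make the proposal concrete, here is a minimal sketch of the idea, not djongo's actual API. It assumes a pymongo-style collection object exposing `estimated_document_count()` and `count_documents(filter)`. The names `fast_count` and `FakeCollection` are hypothetical and exist only for illustration. `FakeCollection` is an in-memory stand-in so the example runs without a MongoDB server.

```python
from typing import Any, Mapping, Optional


def fast_count(collection: Any, query: Optional[Mapping[str, Any]] = None) -> int:
    """Return a document count, preferring the metadata-based estimate.

    `collection` is assumed to behave like a pymongo Collection.
    """
    if not query:
        # No filter: collection metadata is enough, so no documents are scanned.
        return collection.estimated_document_count()
    # The estimate cannot take a filter, so filtered counts need a real count.
    return collection.count_documents(dict(query))


class FakeCollection:
    """Hypothetical in-memory stand-in so the sketch runs without MongoDB."""

    def __init__(self, docs):
        self.docs = list(docs)

    def estimated_document_count(self):
        return len(self.docs)

    def count_documents(self, query):
        # Count documents whose fields equal every key/value pair in the query.
        return sum(
            all(doc.get(key) == value for key, value in query.items())
            for doc in self.docs
        )


coll = FakeCollection([{"kind": "a"}, {"kind": "b"}, {"kind": "a"}])
print(fast_count(coll))                 # unfiltered: uses the estimate
print(fast_count(coll, {"kind": "a"}))  # filtered: uses the exact count
```

The estimate only applies to unfiltered counts, so a helper like this would probably need to fall back to the existing count whenever a filter is present.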

My tests

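import time

from django.db import connections

# Model is any djongo-backed Django model; "mongo_db" is the database alias in settings.

def djongo_count_test():
    # Time the ORM count, which djongo executes as an aggregation.
    start_time = time.time()
    res = Model.objects.count()
    print("--- djongo_count_test %s seconds | count %s ---" % (time.time() - start_time, res))

def estimated_count_test():
    # Time pymongo's estimated_document_count, which reads collection metadata.
    start_time = time.time()
    if connections["mongo_db"].connection is None:
        connections["mongo_db"].connect()
    # collection_name stands in for the model's actual collection.
    res = connections["mongo_db"].connection.collection_name.estimated_document_count()
    print("--- estimated_count_test %s seconds | count %s ---" % (time.time() - start_time, res))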

Results of testing

--- estimated_count_test 0.01596400260925293 seconds | count 1074517 ---
--- djongo_count_test 0.32147812843322754 seconds | count 1074517 ---

P.S. I ran these tests only on my work machine. The exact timings will differ depending on hardware, but the general point of this issue stays the same.
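
Since single-run timings are noisy, one way to make the comparison less hardware-sensitive is to repeat each call and report the median. The following is only a sketch: `median_seconds` is a hypothetical helper, and the `sum(range(1000))` workload is a stand-in for a real call such as `Model.objects.count()`.

```python
import statistics
import time


def median_seconds(fn, repeats=5):
    """Call fn `repeats` times and return (median elapsed seconds, last result)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), result


# Stand-in workload; in the tests above this could be
# lambda: Model.objects.count()
elapsed, res = median_seconds(lambda: sum(range(1000)))
print("--- median %.6f seconds | count %s ---" % (elapsed, res))
```

Running both count variants through the same helper would give medians that can be compared directly.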