MrThearMan / graphene-django-query-optimizer

Automatically optimize SQL queries in Graphene-Django schemas.
https://pypi.org/project/graphene-django-query-optimizer/
MIT License

More efficient caching #70

Closed rsomani95 closed 8 months ago

rsomani95 commented 8 months ago

I have read the docs thoroughly before making this feature request.

I have read through other open issues, and my issue is not a duplicate.

Description

The SQL queries generated by the optimizer are very efficient and, per the debug toolbar, they execute very quickly. However, the total time to generate the response is quite slow.

In my particular case, it's a query like this:

query AssetsSegmentsTags {
  allVideoAssets {
    edges {
      node {
        id
        segments {
          edges {
            node {
              id
              inTime
            }
          }
        }
      }
    }
  }
}

where the models are set up like so:

from django.db import models


class VideoAsset(models.Model):
    ...  # fields omitted

class Segment(models.Model):
    video_asset = models.ForeignKey(
        "VideoAsset", on_delete=models.CASCADE, related_name="segments"
    )

The SQL time is ~40ms whereas the time taken to get the response is ~3400ms.

Upon deeper inspection, it appears that this is perhaps largely because of the custom caching mechanism?

Here is the flame graph generated using py-spy. (Opening the file with Google Chrome lets you click on specific items and zoom into them for deeper inspection)

For ease of reference, here's a screenshot of the high level flame graph:

[screenshot: high-level flame graph, 2024-03-04]

It's unclear to me where exactly the prefetching / DB lookups are happening, but it seems clear from the graph + the debug toolbar info that a lot of the time is spent not doing lookups, but in other parts of cache.py.

Motivation

The SQL generated by the optimizer is highly optimized, but the overall response appears to be bottlenecked by the caching mechanism. The caching overhead is too high in my particular use case to reap the benefits of the superior SQL generation.

Would you like to solve this issue yourself with a pull request?

Yes, but I'm not sure how to solve this

rsomani95 commented 8 months ago

FYI: I was able to reproduce a similar flame graph to the one above with the test data generated using create_test_data.py. However, the issue is not as pronounced with the test data because there are far fewer connections between objects.

MrThearMan commented 8 months ago

Oh wow, this seems like quite a big regression! The cause seems to be the get_filter_info function I implemented for finding the filtering information used by the nested query filtering optimization. It creates a new filterset class every time it is called, for each object type it finds that has one defined, which is quite expensive.

Why is this in the caching code? It's called from the optimizer's __str__ method, which outputs a representation of the selected fields and joins on the optimized model; that representation is then used as part of the cache key. This prevents collisions when multiple objects of the same type, but with different selected fields, need to be stored in the cache.
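
To illustrate the idea (a minimal sketch only, not the library's actual code; the names are made up for this example):

# Illustrative sketch only: the selected-field representation is part of the
# key so that the same object type cached with two different field selections
# does not collide.
_query_cache = {}


def cache_key(model, selected_fields, pk):
    # The joined field list stands in for the optimizer's __str__ output.
    return (model, ",".join(sorted(selected_fields)), pk)


def store(model, selected_fields, instance):
    _query_cache[cache_key(model, selected_fields, instance.pk)] = instance


def fetch(model, selected_fields, pk):
    return _query_cache.get(cache_key(model, selected_fields, pk))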

The fortunate thing here is that the problematic code is not even required in __str__ and can just be removed. However, I would like to go over a few more cases using the tool you profiled your examples with (looks great btw!) and make sure there aren't any other hotspots like this. My worry is that the whole get_filter_info function needs some kind of filterset cache, or to be rethought from scratch. I'll start looking at this ASAP.
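
For illustration, such a filterset cache could look something like this (a hypothetical sketch built on django_filters' filterset_factory; get_cached_filterset_class is not an existing helper):

# Hypothetical sketch of a "filterset cache": memoize filterset class creation
# so repeated get_filter_info calls reuse the same class instead of rebuilding it.
from functools import lru_cache

from django_filters.filterset import filterset_factory


@lru_cache(maxsize=None)
def get_cached_filterset_class(model, fields=()):
    # filterset_factory builds a FilterSet subclass for the model; caching on
    # (model, fields) means the class is created at most once per combination.
    return filterset_factory(model, fields=list(fields) or "__all__")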

rsomani95 commented 8 months ago

@MrThearMan thanks for the prompt response! I noticed the logic inside __str__ but wasn't sure whether it could be removed, since it calls self.compile. I now see that the str(optimizer) calls are only used to generate the cache key name. That, plus your comments, is quite reassuring 😁

MrThearMan commented 8 months ago

I might implement a separate function for fetching just the query cache key, since the current implementation is a bit confusing: it basically runs the optimizer again just to generate the key. Maybe I can generate the key when the optimization is run and then reuse it during caching; we'll see.
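
Roughly along these lines (an illustrative sketch only, not the library's API; QueryOptimizerSketch is a made-up name):

# Derive the cache key once while compiling the optimization and reuse it
# afterwards, instead of re-running the optimizer via __str__ to build the key.
class QueryOptimizerSketch:
    def __init__(self, model):
        self.model = model
        self.only_fields = []
        self.cache_key = None

    def compile(self):
        # ... build the optimized queryset from self.only_fields here ...
        # The key is computed exactly once and then reused by the caching code.
        self.cache_key = f"{self.model.__name__}:{','.join(sorted(self.only_fields))}"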

MrThearMan commented 8 months ago

Hmm, it seems like there might be performance issues with the filterset class you are using. Looking at the flame graph more closely, it seems like some database queries are being executed. Specifically, some of the filter fields seem to be calling .all() on related managers during __new__ for some reason. I can't say more with the information I have, but it might be worth your while to look into this. I'll still try to optimize the issue raised here, don't worry.
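
As an aside: if a filter field is building its queryset eagerly, django_filters accepts a callable for the queryset argument, which defers the work until the filter is actually used. A hypothetical example (the filterset class and import path are made up; the models are the ones from this issue):

# Hypothetical example of deferring queryset evaluation in a filterset.
import django_filters

from myapp.models import Segment, VideoAsset  # hypothetical app path


def segments_for_request(request):
    # Invoked lazily with the current request, not at class definition time.
    return Segment.objects.all()


class VideoAssetFilterSet(django_filters.FilterSet):
    segments = django_filters.ModelMultipleChoiceFilter(queryset=segments_for_request)

    class Meta:
        model = VideoAsset
        fields = ["segments"]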

rsomani95 commented 8 months ago

Ah, interesting. Totally possible; I'll look into it. Thanks for pointing that out.

MrThearMan commented 8 months ago

I wasn't able to replicate the issue you had with the filterset creation, so I suspect it has something to do with the specific filterset class and/or fields you are using.

However, I removed the get_filter_info call from the cache key creation, made it more explicit by using a property, and made it compile itself during the optimization. I also made filterset class creation happen at object type creation time, so it should only happen during project startup. This should make get_filter_info perform better in general.
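
In spirit, the object-type-creation-time setup works something like this (an illustrative sketch only, not the actual implementation; OptimizedObjectTypeSketch is a made-up name):

# Build the filterset class when the object type class itself is created,
# i.e. at import/startup time rather than per query.
from django_filters.filterset import filterset_factory


class OptimizedObjectTypeSketch:
    model = None              # set by concrete subclasses
    filterset_class = None    # filled in once, at class creation time

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.model is not None and cls.filterset_class is None:
            # Built once per object type during startup, not on every request.
            cls.filterset_class = filterset_factory(cls.model, fields="__all__")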

One "breaking" change I made to this is that I now bypass graphene_djangos get_filterset_class, as I didn't see the issues it was trying to solve being part of graphene_djangos, or this library's concerns, but django_filters.

Released in 0.2.3

rsomani95 commented 8 months ago

@MrThearMan I just tested this fix on my project successfully. Earlier, the total response time was 3400ms, and now it's down to ~365ms. About 9.3x faster. Great stuff!!

I'll investigate my filterset / fields for further improvements, but given that we're not re-initialising the filterset constantly, this is in a really good spot for our team.

the issues it was trying to solve didn't seem to be graphene_django's (or this library's) concern, but django_filters'

FWIW, I'm in agreement.