getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Cleanup FileBlob #76357

Open Meraj opened 1 month ago

Meraj commented 1 month ago

Environment

Repository: getsentry/sentry Version: 24.7.1

Description

Hi. While reviewing the source code, I noticed that the cleanup command for file blobs iterates over every FileBlob object and, for each one, checks for related FileBlobIndex or File objects before deciding whether to delete the blob.

The relevant code can be found here: https://github.com/getsentry/sentry/blob/master/src/sentry/runner/commands/cleanup.py#L403C1-L429C22

    for blob in RangeQuerySetWrapper(queryset):
        if FileBlobIndex.objects.filter(blob=blob).exists():
            continue
        if File.objects.filter(blob=blob).exists():
            continue
        blob.delete()

Wouldn't it be more efficient to handle this directly within the ORM? For example:

    queryset = FileBlob.objects.filter(
        timestamp__lte=cutoff
    ).exclude(
        Q(fileblobindex__isnull=False) | Q(file__isnull=False)
    )

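This approach could improve performance, especially when processing a large number of blobs, because the loop issues up to two extra existence queries per blob, whereas the filtered queryset lets the database exclude referenced blobs up front.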
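To make the comparison concrete without a Django setup, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`fileblob`, `fileblobindex`, `file`, `blob_id`, `timestamp`) are simplified assumptions for illustration, not Sentry's actual schema, and the SQL is only roughly what an ORM filter like the one above might translate to.

```python
import sqlite3


def make_db():
    """Build an in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE fileblob (id INTEGER PRIMARY KEY, timestamp INTEGER);
        CREATE TABLE fileblobindex (id INTEGER PRIMARY KEY, blob_id INTEGER);
        CREATE TABLE file (id INTEGER PRIMARY KEY, blob_id INTEGER);
        INSERT INTO fileblob (id, timestamp) VALUES (1, 10), (2, 10), (3, 10), (4, 99);
        INSERT INTO fileblobindex (blob_id) VALUES (1);  -- blob 1 is indexed
        INSERT INTO file (blob_id) VALUES (2);           -- blob 2 is used by a file
        """
    )
    return conn


def cleanup_per_row(conn, cutoff):
    """Mirror the existing loop: up to two existence queries per candidate blob."""
    deleted = []
    rows = conn.execute(
        "SELECT id FROM fileblob WHERE timestamp <= ? ORDER BY id", (cutoff,)
    ).fetchall()
    for (blob_id,) in rows:
        if conn.execute(
            "SELECT 1 FROM fileblobindex WHERE blob_id = ? LIMIT 1", (blob_id,)
        ).fetchone():
            continue
        if conn.execute(
            "SELECT 1 FROM file WHERE blob_id = ? LIMIT 1", (blob_id,)
        ).fetchone():
            continue
        conn.execute("DELETE FROM fileblob WHERE id = ?", (blob_id,))
        deleted.append(blob_id)
    return deleted


def cleanup_set_based(conn, cutoff):
    """Select only unreferenced blobs in a single query, then delete them."""
    rows = conn.execute(
        """
        SELECT b.id FROM fileblob AS b
        WHERE b.timestamp <= ?
          AND NOT EXISTS (SELECT 1 FROM fileblobindex AS i WHERE i.blob_id = b.id)
          AND NOT EXISTS (SELECT 1 FROM file AS f WHERE f.blob_id = b.id)
        ORDER BY b.id
        """,
        (cutoff,),
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany("DELETE FROM fileblob WHERE id = ?", [(i,) for i in ids])
    return ids
```

Both functions select the same blobs for deletion; the second just moves the reference checks into the database. The sketch only covers the database side, so any per-object cleanup performed inside `blob.delete()` would still need to run for each selected blob.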

getsantry[bot] commented 1 month ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-unknown for triage ⏲️