getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/
MIT License
1.9k stars 501 forks source link

Transactions not working as expected (metrics are not sent) #3004

Closed raul3z closed 3 days ago

raul3z commented 6 months ago

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.26.0; 1.44.1

Steps to Reproduce

Run this script

Expected Result

Having seven entries in sentry Discover, from Sherlock.1 to Sherlock.6 and another one for Holmes

Actual Result

Not having all the metrics, sometimes it works and sometimes not, even with traces_sample_rate set to 1.0 and flushing transactions manually.

output log: Hello Sherlock.1! [sentry] DEBUG: Setting up integrations (with default = False) [sentry] DEBUG: Setting up previously not enabled integration atexit [sentry] DEBUG: Enabling integration atexit [sentry] DEBUG: Setting SDK name to 'sentry.python' [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. Hello Sherlock.2! [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: background worker got flush request [sentry] DEBUG: 2 event(s) pending on flush [sentry] DEBUG: background worker flushed [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. Hello Sherlock.3! [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. Hello Sherlock.4! [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. Hello Sherlock.5! [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. Hello Sherlock.6! [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: [Tracing] Starting transaction [sentry] DEBUG: [Profiling] Discarding profile because profiler was not started. Hello Holmes! [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: atexit: got shutdown signal [sentry] DEBUG: atexit: shutting down client [sentry] DEBUG: Flushing HTTP transport [sentry] DEBUG: Sending envelope [envelope with 1 items (transaction)] project:4505... host:o450... [sentry] DEBUG: background worker got flush request [sentry] DEBUG: background worker flushed [sentry] DEBUG: Killing HTTP transport [sentry] DEBUG: background worker got kill request [sentry] DEBUG: Killing HTTP transport

discover_results

sentrivana commented 6 months ago

Hey @raul3z, thanks for reporting and for providing a script to repro!

I can't reproduce this locally, I get Sherlocks 1 through 6 as well as the Holmes transaction, tried this about 5 times.

Stuff that comes to mind:

  1. You write that it sometimes works and sometimes not, what's the rough ratio? Mostly works? Mostly doesn't? 50/50?
  2. Is it always the same transactions missing (the first two)?
  3. I see you're disabling default integrations other than atexit, does this also happen if you remove that?
  4. What Python version are you on?
  5. Are you also observing this on bigger scale?

You selected that you're on SaaS -- if you give me your org name I can take a closer look. (You can also send it to me on ivana.kellyerova @ sentry.io if you prefer that.)

raul3z commented 6 months ago

Hi @sentrivana , thank you for answering me!

  1. I would say that it mostly does not work.
  2. No, it is random
  3. Yes, even worst. If I disable atexit the main process finished before metrics could be sent, I tried this morning disabling it and removing manual flushing and no one reached the server. After that I have set manual flush and tried again, then only 3 out of 7 metrics reached the server.
  4. I am using python 2.7 but I also have tried with python 3.7, same results here.
  5. The problem arose in some parts of our code, this script tries to isolate what is happening. The thing is that we usually use a low sampling rate, so not all metric reach the server (which is normal), in this case I have set the sampling rate to 1.0 to test is properly.

Thanks for your time. Let me check if I can provide you the org name.

sentrivana commented 6 months ago

Thanks for the follow up!

For 3. I could've worded the question better -- I meant if you still observe this even if you remove default_integrations=False, so basically if you enable all default integrations. You need atexit at least for sure so that the SDK intercepts the interpreter shutting down and forces a flush of any pending events. Can you try and see if anything changes if you enable all default integrations?

raul3z commented 6 months ago

Oh sorry, I misunderstood it.

Yes, I just have tried setting default_integrations=True. Same result flushing or nor manually.

For the org name, what do you need exactly? the company site url or the project DSN?

sl0thentr0py commented 6 months ago

@raul3z you're calling a hub.flush within the start_transaction context manager and the transaction finishes and gets queued to the worker once the context manager exits so that flush will have weird behaviour. Please try this instead.

def send_transaction(name):
    """ Send a Sentry transaction """
    with sentry_sdk.start_transaction(name=name, op='test') as transaction:
        hello(name)

def main():
    """ Main function """
    init_sentry()
    for i in range(6):
        send_transaction('Sherlock.{}'.format(i + 1))
    send_transaction('Holmes')
    sentry_sdk.flush()   # not really necessary since atexit will call this anyway
raul3z commented 6 months ago

That seems to work!

I am receiving the seven entries in Sentry, even if I disable the atexit integration, so for any reason that integration is not working well for us.

sl0thentr0py commented 6 months ago

ok to clarify

you can either disable atexit and flush manually:

def init_sentry():
    """ Initializes Sentry """
    sentry_sdk.init(
        debug=True,
        dsn=DSN,
        max_breadcrumbs=100,
        environment=os.environ.get('ENVIRON'),
        traces_sample_rate=1.0,
        default_integrations=False,
        integrations=[],
    )
    sentry_sdk.set_user({'username': getpass.getuser()})

def main():
    """ Main function """
    init_sentry()
    for i in range(6):
        send_transaction('Sherlock.{}'.format(i + 1))
    send_transaction('Holmes')
    sentry_sdk.flush()

or have atexit and not call flush manually:

def init_sentry():
    """ Initializes Sentry """
    sentry_sdk.init(
        debug=True,
        dsn=DSN,
        max_breadcrumbs=100,
        environment=os.environ.get('ENVIRON'),
        traces_sample_rate=1.0,
        default_integrations=False,
        integrations=[AtexitIntegration()],
    )
    sentry_sdk.set_user({'username': getpass.getuser()})

def main():
    """ Main function """
    init_sentry()
    for i in range(6):
        send_transaction('Sherlock.{}'.format(i + 1))
    send_transaction('Holmes')

these should both do the same thing. In general, you don't need to flush manually anywhere if you just use the sdk with default integrations, stuff should just work.

raul3z commented 6 months ago

It works flushing manually otherwise it does not send all the metrics

sl0thentr0py commented 6 months ago

ok well then just add the flush if that works for you. In theory the flush should happen automatically but maybe there's some other problem with your setup.

raul3z commented 6 months ago

Hi @sl0thentr0py,

I have to say that it is working for my isolated script but if I try to do the same in the main code it is not working, I have to check if we have something wrong there with our code but in the meantime, do you have an idea of what could be wrong? what is the recommended setup?

Thanks!

sl0thentr0py commented 6 months ago

maybe you're killing the process with another signal and the process doesn't get enough time to flush? just a hunch

raul3z commented 6 months ago

I have to check the code properly, just to know, do you have an example on how the server should be initialized? not sure if what I'm doing here is good enough.

sl0thentr0py commented 6 months ago

yes that looks ok to me you can even simplify and just use the default default_integrations=True (on by default) unless there's something you want to turn off particularly

raul3z commented 5 months ago

Hi there!

After get deeper on this I was not able to make it works for our main script, in fact I tested again the isolated script with the changes that worked and now it is not working again, we cannot dedicate more effort on this so we will assume that we are going to lost some transactions :(

If you have any idea of what could be happening, it is welcome.

Thanks!

antonpirker commented 5 months ago

Hey @raul3z !

To follow up and summarize:

Some things that I can think of how this could be:

Other than that I have no ideas why those transaction are not showing up in sentry.io...

raul3z commented 4 months ago

Hi @antonpirker !

Thanks for checking it. Yes, we also think that could be a problem with a firewall, proxy or something like that because we refreshed the site a lot of times (even hours later) and the transactions never reached the server.

sentrivana commented 4 months ago

Hey @raul3z, following up from my old comment, the org slug (***.sentry.io) would be enough. I could then check whether I can see any traffic dropped by something in the pipeline.

Another thing that comes to mind is trying out keep_alive=True (see docs) -- if this is a network issue it might make a difference.

getsantry[bot] commented 1 week ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀