alexdlaird / pyngrok

A Python wrapper for ngrok
https://pyngrok.readthedocs.io
MIT License

Stackoverflow [68766528] - ngrok reconnect issues #88

Closed: vladivanovic closed this issue 3 years ago

vladivanovic commented 3 years ago

Hi Alex,

Thank you for responding to me here: https://stackoverflow.com/questions/68766528/pyngrok-retrying-failed-connections?noredirect=1#comment121546605_68766528

As for my code, I have a Flask server which calls this function here to automatically start an ngrok tunnel:

import os

from pyngrok import conf, ngrok

# Function to start ngrok instance e.g. when restart button on Admin page is hit
def startngroktunnel():
    ngrokFile = os.path.abspath("ngrok.yml")
    ngrokConfig = conf.PyngrokConfig(config_path=ngrokFile)
    http_tunnel = ngrok.connect(name='merakihud', pyngrok_config=ngrokConfig)

My YAML file (ngrok.yml):

authtoken: <hidden>
tunnels:
  merakihud:
    addr: 5001
    proto: http
    root_cas: trusted
    bind_tls: true

Stack trace / log:

vlad@ubuntu:~/fujiwara-api/app$ python3 webhook.py 
t=2021-08-16T11:00:39+0900 lvl=eror msg="failed to reconnect session" obj=csess id=eeb66655d762 err="x509: certificate signed by unknown authority"
Traceback (most recent call last):
  File "webhook.py", line 11, in <module>
    appsc.startngroktunnel()
  File "/home/vlad/fujiwara-api/app/app_startchecks.py", line 112, in startngroktunnel
    http_tunnel = ngrok.connect(name='merakihud', pyngrok_config=ngrokConfig)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/ngrok.py", line 251, in connect
    api_url = get_ngrok_process(pyngrok_config).api_url
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/ngrok.py", line 162, in get_ngrok_process
    return process.get_process(pyngrok_config)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 295, in get_process
    return _start_process(pyngrok_config)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 472, in _start_process
    raise PyngrokNgrokError("The ngrok process errored on start: {}.".format(ngrok_process.startup_error),
pyngrok.exception.PyngrokNgrokError: The ngrok process errored on start: x509: certificate signed by unknown authority.

Basically, this script works on other systems; it's just this one. I'm quite sure it's my company trying to prevent reverse proxy tunnels by using a cloud DNS system to block requests (I'm going to test this by manually changing the DNS settings in my Ubuntu VM), and from what I can tell this only started being blocked recently.
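
One way to run the DNS test mentioned above is to look at what the system resolver returns for ngrok's tunnel host before and after changing the DNS settings; comparing the two answers is one way to check the DNS-interception theory. A minimal standard-library sketch: tunnel.us.ngrok.com is the host that appears in the ngrok log in this thread, and resolve_ipv4 is an illustrative helper, not part of pyngrok.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv4 sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = "tunnel.us.ngrok.com"  # host ngrok tries to reach, per the log in this thread
    try:
        print(host, "->", resolve_ipv4(host))
    except OSError as err:  # socket.gaierror is a subclass of OSError
        print("resolution failed:", err)
```

If the addresses change when you switch resolvers, the original DNS answer was likely pointing somewhere other than ngrok.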

ngrok application stdout log:

vlad@ubuntu:~/fujiwara-api/app$ ngrok start --log=stdout -config ngrok.yml merakihud
INFO[08-16|11:10:08] open config file                         path=/home/vlad/fujiwara-api/app/ngrok.yml err=nil
t=2021-08-16T11:10:08+0900 lvl=info msg="starting web service" obj=web addr=127.0.0.1:4040
t=2021-08-16T11:10:09+0900 lvl=eror msg="failed to reconnect session" obj=csess id=c11a09a2aae0 err="x509: certificate signed by unknown authority"
t=2021-08-16T11:10:09+0900 lvl=eror msg="failed to reconnect session" obj=csess id=c11a09a2aae0 err="x509: certificate signed by unknown authority"
t=2021-08-16T11:10:10+0900 lvl=eror msg="failed to reconnect session" obj=csess id=c11a09a2aae0 err="Get \"https://dns.google.com/resolve?cd=true&name=tunnel.us.ngrok.com&type=A\": x509: certificate signed by unknown authority"
t=2021-08-16T11:10:13+0900 lvl=info msg="tunnel session started" obj=tunnels.session
t=2021-08-16T11:10:13+0900 lvl=info msg="client session established" obj=csess id=c11a09a2aae0
t=2021-08-16T11:10:13+0900 lvl=info msg="started tunnel" obj=tunnels name=merakihud addr=http://localhost:5001 url=https://444b23cd36ff.ngrok.io

As you can see, ngrok falls back to dns.google.com to resolve the tunnel host, and then the tunnel magically comes up. pyngrok, however, seems to stop before ngrok reaches this fallback, so I can't get the tunnel up that way. Any help would be much appreciated!
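
The fallback in the log above is a lookup against Google's public JSON resolver (https://dns.google.com/resolve with name and type query parameters). The sketch below only rebuilds that query URL and pulls A records out of a response body; it makes no network request, and the sample response is hypothetical (192.0.2.10 is a documentation address, not a real ngrok IP). Note that in this environment even that HTTPS request failed the certificate check, as the log line shows.

```python
import json
from urllib.parse import urlencode

DOH_ENDPOINT = "https://dns.google.com/resolve"  # endpoint seen in the ngrok log


def build_doh_url(name, rtype="A"):
    """Rebuild the JSON DNS query URL with the same parameter order as the log."""
    return DOH_ENDPOINT + "?" + urlencode({"cd": "true", "name": name, "type": rtype})


def a_records(body):
    """Extract IPv4 answers (record type 1 = A) from a JSON DNS response body."""
    data = json.loads(body)
    if data.get("Status") != 0:  # Status is the DNS response code; 0 means NOERROR
        return []
    return [ans["data"] for ans in data.get("Answer", []) if ans.get("type") == 1]


# Hypothetical response body for illustration only; not real ngrok data.
SAMPLE = (
    '{"Status": 0, "Answer": [{"name": "tunnel.us.ngrok.com.", '
    '"type": 1, "TTL": 60, "data": "192.0.2.10"}]}'
)

if __name__ == "__main__":
    print(build_doh_url("tunnel.us.ngrok.com"))
    print(a_records(SAMPLE))
```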

alexdlaird commented 3 years ago

I will look into this more later, but off the top of my head, this catches my eye: "failed to reconnect session".

Can you try this?

    ngrokConfig = conf.PyngrokConfig(config_path=ngrokFile, reconnect_session_retries=2)

reconnect_session_retries defaults to 0; if it's increased, pyngrok retries when it sees "failed to reconnect session" (this is done in testing to fix flaky tests when the pipe breaks). I wonder if it'll also resolve this issue.
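
A rough sketch of the whole-process retry pattern described here, which also shows up as the repeated _start_process calls in the traceback below. start_once is a hypothetical callable standing in for launching the ngrok binary; it returns None on success or the msg of the error log that stopped it. This is an illustration under those assumptions, not pyngrok's code.

```python
import time


def start_with_retries(start_once, reconnect_session_retries=0, delay=0.5):
    """Restart from scratch while startup fails with "failed to reconnect session".

    Returns the number of attempts made; raises RuntimeError once retries are used up
    or when the failure is not a reconnect error.
    """
    attempts = 0
    while True:
        attempts += 1
        msg = start_once()  # hypothetical: None on success, else the failing log's msg
        if msg is None:
            return attempts
        retriable = "failed to reconnect session" in msg
        if not retriable or attempts > reconnect_session_retries:
            raise RuntimeError("The ngrok process errored on start: {}.".format(msg))
        # Mirrors the "retrying in 0.5 seconds" message seen in the logs.
        time.sleep(delay)
```

With reconnect_session_retries=2 this makes at most three attempts, each one a fresh start rather than a retry inside the running ngrok process.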

vladivanovic commented 3 years ago

@alexdlaird I've tried the following, and even 3 and 4 retries, but unfortunately it never reaches the point where ngrok tries to use Google DNS via JSON to resolve the ngrok service IP address. I wonder what the difference is? Does pyngrok retry the entire ngrok process, or just allow ngrok's own retries to occur?

Setting my resolv.conf to Google's DNS works around this issue. Naturally, "fix your DNS service / unblock DNS resolution of the ngrok service" is likely the best workaround in this case, but it would be interesting if pyngrok could also take advantage of ngrok's own workaround. I'll leave it with you to decide how to proceed, but I'm happy to test for you if you want to go ahead.

vlad@ubuntu:~/fujiwara-api/app$ python3 webhook.py 
t=2021-08-16T12:23:56+0900 lvl=eror msg="failed to reconnect session" obj=csess id=06e7330364c9 err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-16T12:23:59+0900 lvl=eror msg="failed to reconnect session" obj=csess id=06141fc7b580 err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-16T12:24:01+0900 lvl=eror msg="failed to reconnect session" obj=csess id=7df9a3dc2ca6 err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-16T12:24:01+0900 lvl=eror msg="failed to reconnect session" obj=csess id=fb2251fa1eff err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-16T12:24:02+0900 lvl=eror msg="failed to reconnect session" obj=csess id=860a4980bde7 err="x509: certificate signed by unknown authority"
Traceback (most recent call last):
  File "webhook.py", line 11, in <module>
    appsc.startngroktunnel()
  File "/home/vlad/fujiwara-api/app/app_startchecks.py", line 112, in startngroktunnel
    http_tunnel = ngrok.connect(name='merakihud', pyngrok_config=ngrokConfig)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/ngrok.py", line 251, in connect
    api_url = get_ngrok_process(pyngrok_config).api_url
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/ngrok.py", line 162, in get_ngrok_process
    return process.get_process(pyngrok_config)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 295, in get_process
    return _start_process(pyngrok_config)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 470, in _start_process
    return _start_process(pyngrok_config, retries + 1)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 470, in _start_process
    return _start_process(pyngrok_config, retries + 1)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 470, in _start_process
    return _start_process(pyngrok_config, retries + 1)
  [Previous line repeated 1 more time]
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 472, in _start_process
    raise PyngrokNgrokError("The ngrok process errored on start: {}.".format(ngrok_process.startup_error),
pyngrok.exception.PyngrokNgrokError: The ngrok process errored on start: x509: certificate signed by unknown authority.
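
For the resolv.conf workaround mentioned above, a quick way to confirm which servers the system resolver will use is to list the nameserver entries. A small standard-library sketch; nameservers is an illustrative helper. On many Ubuntu installs /etc/resolv.conf points at the local systemd-resolved stub (127.0.0.53), in which case the upstream servers are configured elsewhere and this will only show the stub.

```python
from pathlib import Path


def nameservers(text):
    """Return the nameserver addresses in resolv.conf-formatted text, in file order."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comment lines
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    conf_path = Path("/etc/resolv.conf")
    if conf_path.exists():
        print(nameservers(conf_path.read_text()))
    else:
        print("not found:", conf_path)
```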

alexdlaird commented 3 years ago

Could you also try "root_cas: host" or "trust_host_root_certs: false" in your ngrok.yml instead? https://github.com/inconshreveable/ngrok/issues/418

You are correct: when pyngrok sees an error-level log (lvl=eror), even "failed to reconnect session", it kills the ngrok process and restarts it. So it's retrying the entire process rather than using ngrok's built-in retry mechanism, which is why it's not working for you.
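
Since ngrok writes its --log=stdout output as key=value pairs, reacting to "any error log" comes down to parsing each line and checking for lvl=eror. A minimal sketch of that parse, using the standard library's shlex for the double-quoted values; parse_ngrok_log and is_error are illustrative names, not pyngrok functions, and treating ngrok's quoting as shell-style is an assumption that happens to fit the lines in this thread.

```python
import shlex


def parse_ngrok_log(line):
    """Parse one key=value ngrok log line into a dict of strings."""
    fields = {}
    for token in shlex.split(line):  # honours double-quoted values and \" escapes
        key, sep, value = token.partition("=")
        if sep:  # tokens without "=" (e.g. old-style prefixes) are ignored
            fields[key] = value
    return fields


def is_error(line):
    """True when the line is logged at ngrok's error level, which it spells "eror"."""
    return parse_ngrok_log(line).get("lvl") == "eror"
```

Under the behaviour described above, any line for which is_error is true would trigger a restart, including the transient "failed to reconnect session" lines.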

vladivanovic commented 3 years ago

I just tried but unfortunately neither of those worked for me.

That's good to know; it confirms my understanding of how pyngrok and ngrok interface with each other. I guess it's a lot harder to parse an interactive ngrok session and use its built-in retry mechanism, since you have no control over it?

It more or less seems to be the DNS provider blocking, or redirecting and intercepting, the ngrok traffic after DNS resolution via their service. In this case it's Cisco Umbrella, but it could be Zscaler or any similar cloud security service set up to do this: it tells my VM the IP address of the cloud firewall, which then attempts to man-in-the-middle the traffic.

So the resolution is either to use a different DNS service or to try to use ngrok's native built-in retry mechanism, but that doesn't sound like it would be easy or ideal.

alexdlaird commented 3 years ago

Could you try one more debugging step for me? The PyngrokNgrokError you're showing me the stack trace for also has an ngrok_logs field, which will show all startup logs. Could you print that list of logs out and share it here? My suspicion is that, under the hood, pyngrok is actually doing the right thing (I think we'll see three "failed to reconnect session" logs with retries, followed by success), but because a startup_error has also been registered, the process is torn down even after successfully coming up, due to this:

https://github.com/alexdlaird/pyngrok/blob/40bc1aab5b2dce084d3d18370b49eea048b2ae88/pyngrok/process.py#L462

If you share the logs and they show that, I think I know how we can make a workaround for this edge case that doesn't affect other valid error paths or nominal paths.
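
The suspected edge case can be modelled in a few lines. In this hypothetical sketch, judge_startup takes log entries already split into fields (for example by a key=value parser) and reports whether startup succeeded. With clear_on_success=False any error is fatal even if the session comes up afterwards, which is the teardown described above; with clear_on_success=True, a later "client session established" entry clears errors that ngrok recovered from on its own. The function, the flag, and the choice of success marker are assumptions for illustration, not pyngrok's code or the change that was eventually merged.

```python
def judge_startup(entries, clear_on_success=True):
    """Return (ok, startup_error) for a sequence of parsed ngrok log entries (dicts)."""
    started = False
    startup_error = None
    for entry in entries:
        if entry.get("lvl") == "eror":
            # Remember the most recent error, preferring the err field over msg
            startup_error = entry.get("err") or entry.get("msg")
        elif entry.get("msg") == "client session established":
            started = True
            if clear_on_success:
                startup_error = None  # ngrok recovered; earlier retried errors no longer matter
    return (started and startup_error is None), startup_error


# Entries mirroring the ngrok stdout log earlier in this thread
STDOUT_LOG = [
    {"lvl": "eror", "msg": "failed to reconnect session",
     "err": "x509: certificate signed by unknown authority"},
    {"lvl": "info", "msg": "tunnel session started"},
    {"lvl": "info", "msg": "client session established"},
]

if __name__ == "__main__":
    print(judge_startup(STDOUT_LOG, clear_on_success=False))  # strict: error wins
    print(judge_startup(STDOUT_LOG))  # lenient: later success wins
```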

vladivanovic commented 3 years ago

Thank you @alexdlaird. I hate to ask, as I'm not the most experienced coder, especially when it comes to error handling... I've written the following, but I'm unsure how to print ngrok_logs specifically. Everything I've tried either fails, comes up short, or re-prints the same thing. Any tips or examples? Sorry, googling around I couldn't quite find an example.

import os

from pyngrok import conf, exception, ngrok

def startngroktunnel():
    try:
        ngrokFile = os.path.abspath("ngrok.yml")
        ngrokConfig = conf.PyngrokConfig(config_path=ngrokFile)
        http_tunnel = ngrok.connect(name='merakihud', pyngrok_config=ngrokConfig)
    except exception.PyngrokError:
        # This prints the exception class itself, not the error that was raised or its logs
        print(exception.PyngrokNgrokError)

alexdlaird commented 3 years ago

This should do it:

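import os

from pyngrok import conf, ngrok
from pyngrok.exception import PyngrokNgrokError

def startngroktunnel():
    try:
        ngrokFile = os.path.abspath("ngrok.yml")
        ngrokConfig = conf.PyngrokConfig(config_path=ngrokFile)
        http_tunnel = ngrok.connect(name='merakihud', pyngrok_config=ngrokConfig)
    except PyngrokNgrokError as e:  # e is the raised instance; e.ngrok_logs holds the startup logs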
        print(e.ngrok_logs)
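
The difference between the two snippets is that print(exception.PyngrokNgrokError) prints the exception class, while "except ... as e" binds the raised instance, whose attributes such as ngrok_logs can then be read. A self-contained illustration with a hypothetical stand-in class (FakeNgrokError), so it runs without pyngrok installed:

```python
class FakeNgrokError(Exception):
    """Hypothetical stand-in for an exception that carries the logs captured at startup."""

    def __init__(self, message, ngrok_logs):
        super().__init__(message)
        self.ngrok_logs = ngrok_logs


def start_tunnel():
    # Simulates a failed startup; the texts are abbreviated from this thread
    raise FakeNgrokError(
        "The ngrok process errored on start: x509: certificate signed by unknown authority.",
        ['lvl=eror msg="failed to reconnect session"'],
    )


def captured_logs():
    try:
        start_tunnel()
    except FakeNgrokError as e:  # "as e" binds the raised instance, not the class
        return e.ngrok_logs
    return []


if __name__ == "__main__":
    print(FakeNgrokError)  # prints the class itself, which has no logs attached
    for entry in captured_logs():  # one captured entry per line
        print(entry)
```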

vladivanovic commented 3 years ago

Thank you! Got it working now. Here is the output with 4 reconnect attempts:

vlad@ubuntu:~/fujiwara-api/app$ python3 webhook.py
t=2021-08-17T13:41:45+0900 lvl=eror msg="failed to reconnect session" obj=csess id=a50d0f467b6f err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-17T13:41:46+0900 lvl=eror msg="failed to reconnect session" obj=csess id=9fa817d53f0a err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-17T13:41:47+0900 lvl=eror msg="failed to reconnect session" obj=csess id=0c93fe63ef00 err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-17T13:41:47+0900 lvl=eror msg="failed to reconnect session" obj=csess id=0c93fe63ef00 err="x509: certificate signed by unknown authority"
ngrok reset our connection, retrying in 0.5 seconds ...
t=2021-08-17T13:41:48+0900 lvl=eror msg="failed to reconnect session" obj=csess id=27ef7f091f67 err="x509: certificate signed by unknown authority"
[, , ]
t=2021-08-17T13:41:48+0900 lvl=eror msg="failed to reconnect session" obj=csess id=27ef7f091f67 err="x509: certificate signed by unknown authority"
Traceback (most recent call last):
  File "webhook.py", line 16, in <module>
    ngrok_tunnels = ngrok.get_tunnels()
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/ngrok.py", line 324, in get_tunnels
    api_url = get_ngrok_process(pyngrok_config).api_url
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/ngrok.py", line 162, in get_ngrok_process
    return process.get_process(pyngrok_config)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 295, in get_process
    return _start_process(pyngrok_config)
  File "/home/vlad/.local/lib/python3.8/site-packages/pyngrok/process.py", line 472, in _start_process
    raise PyngrokNgrokError("The ngrok process errored on start: {}.".format(ngrok_process.startup_error),
pyngrok.exception.PyngrokNgrokError: The ngrok process errored on start: x509: certificate signed by unknown authority.

vladivanovic commented 3 years ago

Just to confirm, this is the final code I went with (from pyngrok import ngrok, conf, exception is at the top):

import os

from pyngrok import conf, exception, ngrok

def startngroktunnel():
    try:
        ngrokFile = os.path.abspath("ngrok.yml")
        # Let pyngrok restart ngrok up to 4 times when it sees "failed to reconnect session"
        ngrokConfig = conf.PyngrokConfig(config_path=ngrokFile, reconnect_session_retries=4)
        http_tunnel = ngrok.connect(name='merakihud', pyngrok_config=ngrokConfig)
    except exception.PyngrokNgrokError as e:
        print(e.ngrok_logs)

alexdlaird commented 3 years ago

I've created a branch that should rely on ngrok's built-in retry mechanism rather than forcing the entire process to restart. Would you be able to test it behind your firewall and put the logs here again?

To test the branch, do the following:

git clone git@github.com:alexdlaird/pyngrok.git
cd pyngrok
git checkout 88
make local

This will install the dev build of pyngrok into your Python installation. When you re-run your script, it should use the latest build of the package instead. (If you're using a venv, you may need to activate it first so that make local installs into the venv's path instead.)
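
To confirm which interpreter you are about to install into, a quick standard-library check is to compare sys.prefix with sys.base_prefix; they differ inside a venv. Run it with the same python you use for webhook.py. That make local installs into whichever Python is active is an assumption here; check the Makefile to be sure.

```python
import sys


def in_virtualenv():
    """True inside a venv, where sys.prefix points at the venv and base_prefix at the base install."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
```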

alexdlaird commented 3 years ago

PR #89 has been merged and version 5.1.0 should now use ngrok's native retry mechanism instead of killing the binary each time.
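
To make sure the release with this change is the one actually being imported, you can compare the installed version against 5.1.0. A minimal sketch using importlib.metadata (Python 3.8+); at_least and version_tuple are illustrative helpers, and the simple numeric parsing is an assumption that ignores pre-release suffixes such as rc1.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn "5.1.0" into (5, 1, 0), stopping at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(installed, minimum="5.1.0"):
    """True if installed >= minimum, comparing numeric components padded with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("pyngrok")
    except PackageNotFoundError:
        print("pyngrok is not installed for this interpreter")
    else:
        if at_least(installed):
            print("pyngrok", installed, "is 5.1.0 or newer")
        else:
            print("pyngrok", installed, "is older; try: pip install --upgrade pyngrok")
```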

vladivanovic commented 3 years ago

Hi Alex, my apologies: we had to prepare for a presentation with a workaround in place for now, so I didn't get an opportunity to test. I'll try the version you've released now and get back to you if I have any issues (though no doubt you've been able to test it well enough yourself if you've committed it :))

Thanks again for the help and for looking into this!