J-Rios / TLG_JoinCaptchaBot

Telegram Bot to verify if users joining a group are human. The Bot sends an image captcha for each new user and kicks any of them who can't solve the captcha in a specified time.
GNU General Public License v3.0
532 stars 218 forks source link

Fixed webhook setup for work with revers proxy #197

Closed kvazimoda24 closed 6 months ago

J-Rios commented 6 months ago

Hi, thanks for the contribution :)

I get an idea of why there is an issue for setup the Bot to be used through a reverse proxy, but please is a good practice to give an explanation on the Pull Request decription for current issue and your proposed solution to allow better understand of it.

Also, some change requests:

Something like:

imagen

Regards.

kvazimoda24 commented 6 months ago

According to the documentation of the python-telegram-bot library, there is a parameter webhook_url — this is a string that is completely transmitted to the Telegram servers, and at this address the Telegram server will pull our bot. The remaining parameters concern only our bot and are not transmitted to the Telegram servers (with the exception of secret_token) And at first glance, it seems that these parameters are somehow connected, but when working through a reverse proxy, it turns out that both the port and the path may differ. The hostname is not used anywhere at all. Also, in the code that was before my edit, the webhook_url used the token of the bot itself, which is unsafe, because this token will go to the reverse proxy server log.

Based on what was described above, I decided to simply duplicate the library parameters, because it's concise and logical. We indicate which link to send to the Telegram server, indicate to the bot on which port to wait for the connection and which path to use. If we launch the bot without a reverse proxy, then we simply need to manually enter the port and path in WEBHOOK_URL. It seems to me more logical than creating separate parameters PROXY_URL and PROXY_URL_PATH, which work in a completely unobvious way. Perhaps, to make it more clear, it is worth finalizing the documentation. Your option seems overcomplicated and illogical to me.

J-Rios commented 6 months ago

Well the problem is that your changes modifies and breaks how the user setup the Bot for Webhook when no proxy is used, this is why I prefer to just check with an if statement if the PROXY_URL variable is set, so in that case we apply your suggestion of single setting for the full URL, otherwise (for no proxy usage) we kept the same procedure and settings variables.

Also forgot to question: did you test your solution, and were you able to receive Telegram events using an intermidiate proxy?

J-Rios commented 6 months ago

mmm, I think I get a bit more of your point, taking a look into intenal PTB (python-telegram-bot) library implementation it seems that there is 2 ways for setting up the webhook:

A. Only using "listen", "port" and "url_path" arguments (that will intenally create the "web_url" in the way "https://listen:port/url_path").

B. Using "web_url", "listen", "port" and "url_path" arguments (web_url won't be created by python-telegram-bot; and this allow to setup the proxy address here).

So your proposal is to use option B, so "web_url" will be the Proxy URL were Telegram Server will talk with it, then "listen" and "port" values are from the system that runs the Bot and not the Proxy, is that right?

kvazimoda24 commented 6 months ago

Also forgot to question: did you test your solution, and were you able to receive Telegram events using an intermidiate proxy?

Yes, this code is working for me right now. Public bot @kvazimoda24_capcha_bot

kvazimoda24 commented 6 months ago

I want to point out that it is not safe to use a bot token in an address. We definitely need to get rid of this. And this change will already break compatibility. And since we are breaking compatibility, then it’s normal to do it, and not these half measures. Launching a bot is not something simple, like copying a file from a flash drive to a computer. And this requires a certain competence from a person. We have a saying in Russia: “Make a product that any idiot can use, and only an idiot will use it.” Sometimes I think you think the people who use this bot are fools who can't understand the purpose of a couple of options. :) Actually, I don’t see a problem in making an adequate change, even if it breaks backward compatibility. But the behavior of the code will now be predictable. Before my edits, until I looked into the code myself, I did not know that the bot was shining its token in the webhook_url. I understand that the names of the variables in the config may not be obvious. Therefore, I propose to refine this point. For example, named the variable WEBHOOK_LISTEN_PORT. And, of course, finalize the documentation to describe this in as much detail as possible. We can also describe in Changelog why we had to break backward compatibility. That this is done for the sake of safety and predictable behavior.

kvazimoda24 commented 6 months ago

My logic in this matter is simple: we set a link that we will transfer to the Telegram server (webhook_url), and setting which port to listen to and which path (url_path and port). If we work without a reverse proxy, then we duplicate the port and path in webhook_url. If with a reverse proxy, the values ​​may vary. And further describe in the documentation that using a token as a url_path is an extremely bad idea.

kvazimoda24 commented 6 months ago

And I'll add more. I use a lot of different software. These are JetBrains YouTrack, and NextCloud, and Passwork... And everywhere there is an analogue of the webhook_url setting, which tells the application how it “looks” from the client side when working through a reverse proxy. There are also separate setting which port to listen to. All this means is that I didn’t come up with such a setup scheme out of thin air. This is common practice.

J-Rios commented 6 months ago

So your proposal is to use option B, so "web_url" will be the Proxy URL were Telegram Server will talk with it, then "listen" and "port" values are from the system that runs the Bot and not the Proxy, is that right?

In that case the best solution will be to keep all the next settings and setup the webhook as follow:

app.run_webhook(
    listen=CONST["WEBHOOK_HOST"],
    port=CONST["WEBHOOK_PORT"],
    url_path=CONST["WEBHOOK_PATH"],
    webhook_url=CONST["WEBHOOK_URL"],
    key=CONST["WEBHOOK_CERT_PRIV_KEY"],
    cert=CONST["WEBHOOK_CERT"],
    secret_token=CONST["WEBHOOK_SECRET_TOKEN"],
    drop_pending_updates=True,
    allowed_updates=Update.ALL_TYPES
)
kvazimoda24 commented 6 months ago

Using the WEBHOOK_HOST variable for the listen parameter is a bad idea. listen defines the address that the bot's WEB server will listen to, and the hostname may resolve either to the wrong address or not resolve at all. If you want to set the listen parameter, then it is better to rename the variable to something like WEBHOOK_IP or WEBHOOK_LISTEN_IP.

J-Rios commented 6 months ago

In this context "Host" doesn't mean "Hostname", it is expected for the Host address, but I agree that can be renamed to "WEBHOOK_IP" or "WEBHOOK_ADDRESS" for a better understanding.

It could be set to fixed "0.0.0.0" in the settings.py by default.

kvazimoda24 commented 6 months ago

Anything that can be misunderstood will be misunderstood. Therefore, it is better to actually use either "ADDRESS" or "IP". In the end, it looks like we have reached a consensus.

kvazimoda24 commented 6 months ago

I don't have much experience with Github. Tell me, should I make changes to my pull request?

J-Rios commented 6 months ago

Sure, feel free to add new commits according to this whenever you want ;)

J-Rios commented 6 months ago

I don't have much experience with Github. Tell me, should I make changes to my pull request?

You can just add new changes and commits to your fork branch, then when you push, that changes should automatically update and appear in this Merge Request.

Of course, if you have any issue you can open a new Pull Request, but feel free to use current one as test scenario for you to test and check how Github work in this regards ;)

kvazimoda24 commented 6 months ago

I did as agreed. The descriptions seem to have been corrected everywhere too.