Closed HirbodBehnam closed 2 months ago
Thank you very much for the kind words! Also, that's a good idea. Let me try this with my RedditDownloaderBot to see how it goes...
Edit: The bot is now live with this pull request! I just added this line to `go.mod` and compiled it again:
replace github.com/PaulSonOfLars/gotgbot/v2 => github.com/HirbodBehnam/gotgbot/v2 v2.0.0-20240906045508-c060fa7cd0e3
Exciting! Would it be possible for you to check your logs tomorrow and report back if there are any issues?
This may be me being overly cautious, so I apologise for that - I usually run most PRs on Rose for a few days before merging, but she doesn't do much file handling, so you seem to be better placed here!
So I've had this running for about a day and a half, and I think it's good. First things first, I used `journalctl --since "2 days ago" -u redditdownloaderbot.service | grep "failed to execute POST request to"` to check whether there were any errors regarding file uploads. There are some, but the reason is that I set the file upload timeout too low, so the logs are filled with `context deadline exceeded`. No errors related to the pipe changes, though.
Also, I had a look at the Grafana charts for memory usage, and there are clearly fewer spikes compared to before this patch. (Bear in mind that other services are running on this server, and the time zone is Asia/Tehran.)
As you can see, after I deployed this version, the number of spikes has gone down.
So what do you think? Shall I do other experiments as well?
No, I'm all convinced - looks like a merge to me!
Really appreciate the in-depth debugging detail and the Grafana dashboard you've provided - it definitely shows an impressive improvement.
Thanks again for the excellent PR, and don't hesitate to open more PRs/issues if you've got any other suggestions in the future!
What
TL;DR: Use `io.Pipe` instead of buffering files in a `bytes.Buffer` to reduce the memory usage of file uploads.

Before this commit, any file to be uploaded to Telegram was buffered in memory in a `bytes.Buffer` and then sent to the Telegram servers. This posed a problem when uploading media groups, which can contain up to ten files and thus noticeably increase the library's RAM usage.

The idea here is to read from the files and write them into the HTTP request asynchronously. This can be achieved with a goroutine and an `io.Pipe`. As the name suggests, `io.Pipe` is a pipe: whatever is written to the writer end can be read from the reader end. However, once data is written to the pipe, it must be read before the next chunk can be written. Because `io.Copy` is used to copy the file contents into the `multipart.Writer`, and `io.Copy` uses a 32KB buffer, the memory used for file buffering stays at around 32KB.

To execute this idea, we create an `io.Pipe`, hand the reader end to `http.NewRequestWithContext` and the writer end to `multipart.NewWriter`. This way, another goroutine can write to the pipe (which is tied to the `multipart.Writer`) while the request reads the content. This is exactly what I've done: I simply run `fillBuffer` in another goroutine. Also, `FormDataContentType` can be called as soon as the `multipart.Writer` is created (unless the boundary is changed manually, which is not the case here), which I've also done. The last thing left is error handling for the `multipart.Writer`. This can be done with `PipeWriter.CloseWithError`: if the writer end is closed with an error, that error is surfaced on the reader end, so `bot.Client.Do(req)` catches it and prints it with the `failed to execute POST request to ...` error message. You also need to defer closing the reader end to avoid leaking the writer goroutine.

One other very small change I made was to swap the JSON marshaling into a `bytes.Buffer` for a combination of `json.Marshal` and `bytes.Reader`. Internally, `json.Marshal` uses a pool of `bytes.Buffer`s and tries to reuse them as much as it can; creating a new `bytes.Buffer` every time instead of using the buffer pool is, I think, a little slower. But if you prefer, I can simply go back to the old `bytes.Buffer` and `json.NewEncoder`.

Let me know what you think, and I would like to thank you for the very niche idea you had to automate the code generation based on the Telegram API docs!
Impact