AJAYK-01 / KTU-Notifier

An NLP-based Telegram bot that pushes KTU announcement notifications
http://t.me/ktunotifbot
GNU General Public License v3.0

High memory usage while running scheduled check for new notifications #7

Open AJAYK-01 opened 3 years ago

AJAYK-01 commented 3 years ago

Describe the bug:

When the APScheduler runs the function scheduledjob (line 70, bot.py), RAM usage climbs over time, reaching unreasonably high values (up to 1 GB in less than a day of use!), making the bot unhostable on the free plans of common hosting providers.
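For reference, a minimal sketch of how the scheduled check is presumably wired up; only scheduledjob and the 10-minute interval come from this issue, the other names are assumptions and the real bot.py may differ:

```python
# Assumed wiring, not the actual bot.py: APScheduler fires
# scheduledjob on a 10-minute interval (see step 4 below).
from apscheduler.schedulers.background import BackgroundScheduler

def scheduledjob():
    # scrape the KTU site, diff against the cached notifications,
    # and push anything new to subscribed Telegram users
    ...

scheduler = BackgroundScheduler()
scheduler.add_job(scheduledjob, 'interval', minutes=10)
scheduler.start()
```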

To Reproduce:

Steps to reproduce the behavior:

  1. Create your own bot token and set up the bot as described in the README instructions.
  2. Run python bot.py.
  3. Open your local memory monitor (e.g. the task manager) and watch the bot's memory usage, or log it in-process as in the sketch after this list.
  4. Wait a few hours, or increase the scheduler frequency from 10 minutes (line 181, bot.py) to see the memory spike sooner.
  5. Remove the latest notification from your notifications cache (database) to trigger the bot into sending users a notification; the spike is much more pronounced when a new notification fires.
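Instead of eyeballing a task manager, the growth per run can be logged in-process; a minimal sketch assuming psutil is installed, with log_rss as a hypothetical helper:

```python
import os

import psutil  # assumption: pip install psutil

def log_rss(tag: str) -> None:
    # Print the resident set size of the current process in MiB.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
    print(f"[{tag}] RSS: {rss_mb:.1f} MiB")

# e.g. call log_rss("before") / log_rss("after") inside scheduledjob
# to see how much memory each scheduled run retains.
```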

Expected behavior:

Memory usage should stay roughly flat across scheduled runs. Instead, it creeps up with each run of the scheduled job, and the growth is far more pronounced when a new notification appears on the KTU website and the bot starts sending it to users.

Screenshots:

Gradual increase of memory usage:

[screenshot]

Eventually crossing the free-tier quota on Heroku:

[screenshot]

Additional context:

Currently the bot is restarted by a systemd timer every hour on a custom VPS, but this hacky workaround can't be used when hosting on providers like Heroku.
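For anyone reproducing the workaround, a sketch of the hourly restart; the unit names and paths are hypothetical, not the actual VPS config:

```ini
# ktu-notifier-restart.service (hypothetical name): a oneshot
# unit that bounces the bot's main service.
[Unit]
Description=Restart the KTU-Notifier bot

[Service]
Type=oneshot
ExecStart=/bin/systemctl restart ktu-notifier.service

# ktu-notifier-restart.timer (separate file): fires the oneshot hourly.
[Unit]
Description=Hourly restart timer for KTU-Notifier

[Timer]
OnCalendar=hourly

[Install]
WantedBy=timers.target
```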

AJAYK-01 commented 3 years ago

Upon further inspection, the memory spike seems to occur only when scrape() is called on line 22 of bot.py

There is no spike when that line is commented out 🤔
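One stdlib-only way to confirm the allocations really come from that call is to snapshot around it with tracemalloc; a sketch, assuming scrape is importable:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

scrape()  # the call on line 22 of bot.py

after = tracemalloc.take_snapshot()
# Top 10 source lines by allocation growth across the call:
for stat in after.compare_to(before, 'lineno')[:10]:
    print(stat)
```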

AJAYK-01 commented 3 years ago

The memory spike is smaller when using html.parser instead of html5lib, but the KTU site tends to have broken HTML tags at times, so relying on html.parser is risky.
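The parser swap being described is a one-argument change; a minimal comparison, where the broken markup string is just an illustration:

```python
from bs4 import BeautifulSoup

# Illustrative broken markup, like the unclosed tags on the KTU site:
html = "<ul><li>notice one<li>notice two"

# html5lib builds the tree the way a browser would, so it tolerates
# broken tags, but it allocates noticeably more.
soup = BeautifulSoup(html, "html5lib")

# html.parser is stdlib and lighter, but less forgiving of bad markup.
soup = BeautifulSoup(html, "html.parser")
```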

The issue is caused by the soup object, which is not getting destroyed after each scrape.

Tried setting soup = None followed by gc.collect(), but that doesn't seem to have any effect.
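For completeness, a sketch of that attempted cleanup; page_html and parse_notifications are hypothetical stand-ins, and soup.decompose() is bs4's documented way to destroy the tree and break its internal parent/child references (not verified here against this particular leak):

```python
import gc

from bs4 import BeautifulSoup

def scrape(page_html: str):
    soup = BeautifulSoup(page_html, "html5lib")
    try:
        return parse_notifications(soup)  # hypothetical helper
    finally:
        # decompose() explicitly destroys the parse tree; the bare
        # soup = None + gc.collect() attempt above had no visible effect.
        soup.decompose()
        gc.collect()
```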