Luxembourg-Open-Source-Club / BICS-BOT

Discord bot that allows university students to organize and manage their studies in their Bachelors/Masters with various utilities.
GNU General Public License v3.0
6 stars 14 forks

Guichet Etudiant Calendar Integration #33

Open umutucak opened 1 year ago

umutucak commented 1 year ago

Is your feature request related to a problem? Please describe.

We have the calendar available to all students on Guichet Etudiant. However, accessing it on-the-go is not the most convenient thing (from personal experience). It would be quicker to have it at hand on the Discord server.

Describe the solution you'd like

We can scrape the calendar page on Guichet Etudiant and parse the HTML to get the updated calendar. This can be automated with a script scheduled to run weekly.

Describe alternatives you've considered

We can download the .ical file available on Guichet Etudiant. I am not sure if this can be done via scraping or another approach, or whether it is possible to automate at all.

Additional context

Below are some example snippets of the HTML that we can scrape.

Snippet 1: Course Information

image

The red box highlights the <td> bodies. These represent the weekdays from Sunday to Saturday. So in this example, the third <td> having content indicates that there is a class on Tuesday.

The green box highlights what that class is. We can see information such as the time and name of this class.

This consistent formatting can be used to automate this task using scripts.
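As a rough illustration of the idea, here is a minimal sketch using only Python's standard library `html.parser`. The sample row and the names `WeekRowParser` and `classes_by_weekday` are made up for this example; the real markup on Guichet Etudiant may be nested differently, so treat this as a starting point only.

```python
from html.parser import HTMLParser

# Column order as described above: the week starts on Sunday.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


class WeekRowParser(HTMLParser):
    """Collects the text content of each <td> in a single table row."""

    def __init__(self):
        super().__init__()
        self.cells = []          # one text entry per <td>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def classes_by_weekday(row_html):
    """Return {weekday: cell text} for every non-empty <td> in the row."""
    parser = WeekRowParser()
    parser.feed(row_html)
    parser.close()
    return {
        WEEKDAYS[i]: " ".join(text.split())  # collapse whitespace
        for i, text in enumerate(parser.cells[:7])
        if text.strip()
    }


# Hypothetical row: only the third <td> (Tuesday) has content.
sample = (
    "<tr><td></td><td></td>"
    "<td><div>10:00 - 11:30 Algorithms</div></td>"
    "<td></td><td></td><td></td><td></td></tr>"
)
print(classes_by_weekday(sample))
```

This sketch does not handle nested tables; if the page uses them, a more robust parser would be needed.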

DerAndereJohannes commented 10 months ago

Hi. Over the past hour I have been playing around with the authentication process for the Guichet Etudiant website and have figured out how we could do this without any heavy additional dependencies. While a headless browser driven by Selenium works, it seemed a shame to use one, as it would add a lot of bulk. I have managed to achieve it using curl requests (it was actually easier than I thought). I am sure that these requests can easily be translated to Python requests.

Using these basic requests, you can directly curl the JSON files that the website itself uses for displaying the information, rather than scraping the website to retrieve it. Can you add some information on how this project usually stores useful data, and on any commands that should be created to access the calendar information? It is not quite clear to me.

PedroS235 commented 10 months ago

@DerAndereJohannes oh, that's good news. I guess it will indeed make our life easier. The main idea is to have a chat, which I believe already exists, and to send a weekly or maybe daily message with the courses of each semester, by year. So in the winter, semesters 1, 3 and 5 would be shown in their respective channels, and the same for the summer semesters.

PedroS235 commented 10 months ago

So I guess there is no need for commands; it would be more of a task/loop that fetches any new updates to the schedule every 24h or weekly. I guess weekly makes more sense? @umutucak what do you think?

umutucak commented 10 months ago

@DerAndereJohannes that is really cool!

Data Storage

Info could be stored in a new directory db in the root of the project, which I've been thinking of introducing for a while. We can either (a) store the values in JSON (or whatever) format in this directory, or (b) make requests around 7h, process the data, check for diff, and update the message accordingly without any local data storage.

@PedroS235 I'm a fan of doing daily updates, especially now that we do not have the heavy headless browser overhead. Calendar updates on the uni's side are done throughout the week, so I think it is only natural for us to do daily updates too.
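To make option (a) concrete, here is a minimal sketch that keeps one JSON snapshot on disk and reports whether freshly fetched data differs from it. The function names and the `db/calendar.json` path in the usage note are assumptions for illustration, not existing project code.

```python
import json
from pathlib import Path


def load_snapshot(path):
    """Return the previously stored calendar data, or None if there is none."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def save_snapshot(path, data):
    """Write the calendar data as JSON, creating the directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, sort_keys=True, indent=2), encoding="utf-8")


def update_if_changed(path, new_data):
    """Store new_data and return True only when it differs from the snapshot."""
    if load_snapshot(path) == new_data:
        return False
    save_snapshot(path, new_data)
    return True
```

A daily task could then call `update_if_changed("db/calendar.json", fetched_data)` and edit the channel message only when it returns `True`.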

Commands/Calendar Access

This feature is new and fully independent; it is not related to the already present /calendar command. The current implementation of the /calendar command works more like a TODO list of exams/quizzes/homework.

Since the current /calendar command is not used, it could be renamed, and this new calendar integration could become the titular Calendar feature of the bot.

On the server we already have 3 calendar channels; they are in the year1/2/3 categories. They can be made read-only and be restructured into this bot-managed system. The bot can edit its own messages in these channels.

Question: Through these curl requests, were you able to bypass authentication?

DerAndereJohannes commented 10 months ago

The requests are immediate and can of course be done in async / on a separate thread. There is no need to be overly conservative about sending requests (of course, do not DDoS ;-) ). Here are some thoughts on other aspects:

On the server we already have 3 calendar channels; they are in the year1/2/3 categories. They can be made read-only and be restructured into this bot-managed system. The bot can edit its own messages in these channels.

This sounds like the most logical use for this issue.

... check for diff, ...

Diff checking might be superfluous if you just generate the whole message again from the new data and update the message accordingly (unless you also want to create a new message pinging the cohort about the change, which might be nice).
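A minimal sketch of the "regenerate the whole message" approach follows. The lesson dictionaries and their keys (`day`, `start`, `end`, `title`) are assumptions made for this example; the actual JSON returned by Guichet Etudiant will likely look different.

```python
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def render_schedule(lessons):
    """Build the full channel message from scratch for a list of lessons.

    Each lesson is a dict with the hypothetical keys day, start, end, title.
    Lessons whose day is not in DAY_ORDER are skipped.
    """
    if not lessons:
        return "No classes scheduled this week."
    lines = []
    for day in DAY_ORDER:
        todays = sorted(
            (lesson for lesson in lessons if lesson["day"] == day),
            key=lambda lesson: lesson["start"],
        )
        if not todays:
            continue
        lines.append(f"**{day}**")  # Discord renders **text** as bold
        for lesson in todays:
            lines.append(f"{lesson['start']}-{lesson['end']} {lesson['title']}")
    return "\n".join(lines)
```

If a ping is wanted, comparing the rendered string with the content of the existing message is one cheap way to tell whether anything changed.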

This feature is new and fully independent; it is not related to the already present /calendar command. The current implementation of the /calendar command works more like a TODO list of exams/quizzes/homework.

We should leave the /calendar command alone and let it stay simple.

Question: Through these curl requests, were you able to bypass authentication?

The requests do not bypass authentication but include the negotiation for authentication. I somehow thought that this would be a manual process since the login procedure is so strange, but it ended up not being an issue.

Here is a suggested implementation that incorporates your ideas with many additional questions:

  1. Start a separate scheduling thread (using schedule maybe? Using cron would require cron as an OS dependency, which could be a bit awkward, especially for developers not on *nix. How does WSL handle it?)
  2. Maybe do hourly refreshes from 7am to 5pm (how often does a lesson change occur?)
  3. Store the most recent JSON file so that the bot can reload it if it restarts (we could also reformat the calendar data into a class and export/encode that) -> We could also store all of the refreshes, one file per day, so we could do some data analysis ;-)
  4. On refresh, take into account all the changes, update the message, and ping the cohort if there has been a change
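Steps 2 and 3 could look roughly like the sketch below: a small class for calendar entries that can be encoded to and decoded from JSON, a per-day file name, and a check for the proposed refresh window. The `Lesson` fields and the `calendar-<date>.json` naming are assumptions for illustration, since the structure of the real JSON is not shown here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Lesson:
    """One calendar entry; the field names are hypothetical."""
    title: str
    start: str  # ISO 8601 timestamp, kept as a string for simple JSON export
    end: str
    room: str = ""


def encode_lessons(lessons):
    """Serialize a list of Lesson objects to a JSON string."""
    return json.dumps([asdict(lesson) for lesson in lessons], indent=2)


def decode_lessons(text):
    """Rebuild Lesson objects from a JSON string written by encode_lessons."""
    return [Lesson(**item) for item in json.loads(text)]


def snapshot_name(day):
    """One file per day, e.g. calendar-2024-01-09.json."""
    return f"calendar-{day.isoformat()}.json"


def in_refresh_window(now, first_hour=7, last_hour=17):
    """True when `now` falls in the proposed 7am to 5pm refresh window."""
    return first_hour <= now.hour <= last_hour
```

Because `Lesson` is a frozen dataclass, two lists of lessons can be compared with `==` to detect a change before pinging the cohort.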

Let me know if I missed anything

PedroS235 commented 10 months ago

Start a separate scheduling thread (using schedule maybe? Using cron would require cron as an os dependency which could be a bit awkward esp. with developers not on *nix.. how does WSL handle it?)

There is no need for this. nextcord already has a tasks API, which works like cron: you can set the interval at which a function gets called.

Maybe at 7am - 5pm do refreshes hourly (How often does a lesson change occur?)

Not sure if that is really needed. I think one request per day, in the morning, is enough. I have personally never had a day where the schedule suddenly changed in the middle of the day.

DerAndereJohannes commented 10 months ago

Ah, awesome. That makes things quite a bit easier then. Thanks for the extra info, I'll go write something and get back to you.