Open umutucak opened 1 year ago
Hi. Over the past hour I have been playing around with the authentication process for the Guichet Etudiant website and have figured out how we could do this without any additional heavy dependencies. While Selenium headless browsers work, it seemed a shame to use them, as they would add a lot of bulk. I have managed to achieve it using curl requests (it was actually easier than I thought). I am sure that these requests can easily be translated to Python requests.
Using these basic requests, you are able to directly curl the json files that the website itself uses for displaying the information rather than scraping the website to retrieve the information. Can you add additional information as to how this project usually stores useful data and any commands that should be created to access the calendar information? It is not quite clear to me.
@DerAndereJohannes oh that's good news. I guess then indeed it will make our life easier. But the main idea is to have a chat channel, which I believe already exists, and to send a message weekly or maybe daily with the courses of each semester, by year. So in the winter, semesters 1, 3 and 5 would be shown in their respective channels, and likewise for the summer semesters.
So I guess there is no need for commands; it would be more of a task/loop that fetches any updates to the schedule every 24 hours or every week. I guess weekly makes more sense? @umutucak what do you think?
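The winter/summer split described above could be sketched roughly like this. The function names are hypothetical, the summer set (2, 4, 6) is an assumption inferred from "the same", and the `year1`/`year2`/`year3` names only mirror the existing categories on the server.

```python
def active_semesters(term: str) -> list[int]:
    """Return the semesters that run in the given term.

    Winter -> 1, 3, 5 (as stated above); summer -> 2, 4, 6 (assumed).
    """
    if term == "winter":
        return [1, 3, 5]
    if term == "summer":
        return [2, 4, 6]
    raise ValueError(f"unknown term: {term!r}")


def year_category(semester: int) -> str:
    """Map a semester number (1-6) to its year category name."""
    if not 1 <= semester <= 6:
        raise ValueError(f"semester out of range: {semester}")
    # Semesters 1-2 belong to year 1, 3-4 to year 2, 5-6 to year 3.
    return f"year{(semester + 1) // 2}"


def targets_for(term: str) -> dict[str, int]:
    """Which semester's schedule goes into which year category this term."""
    return {year_category(s): s for s in active_semesters(term)}
```

For example, `targets_for("winter")` pairs each year category with the one semester whose courses should be posted there.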
@DerAndereJohannes that is really cool!
Info could be stored in a new directory `db` that I've been thinking of introducing for a while in the root of the project. We can either (a) store the values in JSON (or whatever) format in this directory, or (b) make requests around 7h, process the data, check for diff, and update the message accordingly without any local data storage.
@PedroS235 I'm a fan of doing daily updates, especially now that we do not have the heavy headless browser overhead. Calendar updates on the uni's side are made throughout the week, so I think it is only natural for us to do daily updates too.
This is a new and fully independent feature, not related to the already present `/calendar` command. The current implementation of the `/calendar` command works more like a TODO list of exams/quizzes/homework.
The current `/calendar` command is not used; it could be reworked under a new name, and this new calendar integration could become the new titular Calendar feature of the bot.
On the server we already have 3 calendar channels; they are in the year1/2/3 categories. They can be made read-only and be restructured into this bot-managed system. The bot can edit its own messages in these channels.
Question: Through these curl requests, were you able to bypass authentication?
The requests are immediate and can of course be done asynchronously or on a separate thread. There is no need to be overly conservative about sending requests (of course, do not DDoS ;-) ). Here are some thoughts on other aspects:
> On the server we already have 3 calendar channels; they are in the year1/2/3 categories. They can be made read-only and be restructured into this bot-managed system. The bot can edit its own messages in these channels.
This sounds like the most logical use for this issue.
> ... check for diff, ...

Diff checking might be superfluous if you simply generate the whole message again from the new data and update the message accordingly (unless you also want to create a new message pinging the cohort about the change, which might be nice).
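A minimal sketch of that idea: always rebuild the full message from the fetched data, and use a plain string comparison only to decide whether an edit (and maybe a ping) is worth doing. The lesson fields `day`, `start`, `end` and `course` are hypothetical; the real JSON from the website may look quite different.

```python
def render_schedule(lessons: list[dict]) -> str:
    """Build the complete channel message from a list of lesson dicts.

    Assumes each lesson has 'day' (ISO date), 'start', 'end' and 'course'.
    """
    if not lessons:
        return "No lessons scheduled."
    # ISO dates and HH:MM times sort correctly as plain strings.
    ordered = sorted(lessons, key=lambda lesson: (lesson["day"], lesson["start"]))
    return "\n".join(
        f"{lesson['day']} {lesson['start']}-{lesson['end']}: {lesson['course']}"
        for lesson in ordered
    )


def refresh_message(current_text: str, lessons: list[dict]) -> tuple[str, bool]:
    """Regenerate the message and report whether it differs from the current one."""
    new_text = render_schedule(lessons)
    return new_text, new_text != current_text
```

The returned flag is only needed for the optional ping; the bot could just as well edit its message with `new_text` on every run.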
> This feature is a new and fully independent feature, not related to the already present /calendar command. The current implementation of the /calendar command works more like a TODO list of exams/quizzes/homework.
We should leave the /calendar command alone and let it stay simple
> Question: Through these curl requests, were you able to bypass authentication?
The requests do not bypass authentication but include the negotiation for authentication. I somehow thought that this would be a manual process since the login procedure is so strange, but it ended up not being an issue.
Here is a suggested implementation that incorporates your ideas with many additional questions:
Let me know if I missed anything
> Start a separate scheduling thread (using schedule maybe? Using cron would require cron as an OS dependency, which could be a bit awkward, esp. with developers not on *nix. How does WSL handle it?)

There is no need for this. nextcord already has a tasks API, which serves the same purpose as cron. You can set the interval at which a function gets called.
> Maybe at 7am - 5pm do refreshes hourly (How often does a lesson change occur?)

Not sure if that is really needed. I think a single request per day, in the morning, is enough. I have personally never had a day where the schedule suddenly changed partway through.
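For reference, a once-a-day refresh with the nextcord tasks extension might look roughly like this. It is only a sketch: `fetch` and `publish` are hypothetical coroutines standing in for the request code and the message editing, and the 24-hour interval is counted from when the loop is started, not pinned to a clock time.

```python
import asyncio


async def refresh_once(fetch, publish):
    """Fetch the schedule once and hand it to the publisher."""
    lessons = await fetch()
    await publish(lessons)
    return lessons


def build_daily_loop(fetch, publish):
    """Wrap refresh_once in a nextcord task that runs every 24 hours."""
    # Imported lazily so refresh_once stays usable without nextcord installed.
    from nextcord.ext import tasks

    @tasks.loop(hours=24)
    async def daily_refresh():
        await refresh_once(fetch, publish)

    return daily_refresh
```

The returned loop would then be started once the bot is running, for example with `build_daily_loop(fetch, publish).start()` from an `on_ready` handler.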
Ah awesome. That makes things quite a bit easier then. Thanks for the extra info, I'll go write something and get back to you.
Is your feature request related to a problem? Please describe.
We have the calendar available to all students on Guichet Etudiant. However, accessing it on-the-go is not the most convenient thing (from personal experience). It would be quicker to have it at hand on the Discord server.
Describe the solution you'd like
We can scrape the calendar page on Guichet Etudiant and parse the HTML to get the updated calendar. This can be automated with a script scheduled to run weekly.
Describe alternatives you've considered
We can download the `.ical` file available on Guichet Etudiant. I am not sure if this can be done via scraping or another approach, or whether it is possible to automate at all.
Additional context
Below are some example snippets of the HTML that we can scrape.
Snippet 1: Course Information
The red box highlights the `<td>` bodies. These represent the weekdays from Sunday to Saturday, so in this example, the third `<td>` having content indicates that there is a class on Tuesday.
The green box highlights what that class is. We can see information such as the time and name of this class.
This consistent formatting can be used to automate this task using scripts.
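As a rough illustration of how such a row could be parsed with only the standard library, here is a sketch. It assumes one `<td>` per weekday with Sunday as the first cell (as the Tuesday example implies) and that the class details are plain text nested inside the cell; the real page markup may differ, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


class WeekRowParser(HTMLParser):
    """Collects the non-empty text fragments found inside each <td>."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one list of text fragments per <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append([])
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_td and text:
            self.cells[-1].append(text)


def classes_by_weekday(row_html: str) -> dict[str, list[str]]:
    """Return {weekday: text fragments} for every cell that has content."""
    parser = WeekRowParser()
    parser.feed(row_html)
    parser.close()
    return {day: texts for day, texts in zip(WEEKDAYS, parser.cells) if texts}
```

With a row whose third cell holds a time and a course name, the result would contain a single `"Tuesday"` entry listing those two strings.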