NIAEFEUP / uporto-schedule-scrapper

Python solution to extract the course schedules from the different faculties of UPorto. Used to feed TTS, our timetable selection platform for students.
GNU General Public License v3.0

Handle course units that can be in several courses at once #44

Closed. miguelpduarte closed this issue 1 year ago

miguelpduarte commented 4 years ago

For example, MSIN (https://sigarra.up.pt/feup/pt/ucurr_geral.ficha_uc_view?pv_ocorrencia_id=436845) is in both MIEEC and MI:EF, but only appears in MI:EF.

I have not dug deep into the issue yet, but I am fairly certain that the problem lies in the DB constraints and not in the scraping itself (although the scraper might have to be adjusted to allow this relation).

When the DB was designed, course units were probably modelled as being in a many-to-one relationship with courses, and not in a many-to-many one, as it appears they sometimes are.
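To make the many-to-many idea concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for the project's real DB. The table names (`course`, `course_unit`, `course_course_unit`), the column names, and the ids are all hypothetical and chosen only for illustration; they are not taken from the actual schema.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the real DB.
m2m_conn = sqlite3.connect(":memory:")
m2m_conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: a junction table lets one course unit belong to
# several courses instead of storing a single course id on the unit.
m2m_conn.executescript("""
CREATE TABLE course (
    id INTEGER PRIMARY KEY,
    acronym TEXT NOT NULL
);
CREATE TABLE course_unit (
    id INTEGER PRIMARY KEY,
    acronym TEXT NOT NULL
);
CREATE TABLE course_course_unit (
    course_id INTEGER NOT NULL REFERENCES course(id),
    course_unit_id INTEGER NOT NULL REFERENCES course_unit(id),
    PRIMARY KEY (course_id, course_unit_id)
);
""")

# Illustrative rows only; the ids are made up.
m2m_conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [(1, "MIEEC"), (2, "MI:EF")])
m2m_conn.execute("INSERT INTO course_unit VALUES (?, ?)", (10, "MSIN"))
m2m_conn.executemany("INSERT INTO course_course_unit VALUES (?, ?)",
                     [(1, 10), (2, 10)])


def courses_for_unit(conn, unit_acronym):
    """Return the acronyms of every course that includes the given unit."""
    rows = conn.execute(
        """
        SELECT c.acronym
        FROM course AS c
        JOIN course_course_unit AS link ON link.course_id = c.id
        JOIN course_unit AS u ON u.id = link.course_unit_id
        WHERE u.acronym = ?
        """,
        (unit_acronym,),
    ).fetchall()
    return [row[0] for row in rows]


msin_courses = courses_for_unit(m2m_conn, "MSIN")
```

With a layout like this, MSIN can be linked to both MIEEC and MI:EF, and the composite primary key on the junction table prevents the same link from being stored twice.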

However, changing the DB schema will be problematic, as it most likely implies touching the API as well.

We should do so anyway if the move to SIGARRA's API is not successful, as there is some technical debt in the DB that should be addressed: namely, an internal id is used for cross-referencing tables instead of SIGARRA's ids, which makes truncating the DB necessary whenever a new scraping occurs so that the ids make sense 🙃 (see #30). The best approach would be to use SIGARRA's ids and always use operations that insert the data, or update it if it already exists.
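As a rough sketch of the "insert, or update if it already exists" idea, the snippet below keys a table on SIGARRA's id and uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (available in SQLite 3.24 and later). SQLite is only a stand-in here: the equivalent upsert syntax differs between database engines, and the table name, column names, `upsert_course_unit` helper, and example values are all hypothetical.

```python
import sqlite3


def upsert_course_unit(conn, sigarra_id, acronym, name):
    """Insert a course unit keyed by its SIGARRA id, or update it in place."""
    conn.execute(
        """
        INSERT INTO course_unit (sigarra_id, acronym, name)
        VALUES (?, ?, ?)
        ON CONFLICT(sigarra_id) DO UPDATE SET
            acronym = excluded.acronym,
            name = excluded.name
        """,
        (sigarra_id, acronym, name),
    )


# In-memory database with a hypothetical table keyed on SIGARRA's id.
upsert_conn = sqlite3.connect(":memory:")
upsert_conn.execute(
    """
    CREATE TABLE course_unit (
        sigarra_id INTEGER PRIMARY KEY,
        acronym TEXT NOT NULL,
        name TEXT NOT NULL
    )
    """
)

# First scrape inserts the row; a later scrape updates the same row
# instead of requiring the table to be truncated first.
# The id and names below are placeholders, not real SIGARRA data.
upsert_course_unit(upsert_conn, 1001, "MSIN", "placeholder name")
upsert_course_unit(upsert_conn, 1001, "MSIN", "updated placeholder name")

upserted_rows = upsert_conn.execute(
    "SELECT sigarra_id, acronym, name FROM course_unit"
).fetchall()
```

Because the key stays stable across scrapes, rows in other tables that reference it remain valid after a re-scrape, which is the property the internal auto-generated ids lack.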

bdmendes commented 1 year ago

@Jumaruba is this fixed now?

Jumaruba commented 1 year ago

We've fixed it 🌞