aarctan / schedubuddy-server

https://schedubuddy.com/
MIT License

PSYCH 241 missing from Winter 2024 (1830) #14

Closed by Mattwmaster58 1 year ago

Mattwmaster58 commented 1 year ago

PSYCH 241 is offered in Winter 2024 and is scraped into raw.json properly, but it never ends up in the uOfACourse table. write_raw.py does insert it at some point, but a DELETE statement at the end removes it. Running this query, based on that DELETE, shows that over 30% of uOfACourse records are being removed:

SELECT (COUNT(*))*1.0/(SELECT COUNT(*) from uOfACourse)
FROM uOfACourse
WHERE course IN
      (SELECT uOfACourse.course
       FROM uOfACourse
                LEFT JOIN uOfAClass
                          ON uOfACourse.course = uOfAClass.course AND uOfACourse.term = uOfAClass.term
       WHERE uOfAClass.course IS NULL)

Result: 0.33053

So ultimately, the issue is that no insertion is being made into uOfAClass for the course, which leads to its deletion from uOfACourse?
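To make the cleanup behaviour concrete, here is a minimal sketch using an in-memory SQLite database. It assumes both tables have only `course` and `term` columns (the real schema is not shown in this issue), and the `EXAMPLE 101` row is made up for illustration. It lists the courses that have no matching class row, which are the ones the cleanup would drop.

```python
import sqlite3

# Assumed, simplified schema: only the two columns used in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE uOfACourse (course TEXT, term TEXT);
    CREATE TABLE uOfAClass  (course TEXT, term TEXT);

    -- Courses inserted by the scraper (EXAMPLE 101 is a made-up row).
    INSERT INTO uOfACourse VALUES
        ('PSYCH 241', '1830'),
        ('MUSIC 100', '1830'),
        ('EXAMPLE 101', '1830');

    -- Only one course got a class row, e.g. because parsing the
    -- timing of the other two failed.
    INSERT INTO uOfAClass VALUES ('EXAMPLE 101', '1830');
""")

# Anti-join: keep course rows with no class row for the same course and term.
orphans = conn.execute("""
    SELECT uOfACourse.course, uOfACourse.term
    FROM uOfACourse
    LEFT JOIN uOfAClass
           ON uOfACourse.course = uOfAClass.course
          AND uOfACourse.term = uOfAClass.term
    WHERE uOfAClass.course IS NULL
    ORDER BY uOfACourse.course
""").fetchall()

print(orphans)
```

Note that the ratio query above filters with `course IN (...)` on the course name alone, so a course orphaned in one term has its rows from every term counted; the true per-term fraction may be somewhat lower than 0.33.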

Mattwmaster58 commented 1 year ago

Also happening with MUSIC 100

Mattwmaster58 commented 1 year ago

Looks like this is the issue:

            for curr_embed in re.findall(r"\d+-\d+-\d+ - \d+-\d+-\d+.*?\)", em):
                potentially_biweekly = False
                start_date, end_date = re.findall("\d+-\d+-\d+", curr_embed)
                days, start_t, _, end_t = re.findall("\w+ \d+:\d+ - \d+:\d+", curr_embed)[0].split(' ')
                if curr_embed.find(")") == -1:
                    curr_embed += ')'
                location = curr_embed[curr_embed.find("(") + 1: curr_embed.find(")")]
                location = location if location != "TBD" else 

The regex on the first line here expects every course timing to end in a close paren, but online courses don't have a location in parentheses, so the match fails for them (so why is there a code path for a missing paren when the regex guarantees one is always there? ;)
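One possible fix, as a rough sketch rather than the project's actual patch: make the parenthesised location optional in the pattern and pull the fields out with named groups. The sample timing strings below are assumptions about the scraped format (inferred from the regexes in the snippet above), and `MEETING_RE` and `parse_meetings` are hypothetical names.

```python
import re

# The pattern from the snippet above: it requires a closing paren.
OLD_RE = re.compile(r"\d+-\d+-\d+ - \d+-\d+-\d+.*?\)")

# Hypothetical replacement: same date and time pieces, but the
# "(location)" suffix is optional.
MEETING_RE = re.compile(
    r"(?P<start_date>\d+-\d+-\d+) - (?P<end_date>\d+-\d+-\d+)\s+"
    r"(?P<days>\w+) (?P<start_t>\d+:\d+) - (?P<end_t>\d+:\d+)"
    r"(?:\s*\((?P<location>[^)]*)\))?"
)

def parse_meetings(em):
    """Return one dict per meeting found in em; location is None if absent."""
    return [m.groupdict() for m in MEETING_RE.finditer(em)]

# Assumed examples of the scraped timing text (not taken from raw.json).
in_person = "2024-01-08 - 2024-04-12 MWF 10:00 - 10:50 (CCIS 1-430)"
online = "2024-01-08 - 2024-04-12 TR 14:00 - 15:20"

print(OLD_RE.findall(online))    # the old pattern finds nothing here
print(parse_meetings(online))
print(parse_meetings(in_person))
```

With the location optional, the `curr_embed.find(")") == -1` branch is no longer needed; a missing location simply comes back as `None` and can be handled the same way as "TBD".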