BroncoDirectMe / Backend

BroncoDirectMe's API for the Chrome Extension
https://broncodirect.me

[SECONDARY FEATURE] Pre-reqs standardization #74

Open GuyWhoCode opened 5 months ago

GuyWhoCode commented 5 months ago

User Story

As a developer, I want to standardize course pre-requisite information in an easy-to-parse format so that a course pre-req chart can be rendered from it.

Technical Tasks

Acceptance Criteria

Note

The end goal of this task is to prepare for generating the image below for every single major.

Untitled

https://codesandbox.io/p/sandbox/busy-snow-5ss3hl?file=%2Findex.js

DanielPasion commented 4 months ago

Discord Name(s): .thedaniel, saltbagels

392781 commented 4 months ago

Why is ML necessary? Can't you query and regex all pre-reqs then build a graph?

GuyWhoCode commented 4 months ago

> Why is ML necessary? Can't you query and regex all pre-reqs then build a graph?

@392781 Luciano recommended using ML because of all the edge cases that come with parsing all the pre-reqs. So far, though, the person assigned to this issue has not used ML.

Sample Edge Cases

CHM 123 or CHM 1220 ; and CHM 123L or CHM 1220L ; or concurrent enrollment in CHM 2010 .

AMM 360 or AMM 3600 ; and AMM 360L or AMM 3600L .

ACC 207 and 207A or ACC 2070 ; and CIS 101, CIS 1010 , or PCPT.

BIO 121/BIO 121L, BIO 122/BIO 122L, and BIO 123/BIO 123L; BIO 121/BIO 121L and BIO 122/BIO122L/BIO 1220C; BIO 121/BIO 121L/BIO 1210B and BIO 1220 / BIO 1220L ; or BIO 1210 / BIO 1210L and BIO 1220 / BIO 1220L .

marked01one commented 3 months ago

I don't think it's that difficult to use regex queries if you exclude support for 3-digit course names, right? Correct me if I'm wrong, but 3-digit course names were deprecated over 5 years ago.
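A minimal sketch of what "excluding 3-digit course names" could look like, run against the first sample edge case above. The pattern and the function name are assumptions for illustration, not code from the repository: 2 or 3 uppercase letters, an optional space, exactly four digits, and an optional one-letter suffix.

```python
import re

# Assumed pattern (not from the repository): subject code, optional space,
# exactly four digits, optional single-letter suffix such as "L".
FOUR_DIGIT_COURSE = re.compile(r"\b([A-Z]{2,3})\s?(\d{4})([A-Z]?)\b")

def four_digit_courses(text: str) -> list[str]:
    """Return normalized 4-digit course codes, skipping legacy 3-digit ones."""
    return [
        f"{dept} {num}{suffix}"
        for dept, num, suffix in FOUR_DIGIT_COURSE.findall(text)
    ]

sample = ("CHM 123 or CHM 1220 ; and CHM 123L or CHM 1220L ; "
          "or concurrent enrollment in CHM 2010 .")
print(four_digit_courses(sample))
# ['CHM 1220', 'CHM 1220L', 'CHM 2010']
```

This only extracts the course codes. The "and"/"or"/"concurrent enrollment" wording between them is still lost, so it does not by itself settle whether regex alone is enough.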

DanielPasion commented 3 months ago

I didn't use regex. I just kept every string that started with a department name such as BIO or CS and was followed by a number like 121 or 1210, and then stripped all of the / , and . characters from the string. I also got rid of all the deprecated courses by creating an array of all the current classes that exist and removing anything from my scraped data that wasn't in that list. I manually checked it against about 10 different majors and it worked like 95% of the time.
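The approach described in this comment might look roughly like the sketch below. It is a guess at the idea, not the actual implementation: `DEPARTMENTS`, `CURRENT_COURSES`, and `extract_current_courses` are hypothetical names, and the course list is made-up sample data standing in for the real list of current classes.

```python
# Hypothetical sample data; the real lists would come from the scraped catalog.
DEPARTMENTS = {"BIO", "CS", "CHM"}
CURRENT_COURSES = {"BIO 1210", "BIO 1210L", "BIO 1220", "BIO 1220L"}

PUNCTUATION = "/,.;"

def extract_current_courses(prereq_text: str) -> list[str]:
    """Keep 'DEPT NUMBER' pairs that appear in the current course list."""
    # Replace separators with spaces so "BIO 121/BIO 121L" splits cleanly.
    cleaned = prereq_text
    for ch in PUNCTUATION:
        cleaned = cleaned.replace(ch, " ")
    tokens = cleaned.split()

    found = []
    # Walk over adjacent token pairs: a department name followed by a number.
    for dept, number in zip(tokens, tokens[1:]):
        if dept in DEPARTMENTS and number[:1].isdigit():
            course = f"{dept} {number}"
            # Drop deprecated courses and duplicates.
            if course in CURRENT_COURSES and course not in found:
                found.append(course)
    return found

sample = ("BIO 121/BIO 121L/BIO 1210B and BIO 1220 / BIO 1220L ; "
          "or BIO 1210 / BIO 1210L and BIO 1220 / BIO 1220L .")
print(extract_current_courses(sample))
# ['BIO 1220', 'BIO 1220L', 'BIO 1210', 'BIO 1210L']
```

Two limits of this sketch: it flattens the "and"/"or" structure into a plain list, and it misses codes written without a space, such as "BIO122L" in the last sample edge case.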

392781 commented 3 months ago

95% of the time it works 100% of the time.

DanielPasion commented 3 months ago

Spoken like a true PhD Statistics candidate.

marked01one commented 3 months ago

> 95% of the time it works 100% of the time.

That's good enough for me lmao

392781 commented 3 months ago

\w{2,3}\s\d{3,4}\w?
###################
\w{2,3}  # grabs 2 or 3 word characters (course major); note \w also matches digits and _
\s       # grabs a single whitespace character
\d{3,4}  # grabs 3 or 4 digit course number
\w?      # grabs 0 or 1 word character for lab/discussion/activity

This would grab all 3 and 4 digit courses (including labs/discussions/activities). The issue from there would be to get the connective material in between... Would probably be best to parse in "levels" where you perform string split on different tokens... so first maybe split on ";" which will create a list of individual requirements then split on "and"/"or" to get down to fine grained requirements.
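The "levels" idea can be sketched as follows, using the regex from this comment unchanged. The function name and the nested-list output shape are assumptions made for illustration: the outer list holds the ";"-separated requirements, each requirement holds its "and" groups, and each group holds the alternative courses joined by "or".

```python
import re

# Pattern taken from the comment above, unchanged.
COURSE = re.compile(r"\w{2,3}\s\d{3,4}\w?")

def parse_prereqs(text: str) -> list[list[list[str]]]:
    """Split on ';', then on 'and', then on 'or', keeping only course codes."""
    requirements = []
    for clause in text.split(";"):
        and_groups = []
        # \b keeps "and"/"or" from matching inside longer words.
        for and_part in re.split(r"\band\b", clause):
            options = []
            for or_part in re.split(r"\bor\b", and_part):
                # An "or" alternative may contain zero or more course codes.
                options.extend(COURSE.findall(or_part))
            if options:
                and_groups.append(options)
        if and_groups:
            requirements.append(and_groups)
    return requirements

sample = "AMM 360 or AMM 3600 ; and AMM 360L or AMM 3600L ."
print(parse_prereqs(sample))
# [[['AMM 360', 'AMM 3600']], [['AMM 360L', 'AMM 3600L']]]
```

This sketch drops the connective that starts a clause (the "and" after the ";" above), so how the ";"-separated requirements combine is not recorded. It also misses bare numbers such as "207A" in "ACC 207 and 207A" and codes without a space such as "BIO122L", and it treats "/" and "," the same as "or", so the remaining sample edge cases would need extra handling.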

(Also not yet a candidate :^) but thank you)

392781 commented 3 months ago

Added a breakdown of regex