Open amazhangwinz opened 1 year ago
Had a look in our code and it seems this isn't a typo on our side but just part of the data that the handbook returns.
It seems that the tag removal kicks in during preprocessing, but the data here is taken from the output of the formatting step. So the tags are removed when these course conditions are shown from the condition side, but not from the course side.
Function for removing tags, in backend/data/processors/conditions_preprocessing.py:191
def delete_HTML(processed: str) -> str:
    """Remove HTML tags"""
    # Will replace with a space because they sometimes appear in the middle of the text,
    # so "and<br/>12 UOC" would otherwise turn into "and12 UOC"
    return re.sub("<[a-z]*/>", " ", processed, flags=re.IGNORECASE)
Another TODO: if the tag already has a space to its left or right, do not add extraneous spacing; replace it with "" instead.
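A minimal sketch of that TODO, assuming it is acceptable to swallow whitespace around the tag and emit exactly one space instead. The function name `delete_html_collapsing_spaces` is hypothetical and not in the repo. The tag pattern is the same one `delete_HTML` uses, so it still only matches self-closing tags like `<br/>`. Unlike the current function, this sketch also trims leading/trailing whitespace.

```python
import re

# Hypothetical variant of delete_HTML (name and behaviour are assumptions).
# Matches one or more self-closing tags together with any whitespace around them,
# so "a<br/>b", "a <br/> b" and "a<br/><br/>b" all collapse to a single space.
_TAG_RUN = re.compile(r"(?:\s*<[a-z]*/>\s*)+", flags=re.IGNORECASE)


def delete_html_collapsing_spaces(processed: str) -> str:
    """Remove self-closing HTML tags without introducing doubled spaces."""
    without_tags = _TAG_RUN.sub(" ", processed)
    # Assumption: a space left at the very start or end is not wanted
    return without_tags.strip()


print(delete_html_collapsing_spaces("and<br/>12 UOC"))
print(delete_html_collapsing_spaces("and <br/> 12 UOC"))
```

Both calls should give the same "and 12 UOC" text, which is the point of the TODO.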
FIX: add the tag removal into courses_formatting and into programs_formatting, BUT using \n chars instead, to not break readability :)
Other ref of source data: see backend/scrapers/coursesFormattedRaw.json:
"MARK3088": {
    "title": "Product Analytics",
    "code": "MARK3088",
    "UOC": "6",
    "gen_ed": "true",
    "level": "3",
    "description": "<p>Today\u2019s data-rich environment and advances in data mining techniques have enabled product idea generation from the crowd. Many innovative data-based products or services development and effective marketing of new product ideas are being born in crowdfunding platforms. Today, "data\u201d itself may form part of the \u201ccore material\u201d of new products or services. This course integrates the principles of product development with machine learning techniques by covering text and sentiment analysis to analyse social media posts, product reviews or start-ups campaign on crowdfunding platforms, and data product or service development such as recommendation algorithms. Students will exercise hands-on data analytics to develop and test the machine learning models and conduct exploratory product data analysis and visualisation.</p>",
    "study_level": "Undergraduate",
    "school": "School of Marketing",
    "faculty": "UNSW Business School",
    "campus": "Sydney",
    "terms": "Term 1, Term 2",
    "calendar": "3+",
    "field_of_education": "080505 Marketing",
    "attributes": [
        {
            "type": "general_education",
            "description": "This course is available as <a href=\"https://www.student.unsw.edu.au/general-education\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">general education</a> and normally taken outside the study area in which the student\u2019s program is based. Availability of general education courses outside of the owning Faculty may be restricted by the Program Authority, usually because they are closely related to the study area of the student\u2019s program."
        }
    ],
    "equivalents": {},
    "exclusions": {},
    "enrolment_rules": "Pre-requisite: ECON1203 or COMM1190 or INFS1609 or MATH1041 or MATH1231 or MATH1241 or MATH1251 or MARK2052 or COMM2050/COMM3050 or COMM2501 or INFS2605 or INFS2609.<br/>Students with equivalent Statistics knowledge can seek pre-requisite waiver via webforms<br/><br/>"
},
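For an "enrolment_rules" value like the one above, the newline-based fix could look roughly like the sketch below. The helper name `br_to_newlines` is hypothetical. Where it would be called inside courses_formatting / programs_formatting is not shown, since that depends on the existing code. It assumes only `<br>` variants need handling and that repeated or trailing breaks should be collapsed.

```python
import re

# Assumption: only <br>, <br/> and <br /> (any case) need to become newlines.
_BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def br_to_newlines(text: str) -> str:
    """Replace <br/> tags with newline chars, collapsing repeats and trimming the ends."""
    with_newlines = _BR_TAG.sub("\n", text)
    # Collapse runs of newlines (and any whitespace around them) into one newline
    collapsed = re.sub(r"\s*\n\s*", "\n", with_newlines)
    # Drop the newline left behind by trailing tags such as "<br/><br/>"
    return collapsed.strip()


# Shortened version of the MARK3088 rules, for illustration only
rules = (
    "Pre-requisite: ECON1203 or COMM1190.<br/>"
    "Students with equivalent Statistics knowledge can seek "
    "pre-requisite waiver via webforms<br/><br/>"
)
print(br_to_newlines(rules))
```

Whether the frontend renders "\n" as a line break is a separate question; this only covers the data side.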
Technically my tenure is over, but I will try to get a fix up :)
Describe the bug
In some courses, the tag <br/> shows up as plain text.
Thanks team!