devsoc-unsw / circles

The open-source degree planner for UNSW students. Features an interactive drag-and-drop interface for easy term planning and automatic progression checking to help you stay on track for graduation.
https://circles.devsoc.app

<br/> is written in plain text in some places #1093

Open amazhangwinz opened 9 months ago

amazhangwinz commented 9 months ago

Describe the bug: in some courses, the <br/> tag is shown as plain text.

Screenshots: [image]

Thanks team!

imagine-hussain commented 9 months ago

Had a look at our code, and it seems this isn't a typo on our side; it's part of the data the handbook returns.

It seems the tag removal only kicks in during preprocessing, while the data shown here comes from after the formatting step. So the tags are removed when these course conditions are shown on the conditions side, but not on the course side.

Function that removes tags (backend/data/processors/conditions_preprocessing.py:191):

import re

def delete_HTML(processed: str) -> str:
    """Remove self-closing HTML tags such as <br/>."""
    # Replace with a space because these tags sometimes appear mid-text;
    # deleting them outright would turn "and<br/>12 UOC" into "and12 UOC".
    return re.sub("<[a-z]*/>", " ", processed, flags=re.IGNORECASE)

Another TODO: if the tag already has a space on its left or right, replace it with "" instead of adding an extra space.
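One possible way to handle this TODO, as a rough sketch: delete_HTML_tidy is a hypothetical name, not something in the repo, and it reuses the same tag pattern as delete_HTML. Rather than checking each neighbour, it collapses every run of self-closing tags, together with any whitespace already around it, into a single space, so existing spacing is never doubled.

```python
import re

# Hypothetical variant of delete_HTML (not part of the Circles codebase).
# Matches optional whitespace, then one or more self-closing tags such as
# <br/>, each followed by optional whitespace.
TAG_RUN = re.compile(r"\s*(?:<[a-z]*/>\s*)+", flags=re.IGNORECASE)


def delete_HTML_tidy(processed: str) -> str:
    """Remove self-closing HTML tags without adding extra spaces."""
    # Collapse each tag run (plus surrounding whitespace) into one space,
    # then trim the ends so trailing tags leave no dangling whitespace.
    return TAG_RUN.sub(" ", processed).strip()


if __name__ == "__main__":
    print(delete_HTML_tidy("and<br/>12 UOC"))          # and 12 UOC
    print(delete_HTML_tidy("and <br/> 12 UOC"))        # and 12 UOC
    print(delete_HTML_tidy("via webforms<br/><br/>"))  # via webforms
```

Note that the final .strip() trims whitespace from both ends of the whole string, not just around tags, which seems harmless for condition text but is a behaviour change from delete_HTML.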


FIX:

  1. Add this to courses_formatting and programs_formatting, BUT replace the tags with \n characters instead of spaces so the line breaks survive and the text stays readable :)
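A rough sketch of that fix, under a few assumptions: the helper names br_to_newlines and format_course are hypothetical, the tag pattern is copied from delete_HTML, and enrolment_rules is assumed to be the field that needs cleaning (as in the MARK3088 entry from coursesFormattedRaw.json). Where exactly the call belongs inside courses_formatting and programs_formatting is not shown.

```python
import re

# Same pattern delete_HTML uses for self-closing tags such as <br/>.
SELF_CLOSING_TAG = re.compile(r"<[a-z]*/>", flags=re.IGNORECASE)


def br_to_newlines(text: str) -> str:
    """Hypothetical helper: turn self-closing tags into newlines."""
    # Trailing tags (e.g. "webforms<br/><br/>") would otherwise leave
    # dangling newlines, so trim whitespace from both ends.
    return SELF_CLOSING_TAG.sub("\n", text).strip()


def format_course(course: dict) -> dict:
    """Hypothetical formatting step: clean enrolment_rules on a copy."""
    formatted = dict(course)  # shallow copy so the raw data is left untouched
    if formatted.get("enrolment_rules"):
        formatted["enrolment_rules"] = br_to_newlines(formatted["enrolment_rules"])
    return formatted
```

Applied to the MARK3088 enrolment_rules, the <br/> after "INFS2609." would become a line break and the two trailing <br/> tags would be dropped entirely.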

Another reference from the source data:

See backend/scrapers/coursesFormattedRaw.json:

    "MARK3088": {
        "title": "Product Analytics",
        "code": "MARK3088",
        "UOC": "6",
        "gen_ed": "true",
        "level": "3",
        "description": "<p>Today\u2019s data-rich environment and advances in data mining techniques have enabled product idea generation from the crowd. Many innovative data-based products or services development and effective marketing of new product ideas are being born in crowdfunding platforms. Today, &#34;data\u201d itself may form part of the \u201ccore material\u201d of new products or services. This course integrates the principles of product development with machine learning techniques by covering text and sentiment analysis to analyse social media posts, product reviews or start-ups campaign on crowdfunding platforms, and data product or service development such as recommendation algorithms. Students will exercise hands-on data analytics to develop and test the machine learning models and conduct exploratory product data analysis and visualisation.</p>",
        "study_level": "Undergraduate",
        "school": "School of Marketing",
        "faculty": "UNSW Business School",
        "campus": "Sydney",
        "terms": "Term 1, Term 2",
        "calendar": "3+",
        "field_of_education": "080505 Marketing",
        "attributes": [
            {
                "type": "general_education",
                "description": "This course is available as <a href=\"https://www.student.unsw.edu.au/general-education\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">general education</a> and normally taken outside the study area in which the student\u2019s program is based. Availability of general education courses outside of the owning Faculty may be restricted by the Program Authority, usually because they are closely related to the study area of the student\u2019s program."
            }
        ],
        "equivalents": {},
        "exclusions": {},
        "enrolment_rules": "Pre-requisite: ECON1203 or COMM1190 or INFS1609 or MATH1041 or MATH1231 or MATH1241 or MATH1251 or MARK2052 or COMM2050/COMM3050 or COMM2501 or INFS2605 or INFS2609.<br/>Students with equivalent Statistics knowledge can seek pre-requisite waiver via webforms<br/><br/>"
    },
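To find every course affected ("in some places"), the scraped data could be scanned for leftover tags before and after the fix. A minimal sketch, assuming coursesFormattedRaw.json is a single JSON object keyed by course code, as the MARK3088 excerpt suggests. load_courses and find_tagged_courses are hypothetical names, and only top-level string fields of each entry are checked (the nested attributes list is skipped).

```python
import json
import re
from typing import Dict, List

# Same self-closing-tag pattern that delete_HTML uses.
SELF_CLOSING_TAG = re.compile(r"<[a-z]*/>", flags=re.IGNORECASE)


def load_courses(path: str) -> Dict[str, dict]:
    """Load a course file assumed to be one JSON object keyed by course code."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_tagged_courses(courses: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each course code to its top-level string fields that still hold tags."""
    affected: Dict[str, List[str]] = {}
    for code, course in courses.items():
        fields = [
            name
            for name, value in course.items()
            if isinstance(value, str) and SELF_CLOSING_TAG.search(value)
        ]
        if fields:
            affected[code] = fields
    return affected
```

Usage, from the repository root: find_tagged_courses(load_courses("backend/scrapers/coursesFormattedRaw.json")). For the MARK3088 excerpt above it would report only enrolment_rules, since the <p>...</p> around the description is not a self-closing tag.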
imagine-hussain commented 9 months ago

Technically my tenure is over, but I'll try to get a fix up :)