Nobelz / RateMyProfessorAPI

Python web scraper to get professor ratings from ratemyprofessor.com website.
Apache License 2.0
39 stars, 11 forks

Inaccurate ratings of Professors with legacy floating point ratings. #19

Closed ileka2468 closed 6 months ago

ileka2468 commented 6 months ago

The professor.get_ratings() method returns inaccurate, rounded-down ratings for professors who have reviews with floating-point values.

Details: These floating-point reviews come back with wrong values when queried through the API. In the picture below, the actual review for ECT584 is a 2.5, but the API reports it as 1. This is a breaking issue when trying to obtain the rating distribution.

image

image

What the code produces:

[('ALLCLASES', [3]), ('CS521', [3]), ('CSC200', [5]), ('CSC210', [4]), ('CSC478', [3, 2, 2]), ('CSC480', [2, 4]), ('CSC575', [1, 5]), ('DS575', [4]), ('DSC478', [1, 4]), ('ECT584', [5, 1, 5, 5, 5, 5]), ('HON207', [3]), ('IT130', [3]), ('LSP110', [3, 4])]
Bamshad Mobasher {'5 stars': 7, '4 stars': 5, '3 stars': 6, '2 stars': 3, '1 star': 3}

I even went primitive with the code to make sure I wasn't being dumb, because my dictionary implementation produced incorrect ratings. I swapped to regular if statements, which still produced the wrong distribution. Then I looked at the API code, only to find that the rating is of type int and it's rounding things differently than the website does.

import ratemyprofessor as rmp  # alias assumed; the snippet as posted did not show its import

prof = rmp.Professor(582550)

# Fetch the ratings for each course once and reuse them for both printouts.
course_ratings = [(course.name, [rating.rating for rating in prof.get_ratings(course.name)]) for course in prof.courses]
print(course_ratings)

# The counters were never initialized in the snippet as posted.
one_count = two_count = three_count = four_count = five_count = 0

for _, ratings in course_ratings:
    for rating in ratings:
        if rating == 1:
            one_count += 1
        elif rating == 2:
            two_count += 1
        elif rating == 3:
            three_count += 1
        elif rating == 4:
            four_count += 1
        elif rating == 5:
            five_count += 1

print(prof.name, {"5 stars": five_count, "4 stars": four_count, "3 stars": three_count, "2 stars": two_count, "1 star": one_count})
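
For comparison, the same tally can be sketched with `collections.Counter` instead of an if/elif chain. This is only an illustration that works on a plain list of numbers. The helper name `rating_distribution` is hypothetical and not part of the library. It also reports any value that is not a whole star from 1 to 5, which would make a legacy half-point rating such as 2.5 visible rather than silently uncounted.

```python
from collections import Counter


def rating_distribution(ratings):
    """Tally ratings into 5..1 star buckets.

    Hypothetical helper for illustration; not part of RateMyProfessorAPI.
    Returns (distribution, unbucketed), where unbucketed lists the distinct
    values that are not whole stars from 1 to 5.
    """
    counts = Counter(ratings)
    whole_stars = {1, 2, 3, 4, 5}
    distribution = {
        f"{star} star{'s' if star != 1 else ''}": counts.get(star, 0)
        for star in range(5, 0, -1)
    }
    unbucketed = sorted(value for value in counts if value not in whole_stars)
    return distribution, unbucketed


if __name__ == "__main__":
    # The ECT584 list from the output above, as returned by the API.
    print(rating_distribution([5, 1, 5, 5, 5, 5]))
    # A list containing a half-point value, to show how it is surfaced.
    print(rating_distribution([5, 2.5, 5]))
```
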
ileka2468 commented 6 months ago

This is me realizing the ratings don't tie directly to the professor's actual rating distribution. Bummer that there's no way to get the rating distro for the professor (at least I don't think so), so I extended the API with this nonsense to get the rating distro for a professor.


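    # This method relies on module-level imports not shown in the post:
    #     import base64, json, requests
    def _get_distro(self, professor_id: int):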
        url = "https://www.ratemyprofessors.com/graphql"
        headers = {
            "Content-Type": "application/json",
            "Referer": f"https://www.ratemyprofessors.com/ShowRatings.jsp?tid={professor_id}",
            "Authorization": "lol",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
        }

        query = """
        query GetTeacherDetails($id: ID!) {
          node(id: $id) {
            ... on Teacher {
              firstName
              lastName
              numRatings
              ratingsDistribution {
                r1
                r2
                r3
                r4
                r5
                total
              }
            }
          }
        }
        """

        encoded_id = base64.b64encode(f"Teacher-{professor_id}".encode()).decode()
        variables = {'id': encoded_id}

        response = requests.post(url, json={'query': query, 'variables': variables}, headers=headers)

        if response.status_code != 200:
            print(f"Failed to fetch data: {response.status_code}, {response.text}")
            return

        try:
            data = response.json()
            print(json.dumps(data, indent=4))
        except json.JSONDecodeError:
            print("Failed to decode JSON from response:", response.text)
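
The method above only prints the response. As a rough sketch of a possible next step, the snippet below separates the ID encoding from the parsing and turns a decoded response into the same star-count dictionary used earlier in this thread. The function names `encode_teacher_id` and `extract_distribution` are hypothetical. The payload shape (`data` -> `node` -> `ratingsDistribution`) is assumed from the query above, and the sample numbers are made up for illustration.

```python
import base64


def encode_teacher_id(professor_id: int) -> str:
    """Base64-encode "Teacher-<id>", matching the encoding in _get_distro above."""
    return base64.b64encode(f"Teacher-{professor_id}".encode()).decode()


def extract_distribution(payload: dict):
    """Pull star counts out of a decoded GraphQL response.

    Hypothetical helper: assumes the payload mirrors the query shape,
    i.e. payload["data"]["node"]["ratingsDistribution"] with keys r1..r5.
    Returns None if that part of the payload is missing.
    """
    node = (payload.get("data") or {}).get("node") or {}
    dist = node.get("ratingsDistribution")
    if not dist:
        return None
    return {
        f"{n} star{'s' if n != 1 else ''}": dist.get(f"r{n}", 0)
        for n in range(5, 0, -1)
    }


if __name__ == "__main__":
    # Made-up sample payload, only to show the assumed shape.
    sample = {
        "data": {
            "node": {
                "ratingsDistribution": {
                    "r1": 2, "r2": 1, "r3": 4, "r4": 3, "r5": 6, "total": 16
                }
            }
        }
    }
    print(encode_teacher_id(582550))
    print(extract_distribution(sample))
```

Returning `None` for a missing distribution, rather than raising, keeps the caller's handling in line with the early `return` already used for non-200 responses.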