KaappoRaivio / wilmacrawler

ValueError: not enough values to unpack (expected 2, got 1) #2

Open Sukarth opened 2 years ago

Sukarth commented 2 years ago

Hello @KaappoRaivio, thanks for making this API. It will help me a lot in my project if I can get past this error. Could you please help me?

I had to make some changes to the code because there were other errors. I sorted those out because I could find information online and understand them. Unfortunately, I can't understand this one.

I have added some print statements to the code to get basic logging, as you can see below. (I have only pasted part of the logs.)

Käsityö.2 TBA B121
No hw or diary found!
Käsityö.2 TBA B121
No hw or diary found!
Matematiikka VUU B323
No hw or diary found!
Ruotsin kieli, B1 TLA B327
No hw or diary found!
Espanja, A2.2 lile B145
No hw or diary found!
/usr/local/lib/python3.8/dist-packages/selenium/webdriver/remote/webelement.py:359: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
sHImai.Etäopetus YLÄ ma 16.00 AGS
No hw or diary found!
Historia TJW B362
No hw or diary found!
Musiikki.2 EJO B143
No hw or diary found!
Englanti, A1 AMI B342
No hw or diary found!
Musiikki.2 EJO B143
No hw or diary found!
Suomi toisena kielenä ja kirjallisuus.1 VRI
No hw or diary found!
Espanja, A2.2 lile C117
No hw or diary found!
Kemia.3 TEL B304
No hw or diary found!
Kemia.3 TEL B304
No hw or diary found!
Biologia MMO B366
No hw or diary found!
Kuvataide.2 VNA B164
No hw or diary found!
Kuvataide.2 VNA B164
No hw or diary found!
Kotitalous.3 SKL A116
No hw or diary found!
Kotitalous.3 SKL A116
No hw or diary found!
Kotitalous.3 SKL A116
No hw or diary found!
Matematiikka VUU A172
No hw or diary found!
Liikunta PBO Gym1
No hw or diary found!
Liikunta PBO Gym1
No hw or diary found!
Suomi toisena kielenä ja kirjallisuus.1 VRI
No hw or diary found!
Ruotsin kieli, B1 TLA B329
No hw or diary found!
Matematiikka VUU B145
No hw or diary found!
Terveystieto JWI B145
No hw or diary found!
Englanti, A1 AMI B342
No hw or diary found!
Suomi toisena kielenä ja kirjallisuus.1 VRI
No hw or diary found!
Historia TJW B362
No hw or diary found!
UKR.7 PLT Espoo International
No hw or diary found!

Exception happened during processing of request from ('127.0.0.1', 43436)
Traceback (most recent call last):
  File "/usr/lib/python3.8/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.8/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.8/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.8/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/lib/python3.8/http/server.py", line 427, in handle
    self.handle_one_request()
  File "/usr/lib/python3.8/http/server.py", line 415, in handle_one_request
    method()
  File "src/server.py", line 59, in do_GET
    _json = get_data(username, password)
  File "src/server.py", line 41, in get_data
    schedule = crawler.get_schedule()
  File "/mnt/c/Users/user/downloads/wilmacrawler-master/src/Crawler.py", line 119, in get_schedule
    return Schedule(courses, range, details)
  File "/mnt/c/Users/user/downloads/wilmacrawler-master/src/Crawler.py", line 273, in __init__
    palkki, title = course.split(": ")
ValueError: not enough values to unpack (expected 2, got 1)

All the code in file Crawler.py:

import datetime
import json
import re
import time
from dataclasses import dataclass
from typing import Counter

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

import transform_ugly

FRIDAY = 4
MONDAY = 0
Cyounter = 0

class Crawler:
    def __init__(self, driver, debug=False, url="https://espoo.inschool.fi"):
        self.driver = driver
        self.id = 0

        self.driver.get(url)
        self.driver.set_window_size(960, 1080)
        self.debug = debug

    def login(self, username, password):
        time.sleep(1)
        usernameField = WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.ID, "login-frontdoor")))
        usernameField.clear()
        time.sleep(0.1)
        usernameField.send_keys(username)

        time.sleep(0.1)

        passwordField = self.driver.find_element_by_id("password")
        passwordField.clear()
        passwordField.send_keys(password)

        time.sleep(0.1)

        elem = self.driver.find_element_by_name("submit")
        elem.send_keys(Keys.RETURN)

    @staticmethod
    def __filter_duplicates(elements):
        seen = set()
        def iteration (item):
            if item.text in seen:
                return False
            else:
                seen.add(item.text)
                return True

        return list(filter(iteration, elements))

    def get_schedule(self):
        global Cyounter 
        Cyounter = Cyounter + 1
        self.driver.get("https://espoo.inschool.fi/!02270224/schedule")
        next = WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.CLASS_NAME, "vismaicon-arrow-right-circle")))
        next.click()
        WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.CLASS_NAME, "info")))
        # lessons = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "info")))
        range = self.__get_date_range()
        global lessons
        lessons = self.driver.find_elements_by_class_name("info")
        # print("details" + range)
        print('this is lesson:' .join(map(str, lessons)) + "/n /n")
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')
        if Cyounter == 1:
            del lessons[3]
            del lessons[11]
            del lessons[16]
            del lessons[22]
            del lessons[28]
            # lessons.pop(3)
            # lessons.pop(11)
            # # lessons.pop(16)
            # # lessons.pop(17)
            # lessons.pop(18)
            # # lessons.pop(19)
            # lessons.pop(24)
            # lessons.pop(31)
        else:
            pass

        print('this is lesson:' .join(map(str, lessons)) + "/n /n")
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')        
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')
        # lessons = self.__filter_duplicates(lessons)
        details = list(map(self.__get_lesson_details, lessons))
        courses = set(map(lambda lesson: lesson.find_element_by_tag_name("a").text, lessons))

        # print("this is courses" + courses)
        # print("details" + details)
        # print(courses, range, details)
        return Schedule(courses, range, details)

    def __get_lesson_details(self, lesson_element ):
        # print("lesson_element-----------------------")
        # print(lesson_element) 
        # print("lesson_element-----------------------")  
        # cyounter = 0
        # try:
        # if lessons.find_element_by_tag_name("a").text == 'Käsityö.2' and cyounter == 0:

            # cyounter = cyounter+1

        course_title = lesson_element.find_element_by_class_name("no-underline-link")
        teacher = lesson_element.find_element_by_class_name("teachers")
        room = lesson_element.find_element_by_class_name("rooms")
        # line below this was originally commented...
        print(course_title.text, teacher.text, room.text)
        details = self.__get_course_details(course_title)
        # cyounter = 0

        return course_title.text, teacher.text, room.text, details

        # else: 
        #     pass

        # if lesson_element.find_element_by_tag_name("a").text == 'Käsityö.2' and cyounter == 2:

        #     course_title = lesson_element.find_element_by_class_name("no-underline-link")
        #     teacher = lesson_element.find_element_by_class_name("teachers")
        #     room = lesson_element.find_element_by_class_name("rooms")
        #     # line below this was originally commented...
        #     print(course_title.text, teacher.text, room.text)
        #     details = self.__get_course_details(course_title)
        #     cyounter = 0

        #     return course_title.text, teacher.text, room.text, details            
        # else:
        #     pass

        # except:
        #     pass

    def __get_course_details(self, course_element):
        # time.sleep(5)
        course_element.click()
        # time.sleep(5)
        self.driver.switch_to.window(self.driver.window_handles[-1])
        tables = WebDriverWait(self.driver, 5).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "table")))

        # print(tables)

        homework = []
        lesson_diary = []

        for table in tables:
            try:
                headers = list(map(lambda a: a.text, table.find_element_by_tag_name("thead").find_element_by_tag_name("tr").find_elements_by_tag_name("th")))
                # print(headers)

                if headers[1].lower().strip() == "kotitehtävät":
                    print("Homework found!")
                    # homework
                    entries = table.find_element_by_tag_name("tbody").find_elements_by_tag_name("tr")

                    for entry in entries:
                        date, exercises = map(lambda x: x.text, entry.find_elements_by_tag_name("td"))
                        homework.append({"date": date, "exercises": exercises})

                elif headers[1].lower().strip() == "tuntinro":
                    print("Diary found!")
                    # lesson diary
                    entries = table.find_element_by_tag_name("tbody").find_elements_by_tag_name("tr")
                    for entry in entries:
                        date, lesson_number, lesson_topic, teacher = map(lambda x: x.text, entry.find_elements_by_tag_name("td"))
                        lesson_diary.append({"date": date, "lesson_number": lesson_number, "lesson_topic": lesson_topic, "teacher": teacher})

            except:
                print("No hw or diary found!")
                continue

        # self.driver.navigate().back()
        self.driver.close()
        self.driver.switch_to.window(self.driver.window_handles[0])

        return homework, lesson_diary

    def __get_date_range(self):
        dates = self.driver.find_element_by_class_name("weekday-container").find_elements_by_class_name("weekday")
        # print(dates)
        return list(map(lambda x: x.text, dates))

    def __enter__(self):
        return self

    def __exit__(self, a, s, d):
        if not self.debug:
            self.driver.close()

def get_credentials(path="credentials"):
    with open(path) as file:
        return file.readlines()

@dataclass
class Lesson:
    nth_lesson: int
    day_of_week: int

def p(time_str):
    h, m = time_str.split(":")
    return 3600 * int(h) + 60 * int(m)

tuntikiertokaavio = {
    1: (Lesson(3, 1), Lesson(3, 3), Lesson(0, 4)),
    2: (Lesson(0, 0), Lesson(1, 2), Lesson(2, 3)),
    4: (Lesson(2, 1), Lesson(0, 2), Lesson(1, 4)),
    5: (Lesson(2, 0), Lesson(1, 1), Lesson(1, 3)),
    6: (Lesson(3, 0), Lesson(3, 2), Lesson(2, 4)),
    7: (Lesson(0, 1), Lesson(0, 3), Lesson(3, 4)),
    8: (Lesson(4, 0), Lesson(4, 2), Lesson(4, 4)),
}

def highlight(string):
    # return f"\u001b[40m\u001b[37;1m{string}\u001b[0m"
    return f"\u001b[1m{string}\u001b[0m"

class Schedule:
    lessonstarts = {
        0: " 8.30– 9.45 ",
        1: "10.00–11.15 ",
        2: "11.20–13.15 ",
        3: "13.30–14.45 ",
        4: "15.00–16.15 "
    }

    def __init__(self, courses, dates, details):
        self.schedule = [["" for weekday in range(5)] for nth_lesson in range(5)]
        self.dates = self.__parse_dates(dates)
        self.details = details
        for course in courses:
            # palkki = re.split(': ', course)[0]
            # title = re.split(': ', course)[1]
            palkki, title = course.split(": ")
            # palkki, title = str.split(": ", course)

            for timestamp in tuntikiertokaavio[int(palkki)]:
                self.schedule[timestamp.nth_lesson][timestamp.day_of_week] = title

    def __str__(self):
        highlight_date = self.get_highlight_date()

        first_row = [12 * " ", *[self.__pad(date) for date in self.dates]]
        first_row[highlight_date + 1] = highlight(first_row[highlight_date + 1])
        lines = [" ".join(first_row)]
        lines.append(12 * " " + ("+" + "-" * 10) * 5)
        # lines = []
        for index, row in enumerate(self.schedule):
            line = []
            line.append(self.lessonstarts[index])
            line.append("|")
            new_row = [self.__pad(item) for item in row]
            new_row[highlight_date] = highlight(new_row[highlight_date])
            line.append("|".join(new_row))
            lines.append("".join(line))

        return "\n".join(lines)

    def __pad(self, string):
        return string.rjust(9, " ") + " "

    def get_highlight_date(self):
        day_of_week = datetime.date.today().weekday()
        if day_of_week > FRIDAY:
            day_of_week = MONDAY

        return day_of_week

    weekdays = ("Ma", "Ti", "Ke", "To", "Pe")

    def __parse_dates(self, dates):
        today = datetime.date.today()
        dateobjects = []
        for date in dates:
            weekday, daymonth = re.sub(r"\.$", "", date).split(" ")
            day, month = daymonth.split(".")

            if today.weekday() > FRIDAY:
                dateobjects.append(datetime.date(today.year, int(month), int(day)) + datetime.timedelta(days=7))
            else:
                dateobjects.append(datetime.date(today.year, int(month), int(day)) + datetime.timedelta(days=0))

        return [self.weekdays[index] + " " +  x for index, x in enumerate(list(map(lambda x: x.strftime("%d.%m"), dateobjects)))]

if __name__ == "__main__":
    options = Options()
    # options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    with Crawler(driver, debug=True) as crawler:
        crawler.login(*get_credentials())
        time.sleep(1)
        schedule = crawler.get_schedule()
        print(json.dumps(transform_ugly.transform(schedule), indent=4, sort_keys=True), file=open("out.json", "w"))
        # print(s, s.details)

Please reply and tell me if you need more information to debug this problem. Could you please tell me what is happening? Looking forward to your response.

Sukarth commented 2 years ago

Hello, I would appreciate your reply since I am still struggling with the problem. @KaappoRaivio

KaappoRaivio commented 2 years ago

Hello, thanks for your interest! I'll look into your issue within a couple of days.

KaappoRaivio commented 2 years ago

It seems that your school uses a different naming convention for the classes. Originally this was only for my personal use, so I didn't anticipate usage outside my own school, where all class names are of the form <number>: ABC1.23.
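A minimal sketch of why the unpacking in the traceback fails. The label `2: ABC1.23` is only an illustrative value following the form described above, and `Matematiikka` is one of the labels from the posted log; neither is taken from the crawler's real input.

```python
# Course labels in the two styles discussed in this thread.
with_slot = "2: ABC1.23"       # "<number>: <title>", the form the crawler expects (illustrative value)
without_slot = "Matematiikka"  # label as it appears in the posted log, with no "<number>: " prefix

# With the prefix, split() yields two parts and the unpacking works.
palkki, title = with_slot.split(": ")
print(palkki, title)

# Without the prefix, split() yields a single-element list, so unpacking into two names fails.
try:
    palkki, title = without_slot.split(": ")
    error_message = None
except ValueError as error:
    error_message = str(error)
    print("ValueError:", error)
```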

Unfortunately, at this time I have no interest in maintaining this project any further, since it would need a major rewrite. Your issue seems relatively simple to fix.


class Schedule:
    ...

    def __init__(self, courses, dates, details):
        self.schedule = [["" for weekday in range(5)] for nth_lesson in range(5)]
        self.dates = self.__parse_dates(dates)
        self.details = details
        for course in courses:
            # palkki, title = course.split(": ")  # instead of this, do
            if ": " in course:
                palkki, title = course.split(": ", 1)
            else:
                # You have to pull the information from elsewhere. I'm not sure what the best course of action would be. It probably depends on the overall structure and conventions your school uses.
                # Btw, if you are confused, "palkki" is one of the eight slots where lessons are held in Espoo.
                continue  # placeholder so the else branch is valid Python; this course is skipped until a slot can be determined

            for timestamp in tuntikiertokaavio[int(palkki)]:
                self.schedule[timestamp.nth_lesson][timestamp.day_of_week] = title
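One way to read the snippet above, step by step, is as a small parsing helper. This is only a sketch: `parse_course` is a hypothetical name that does not exist in the repository, and returning `None` for labels without a slot number is an assumption about how you might want to handle them, not something the project defines.

```python
from typing import Optional, Tuple


def parse_course(course: str) -> Tuple[Optional[int], str]:
    """Split a course label into (slot number, title).

    Returns (None, label) when the label has no usable "<number>: " prefix.
    """
    # partition() always returns three parts, so it never raises on a missing separator.
    prefix, separator, title = course.partition(": ")
    if separator and prefix.strip().isdecimal():
        return int(prefix), title
    return None, course


# The first label is illustrative; the other two come from the log earlier in this thread.
for label in ("2: ABC1.23", "Matematiikka", "Ruotsin kieli, B1"):
    palkki, title = parse_course(label)
    if palkki is None:
        print(f"skipping {label!r}: no slot number")
    else:
        print(f"slot {palkki}: {title}")
```

Courses that come back with `None` cannot be placed in `tuntikiertokaavio`, so the slot would still have to be found some other way before they appear in the schedule.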

And yes, I cringe a little at the fact that half of the variables are in Finnish, and half in English :DD

P.S. You might want to consider redacting identifying information from GitHub issues in the future. At the moment I can see who your teachers are and which school you go to :)

Sukarth commented 2 years ago

Thanks for responding.

Your reply was really helpful (including the suggestion to redact identifying information), but I am still a bit confused about what that piece of code is supposed to do. I don't know much Python, and I tried searching online but couldn't find any good answers. Would you mind explaining what each part of the code in your comment above does?

Sukarth commented 2 years ago

Hi! Could you please help? @KaappoRaivio

Sukarth commented 2 years ago

Hello @KaappoRaivio, I made some changes and ran the program a couple of times. I get the names of the subjects in the response, but not the homework for each subject.

These are the last few lines in the terminal window that was running the server:

{'Käsityö.2', 'Kuvataide.2', 'Ruotsin kieli, B1', 'Liikunta', 'Kotitalous.3', 'Biologia', 'UKR.7 OGQ', 'Musiikki.2', 'sHImai.Etäopetus 5-9 ma 16.00', 'Historia', 'Englanti, A1', 'Kemia.3', 'Matematiikka', 'Suomi toisena kielenä ja kirjallisuus.1', 'Espanja, A2.2', 'Terveystieto'}
0
1
2
3
4
127.0.0.1 - - [09/Feb/2022 19:36:59] "GET / HTTP/1.1" 200 -

And as a response from where I sent the request to the server, I get this:

{"courses": {}, "teachers": {}, "upcoming": []}

I think I know what the problem is, but I don't know how to fix it. @KaappoRaivio, answering my earlier posts first would help me understand the code and fix the problem. Thanks in advance; I hope to get a response soon.
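A possible debugging aid for the missing homework: the "No hw or diary found!" lines in the first log are printed from the bare `except:` in `__get_course_details`, which may be hiding the real error. The sketch below logs the actual exception instead. `parse_homework_row` and `collect_homework` are hypothetical stand-ins for the table parsing, and the sample rows are made up for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("crawler")


def parse_homework_row(cells):
    """Hypothetical helper: expects exactly two cell texts, (date, exercises)."""
    date, exercises = cells
    return {"date": date, "exercises": exercises}


def collect_homework(rows):
    """Parse every row, logging the full traceback for rows that do not fit."""
    homework = []
    for cells in rows:
        try:
            homework.append(parse_homework_row(cells))
        except ValueError:
            # Catch only the expected error and record the row plus traceback,
            # instead of printing a generic message for every failure.
            log.exception("Could not parse homework row %r", cells)
    return homework


# Made-up rows: the second has an extra column and fails to unpack.
rows = [["9.2.", "Exercises 1-3"], ["10.2.", "Read chapter 4", "extra column"]]
result = collect_homework(rows)
print(result)
```

With the traceback visible, it should be clearer whether the tables on your school's pages simply have a different layout from the one the crawler expects.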