anilabhadatta / educative.io_scraper

Educative.io Course Downloader developed using Python and Selenium. Refer to Readme.md for setup instructions.
MIT License

utf8 Encoding/Decoding error #48

Closed: taranjeetsingh257 closed this issue 9 months ago

taranjeetsingh257 commented 1 year ago

I am getting this error in my terminal (after successfully logging in and having a config file):

Exception, Driver exited 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Main Exception local variable 'driver' referenced before assignment
Press Enter to continue

Do you know why I am getting this? I cloned the latest version of the repo.
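For readers who hit the same message: byte 0xff at position 0 is what Python reports when data that begins with a UTF-16 little-endian byte-order mark (FF FE) is decoded as UTF-8. The thread never identifies which file or stream triggered the decode, and a binary file could produce the same error, so the following is only a diagnostic sketch under the assumption that some text file was saved as UTF-16. The helper names `guess_encoding` and `read_text_any_bom` are hypothetical and not part of the scraper.

```python
import codecs

# Byte-order marks that can appear at the start of a text file.
# UTF-32 is listed before UTF-16 because the UTF-32 LE mark
# (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes) -> str:
    """Return a codec name based on a leading BOM, defaulting to UTF-8."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return "utf-8"


def read_text_any_bom(path: str) -> str:
    """Read a text file, honouring a BOM if one is present."""
    with open(path, "rb") as fh:
        raw = fh.read()
    return raw.decode(guess_encoding(raw))


if __name__ == "__main__":
    # Simulate a URL list saved as UTF-16 LE with a BOM.
    raw = codecs.BOM_UTF16_LE + "https://www.educative.io/".encode("utf-16-le")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("Plain UTF-8 decode fails:", exc)
    print("BOM-aware decode:", raw.decode(guess_encoding(raw)))
```

If a check like this shows that a file starts with FF FE, re-saving that file as UTF-8 in a text editor is usually the simplest fix.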

anilabhadatta commented 1 year ago

Show the full terminal output, and show me the process of creating the config.

taranjeetsingh257 commented 1 year ago

@anilabhadatta

                    Educative Scraper (version 8.5), developed by Anilabha Datta
                    Project Link: https://github.com/anilabhadatta/educative.io_scraper
                    Please go through the ReadMe for more information about this project.

                    Press 1 and Enter to generate config
                    Press 2 and Enter to select a config [Currently selected config 0]
                    Press 3 and Enter to login Educative
                    Press 4 and Enter to start scraping
                    Press Enter to exit

Enter your choice: 1

    Leave Blank and Press Enter if you don't want to overwrite Previous Values

Enter the URL text file path: /Users/t/Downloads/educative.io_scraper-master/urls.txt
Enter Save Path: /Users/t/Downloads/educative.io_scraper-master/n
Headless T/F? T

                    Educative Scraper (version 8.5), developed by Anilabha Datta
                    Project Link: https://github.com/anilabhadatta/educative.io_scraper
                    Please go through the ReadMe for more information about this project.

                    Press 1 and Enter to generate config
                    Press 2 and Enter to select a config [Currently selected config 0]
                    Press 3 and Enter to login Educative
                    Press 4 and Enter to start scraping
                    Press Enter to exit

Enter your choice: 3

Driver Loaded
Press Enter to return to Main Menu after Login is successfull
Login Success!

                    Educative Scraper (version 8.5), developed by Anilabha Datta
                    Project Link: https://github.com/anilabhadatta/educative.io_scraper
                    Please go through the ReadMe for more information about this project.

                    Press 1 and Enter to generate config
                    Press 2 and Enter to select a config [Currently selected config 0]
                    Press 3 and Enter to login Educative
                    Press 4 and Enter to start scraping
                    Press Enter to exit

Enter your choice: 4

            Scraper Started, Log file can be found in Save directory

Exception, Driver exited 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Main Exception local variable 'driver' referenced before assignment
Press Enter to continue
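The second message in that output ("local variable 'driver' referenced before assignment") is most likely a follow-on symptom and not a separate problem: if creating the driver raises, the name `driver` is never bound, and any later cleanup that touches it fails in turn. The sketch below reproduces that pattern and shows one common guard. It is not the scraper's actual code; `load_driver`, `scrape_fragile`, and `scrape_guarded` are hypothetical names used only for illustration. Newer Python versions word the second error differently, but the exception type is still `UnboundLocalError`.

```python
def load_driver():
    """Hypothetical stand-in for whatever creates the browser driver."""
    # Simulate the first error from the log.
    raise UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte")


def scrape_fragile():
    try:
        driver = load_driver()  # raises before 'driver' is ever bound
    except Exception as exc:
        print("Exception, Driver exited", exc)
    driver.quit()  # UnboundLocalError: 'driver' was never assigned


def scrape_guarded():
    driver = None  # bind the name up front so cleanup is always safe
    try:
        driver = load_driver()
        # ... scraping work would go here ...
        return "ok"
    except Exception as exc:
        return f"driver failed to start: {exc}"
    finally:
        if driver is not None:
            driver.quit()


if __name__ == "__main__":
    print(scrape_guarded())
    try:
        scrape_fragile()
    except UnboundLocalError as exc:
        print("Main Exception", exc)
```

With the guard in place only the first, more informative error is reported, which makes the underlying cause easier to track down.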

anilabhadatta commented 1 year ago

@taranjeetsingh257 what are you pasting in your url.txt file?

anilabhadatta commented 1 year ago

@taranjeetsingh257 You are able to open the browser and log in, right?

taranjeetsingh257 commented 1 year ago

@anilabhadatta https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/xl7XB03xZ4J

This is the content of my url.txt file

Yes, I am able to open the browser and log in.
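Since the maintainer asked what the URL file contains, a quick sanity check of its lines can rule out stray characters or malformed entries. This is a rough sketch only: it assumes one URL per line and that valid entries look like the course URL quoted above (HTTPS, host `www.educative.io`, path under `/courses/`). The scraper may accept other URL shapes, and `check_url_lines` is a hypothetical helper, not something from the project.

```python
from urllib.parse import urlparse


def check_url_lines(lines):
    """Split non-empty lines into (valid, rejected) course URLs."""
    valid, rejected = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        parsed = urlparse(text)
        ok = (
            parsed.scheme == "https"
            and parsed.netloc == "www.educative.io"
            and parsed.path.startswith("/courses/")
        )
        (valid if ok else rejected).append(text)
    return valid, rejected


if __name__ == "__main__":
    sample = [
        "https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/xl7XB03xZ4J\n",
        "\n",
        "not a url\n",
    ]
    good, bad = check_url_lines(sample)
    print("valid:", good)
    print("rejected:", bad)
```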

taranjeetsingh257 commented 1 year ago

Any update? @anilabhadatta

anilabhadatta commented 1 year ago

@taranjeetsingh257 Show me a screenshot after running the scraper. The full terminal output is required.

anilabhadatta commented 1 year ago

@taranjeetsingh257 This is my output; no problems on my end.

            Scraper Started, Log file can be found in Save directory

Driver Loaded

                        [Selected config: 0] Starting Scraping: 0, https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/xl7XB03xZ4J

Load Webpage Function
Checking Login Function
Checking for captcha Function...
Create Course Folder Function
Getting File Name
This is a module page
Get File name module
Inside Course Folder operating systems virtualization concurrency persistence
Checking Login Function
Checking for captcha Function...
Scrolling Page
Getting File Name
This is a module page
Get File name module
Checking page
Removing Unnecessary Tags from page
Node deleted div[class='ed-grid'] > nav
Node deleted div[aria-label='Your Privacy']
Node deleted div[id='view-collection-article-content-root']> :not(#handleArticleScroll) >
Node deleted div[aria-labelledby*='simple-modal-title']
Adding Style Tag with Filter
Inside find_mark_down_quiz_containers function
No mark down quiz_container found
Inside take_quiz_screenshot function
Quiz not found
Remove Mark completed
Show Solution Function
No Solution found
Finding Slides Function
Slides Found
Show Hints Function
No hints found
Adding Name Tag in Next Back Button
Fixing SVG Tags inside Object Tags
Get HTML Page Content Using Single File Function
make_code_selectable function
make_code_selectable function executed
Creating HTML File
HTML File Created
HTML Page content taken.
Inside Widget Container Function
No widget container found
Code Container Download Type Function
No Code Container Downloadable Type found
Code Container Clipboard Type Function
No code containers found
Next Page Function
Going Next Page
--------------- 0 Complete-------------------

anilabhadatta commented 1 year ago

@taranjeetsingh257 I think you are probably not starting the chromedriver. Open a separate terminal and run python chromedriver.py. I have mentioned this in the Readme. Follow the steps there once.
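One way to confirm whether a driver process is actually up before starting the scraper is to test whether anything accepts TCP connections on its port. The sketch below assumes the driver listens on a local TCP port. ChromeDriver's own default is 9515, but the thread does not say which port this project's chromedriver.py uses, so the port in the example is a placeholder to adjust. `is_listening` is a hypothetical helper.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder port: replace with the port your driver really uses.
    port = 9515
    if is_listening("127.0.0.1", port):
        print(f"Something is listening on port {port}")
    else:
        print(f"Nothing is listening on port {port}; start the driver first")
```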

anilabhadatta commented 1 year ago

@taranjeetsingh257 v3 has been released. Reclone into a new folder and refer to the Readme for setup instructions.