Navidur1 / SkuleSpark.ai

Skule scraper #22

Closed arafatsyed closed 9 months ago

arafatsyed commented 9 months ago

Skule Exams Scraper Script

Instructions to Run:

  1. Create Directory Structure:

    • Create a directory for the script.
    • Organize classes into subfolders (e.g., 'skule_exams/ECE552'). This makes things easier for whoever runs the script.

    Directory Structure

  2. Create pdf_url.txt File:

    • Create a file named pdf_url.txt in the directory where you want to run the script. In this example the script runs in skule_exams/ECE552, so the file is created there.
    • Add Skule.ca URLs of the exams to be scraped in this file.

    pdf_url.txt Example
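Steps 1 and 2 can also be done programmatically. The following is a minimal sketch, not part of this PR: `prepare_course_dir` is a hypothetical helper, and writing one URL per line is an assumption about the format the scraper expects.

```python
from pathlib import Path


def prepare_course_dir(base_dir, course_code, urls):
    """Create <base_dir>/<course_code>/pdf_url.txt and return its path.

    Hypothetical helper: the one-URL-per-line layout is an assumption,
    not something the scraper's documentation confirms.
    """
    course_dir = Path(base_dir) / course_code
    # Create the course subfolder (and any missing parents) if needed.
    course_dir.mkdir(parents=True, exist_ok=True)
    url_file = course_dir / "pdf_url.txt"
    url_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return url_file
```

For example, `prepare_course_dir("skule_exams", "ECE552", my_urls)` would create `skule_exams/ECE552/pdf_url.txt`, where `my_urls` is your own list of Skule.ca exam URLs.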

  3. Run the Script:

    • Run the script, passing the directory that contains pdf_url.txt and the course code as arguments.
    • Use the command: python skule_scraper_service.py <directory_name> <course_code> (Double-check the directory name and course code before running, to avoid purging the database.)

    Script Execution Output
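Because a wrong directory name or course code can be costly, a small pre-flight check before launching the script may help. This is a sketch under stated assumptions: `build_command` and `run_scraper` are hypothetical wrappers, and the rule that the directory's last component equals the course code only reflects the layout recommended in step 1, not a requirement of the script.

```python
import subprocess
import sys
from pathlib import Path


def build_command(directory, course_code, script="skule_scraper_service.py"):
    """Validate the inputs and return the argv list for the scraper."""
    directory = Path(directory)
    url_file = directory / "pdf_url.txt"
    if not url_file.is_file():
        raise FileNotFoundError(f"{url_file} not found; check the directory name")
    # Assumption: the folder is named after the course (e.g. skule_exams/ECE552).
    if directory.resolve().name != course_code:
        raise ValueError(
            f"directory '{directory}' does not match course code '{course_code}'"
        )
    # Same argument order as the documented command:
    # python skule_scraper_service.py <directory_name> <course_code>
    return [sys.executable, script, str(directory), course_code]


def run_scraper(directory, course_code):
    """Run the scraper only after the checks above pass."""
    subprocess.run(build_command(directory, course_code), check=True)
```

Calling `run_scraper("skule_exams/ECE552", "ECE552")` would then fail early on a typo instead of reaching the database.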

  4. View Parsed Files:

    • Successfully parsed files are placed in a folder named parsed_files.
    • Use this folder to check which files, if any, failed to parse.

    Parsed Files Folder
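One rough way to spot failures is to compare how many URLs were listed with how many files ended up in parsed_files. The sketch below assumes that parsed_files sits inside the run directory, that pdf_url.txt holds one URL per line, and that each URL yields one output file; none of this is confirmed by the PR, and `parse_summary` is a hypothetical helper.

```python
from pathlib import Path


def parse_summary(directory):
    """Return (number of URLs listed, number of files in parsed_files)."""
    base = Path(directory)
    # Count non-empty lines in pdf_url.txt (assumed: one URL per line).
    lines = (base / "pdf_url.txt").read_text(encoding="utf-8").splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    # Assumed location of the output folder: inside the run directory.
    parsed_dir = base / "parsed_files"
    if parsed_dir.is_dir():
        parsed = [p for p in parsed_dir.iterdir() if p.is_file()]
    else:
        parsed = []
    return len(urls), len(parsed)
```

If the two numbers differ, some exams likely failed to parse and the script output is worth re-reading.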

Related #4