NIAEFEUP / uporto-schedule-scrapper

Python solution to extract the course schedules from the different faculties of UPorto. Used to feed TTS, our timetable selection platform for students.
GNU General Public License v3.0

University of Porto Scrapper - UPS!

Python solution to extract the course schedules from the different faculties of the University of Porto.


If you don't have Docker, you can run Python locally on your machine or inside a virtual environment. In either case, make sure your Python version is >= 3.8.
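As a quick sanity check before installing dependencies, the version requirement above can be verified from Python itself. This is a minimal sketch; `MIN_VERSION` and `python_is_supported` are illustrative names and not part of the project.

```python
import sys

# Minimum version stated in this README (major, minor).
MIN_VERSION = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        sys.exit(f"Python >= 3.8 is required, found {found}")
    print("Python version OK")
```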

Local python

pip install -r ./src/requirements.txt   # Install dependencies

Virtual environment

python -m venv venv_scrapper              # Create virtual environment
source ./venv_scrapper/bin/activate       # Activate virtual environment
pip install -r ./src/requirements.txt     # Install dependencies

Quick start

:wrench: Configure

  1. Create a .env file from the .env.example file
cd src && cp .env.example .env
  2. Change the following fields in the .env file:
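
One way to check the result after editing, sketched with the standard library only: compare the keys defined in .env.example against .env and report any that are still empty or missing. The function names are illustrative and not part of the project, and the parser assumes simple `KEY=VALUE` lines (no multi-line values or `export` prefixes).

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(example_text, env_text):
    """Return keys present in the example that are empty or absent in .env."""
    example = parse_env(example_text)
    env = parse_env(env_text)
    return sorted(key for key in example if not env.get(key))


if __name__ == "__main__":
    # Assumes the script is run from ./src, where both files live.
    missing = unfilled_keys(
        Path(".env.example").read_text(),
        Path(".env").read_text(),
    )
    print("Unfilled keys:", ", ".join(missing) if missing else "none")
```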

:dash: Run

docker-compose run scrapper make
# or
cd ./src && make

docker-compose run scrapper make dump
# or
cd ./src && make dump

docker-compose run scrapper make upload
# or
cd ./src && make upload

docker-compose run scrapper make clean
# or
cd ./src && make clean
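
The commands above all follow the same pattern: a make target run either through docker-compose or directly from ./src. A small wrapper along these lines could pick between the two automatically. It is only a sketch: `build_command` and `run_target` are hypothetical helpers, and the only targets assumed are the ones listed above.

```python
import shutil
import subprocess

# Make targets shown in this README; "" stands for the default target.
TARGETS = ("", "dump", "upload", "clean")


def build_command(target="", use_docker=True):
    """Return (argv, working_directory) for the requested make target."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    make = ["make"] + ([target] if target else [])
    if use_docker:
        # Run inside the "scrapper" service, from the repository root.
        return ["docker-compose", "run", "scrapper"] + make, None
    # Run locally; the Makefile lives in ./src.
    return make, "./src"


def run_target(target="", use_docker=None):
    """Run a target, preferring docker-compose when it is on PATH."""
    if use_docker is None:
        use_docker = shutil.which("docker-compose") is not None
    argv, cwd = build_command(target, use_docker)
    return subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_target("dump")` would execute `docker-compose run scrapper make dump` when docker-compose is available, and `make dump` inside ./src otherwise.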

:mag: Inspect

To inspect the Scrapy engine, use scrapy shell "url":


root@00723f950c71:/scrapper# scrapy shell ""
2017-10-24 20:51:35 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapper)
>>> open('dump.html', 'wb').write(response.body)
>>> response.xpath('//*[@id="anos_curr_div"]/div').extract()
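
To see what the XPath in the session above selects without opening a Scrapy shell, the same expression can be tried on a small sample. The sketch below deliberately swaps in the standard library's ElementTree for Scrapy's selectors, so it only works on well-formed markup, and the sample document is invented for illustration rather than taken from a real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a dumped page.
SAMPLE = """
<html><body>
  <div id="anos_curr_div">
    <div>1st year</div>
    <div>2nd year</div>
  </div>
  <div id="other"><div>ignored</div></div>
</body></html>
"""


def select_year_blocks(markup):
    """Return the direct <div> children of the element with id anos_curr_div."""
    root = ET.fromstring(markup.strip())
    # ElementTree equivalent of //*[@id="anos_curr_div"]/div
    return root.findall(".//*[@id='anos_curr_div']/div")


if __name__ == "__main__":
    for block in select_year_blocks(SAMPLE):
        print(block.text)
```

Only the two children of `anos_curr_div` are returned; the `div` nested under `other` is not matched.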

:triangular_ruler: Database design


:page_with_curl: More information