
University of Porto Scrapper - UPS!

Python solution to extract course schedules from the different faculties of the University of Porto. It is used to feed TTS, our timetable selection platform for students.

Requirements

If you don't have Docker, you can either use Python locally on your machine or create a virtual environment. In either case, make sure your Python version is >= 3.8.

Local python

pip install -r ./src/requirements.txt   # Install dependencies

Virtual environment

python -m venv venv_scrapper            # Create virtual environment
source ./venv_scrapper/bin/activate        # Activate virtual environment
pip install -r ./src/requirements.txt   # Install dependencies

Quick start

:wrench: Configure

  1. Create a .env file from the provided .env.example file:
cd src && cp .env.example .env
  2. Change the following fields in the .env file:
TTS_SCRAPY_YEAR=2023
TTS_SCRAPY_USER=username
TTS_SCRAPY_PASSWORD=password
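
For reference, a minimal sketch of reading these values from Python, assuming the python-dotenv package (the actual spiders may load them differently):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # read the variables from ./.env into the environment
year = os.environ["TTS_SCRAPY_YEAR"]          # academic year to scrape, e.g. 2023
user = os.environ["TTS_SCRAPY_USER"]          # login used by the scraper
password = os.environ["TTS_SCRAPY_PASSWORD"]  # corresponding password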

:dash: Run

Each make target can be run either through docker-compose or locally from the ./src directory:

docker-compose run scrapper make
# or
cd ./src && make

docker-compose run scrapper make dump
# or
cd ./src && make dump

docker-compose run scrapper make upload
# or
cd ./src && make upload

docker-compose run scrapper make clean
# or
cd ./src && make clean
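
The make targets drive the Scrapy project under ./src. If you prefer to launch a single spider straight from Python, Scrapy's CrawlerProcess can do so; the spider name below is illustrative, use scrapy list to see the real ones:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Run from inside the Scrapy project (e.g. ./src) so the project settings are picked up
process = CrawlerProcess(get_project_settings())
process.crawl("faculties")  # illustrative spider name, not necessarily one of the project's
process.start()             # blocks until the crawl finishes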

:mag: Inspect

To inspect pages interactively with the Scrapy shell, run scrapy shell "<url>".

Example:

root@00723f950c71:/scrapper# scrapy shell "https://sigarra.up.pt/fcnaup/pt/cur_geral.cur_planos_estudos_view?pv_plano_id=2523&pv_ano_lectivo=2017&pv_tipo_cur_sigla=D&pv_origem=CUR"
2017-10-24 20:51:35 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapper)
...
>>> open('dump.html', 'wb').write(response.body)
63480
>>> response.xpath('//*[@id="anos_curr_div"]/div').extract()
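
Once a selector works in the shell, the same XPath can be dropped into a spider callback. A minimal sketch, with an illustrative spider name and output fields rather than the project's actual code:

import scrapy

class CoursePlanExampleSpider(scrapy.Spider):
    name = "course_plan_example"  # illustrative, not one of the project's spiders
    start_urls = [
        "https://sigarra.up.pt/fcnaup/pt/cur_geral.cur_planos_estudos_view"
        "?pv_plano_id=2523&pv_ano_lectivo=2017&pv_tipo_cur_sigla=D&pv_origem=CUR"
    ]

    def parse(self, response):
        # Same XPath tried in the shell session above
        for block in response.xpath('//*[@id="anos_curr_div"]/div'):
            yield {"html": block.get()}  # each study-plan block as raw HTML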

:triangular_ruler: Database design

(Database schema diagram)

:page_with_curl: More information