Datenschule / jedeschule-scraper

MIT License

Fix test changes workflow #94

Closed cyroxx closed 3 years ago

cyroxx commented 3 years ago

Follow-up to PR #61.

The previous PR had some issues:

  1. git whatchanged failed because actions/checkout@v2 only checks out the latest commit by default. Fixed by setting fetch-depth to 0 in order to check out the full history. (In the unforeseeable future, there might be a fetch-refs option to optimize this, see actions/checkout#155)
  2. The scrapy version was too old and did not have the --overwrite-output option that is used in test_changes.sh.
  3. database_pipeline.py created a session at load time, which complicates things when we do not actually need the DatabasePipeline. Refactored session creation into the get_session() method to be more flexible here. @k-nut: Please take a closer look at the session management (67911825907c47c76bcd17685d7a9bc6af426508). I have the feeling that our session handling could be improved (i.e. by opening fewer sessions overall).
  4. Improved test_changes.py(!) so that its output is helpful in more cases.

cyroxx commented 3 years ago

Example output can be seen here: https://github.com/cyroxx/jedeschule-scraper/runs/1739162888?check_suite_focus=true#step:6:1