City-Bureau / city-scrapers

Scrape, standardize and share public meetings from local government websites
https://cityscrapers.org
MIT License
332 stars 310 forks source link

#562 Chicago Special Service Area #23 #949

Closed zuzaman closed 4 years ago

zuzaman commented 4 years ago

Summary

Issue: #562

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

Questions

First pull request and first time working with scrapy. I am interested to hear your opinion. First started with CSS selectors, the realized how much XPATH makes things easier.

I tried to keep the checks regarding the document links from getting too "nested". Currently there are no documents uploaded for the year 2020 it. I find it hard to guess with what HTML formatting will this be done (if they will leave the ordered list and add the href for a list element or if an extra H4 will be made for it still during the same year), so I just assumed that only prior years have document links.

What do you think?

zuzaman commented 4 years ago

It seems like since making the changes with the tests, the ones on github seems to fail even though with the same file they pass on my PC. I am a bit puzzled currently why this could be the case.

zuzaman commented 4 years ago

Hmm even after making sure I only have the raw HTML file of the website, it seems that the automated tests return, possibly, the results in a different order than the pytest on my own PC. Do you have any idea why this might happen?

pjsier commented 4 years ago

@zuzaman haven't looked at the rest of the PR yet, but my guess is it's because Python 3.5 has different default sorting behavior than the other versions we're testing. If you sort the items explicitly like we do here https://github.com/City-Bureau/city-scrapers/blob/15e67c6a249d55db320c8eaa40198cf887e208c8/tests/test_cook_pension.py#L20 and then adjust the expected results it should work

zuzaman commented 4 years ago

@pjsier thanks that did the trick. There seems to be an issue with a connection timeout in test 3.7. Even though I can open the same URL without issue at the same time. Edit: it just took more time. at the end this check passed as well