City-Bureau / city-scrapers

Scrape, standardize and share public meetings from local government websites
https://cityscrapers.org
MIT License
332 stars 310 forks source link

0718 spider chicago state university #974

Closed Rhysoshea closed 3 years ago

Rhysoshea commented 3 years ago

Summary

Issue: #718

Replace "ISSUE_NUMBER" with the number of your issue so that GitHub will link this pull request with the issue and make review easier.

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

Questions

Include any questions you have about what you're working on.

Rhysoshea commented 3 years ago

@pjsier I'm not sure why the imports are failing on the pull request tests, could you help me with this please?

alderocas commented 3 years ago

Hi @Rhysoshea! do hope you and @pjsier don't mind me butting in (sorry!).

It looks like your imports are incorrectly formatted. Check the output of pipenv run isort . --diff to see which.

PS don't forget to pipenv run flake8 and black to help with linting and formatting!

pjsier commented 3 years ago

@alderocas I don't mind at all, thanks!

Rhysoshea commented 3 years ago

Thanks a lot for your advice @alderocas, those were some important steps I missed out, and definitely useful tools to be aware of. All checks have passed now, so any changes you need making please let me know @pjsier

Rhysoshea commented 3 years ago

@pjsier thanks a lot for your helpful comments, I've made the changes you've requested.

I'm unsure if the way I've included the minutes in links is suitable so please let me know if changes need to be made. The trouble I was having with _parse_start ended up being with a.m. and p.m., so removing the . from each worked and hopefully all cases are covered now. There's still some possibly odd looking code in there with trying to remove unnecessary words - I did it this way because there are some instances of words being present in the middle of date strings i.e. August, 24 2020 (Board Retreat) @ 9:00 a.m., so that is the reasoning. Again please let me know if you'd prefer an alternate approach.

Rhysoshea commented 3 years ago

@pjsier sorry I didn't realise that! It was because of this

    start_urls = [
        f"https://www.csu.edu/boardoftrustees/\
        meetingagendas/year{date.today().year}.htm"
    ]

The linter complains that the line is too long so I included the \ to break up the line but I've just found someone's approach to include # noqa on the required line. Does this stop the linter from raising an error?

pjsier commented 3 years ago

@Rhysoshea no problem! Yeah you can add # noqa in cases where it makes sense to ignore the linting errors, and we typically use it for long URLs