0046 spider san joaquin river conservancy

Summary

Issue: #46

Hello, this pull request adds my spider and corresponding tests for the San Joaquin River Conservancy. The site was a little difficult to parse because the meeting links and labels are all siblings with inconsistent formatting. My solution walks the elements sequentially and builds a dictionary in which the keys are meeting titles and the values are their corresponding links.
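To illustrate the sequential pass, here is a minimal sketch rather than the spider's actual code. It uses only the standard library's `html.parser`, and it assumes, purely for illustration, that each meeting title sits in a `<strong>` tag and is followed by its sibling `<a>` links. The names `SiblingLinkGrouper` and `group_links` are hypothetical.

```python
from html.parser import HTMLParser


class SiblingLinkGrouper(HTMLParser):
    """Walk elements in document order and attach each link to the latest title."""

    def __init__(self, title_tag="strong"):
        super().__init__()
        self.title_tag = title_tag  # assumption: titles are wrapped in <strong>
        self.meetings = {}  # meeting title -> list of {"title", "href"} dicts
        self._current_title = None
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == self.title_tag:
            self._in_title = True
            self._text = []
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a title or a link with an href.
        if self._in_title or self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if tag == self.title_tag and self._in_title:
            self._in_title = False
            if text:
                self._current_title = text
                self.meetings.setdefault(text, [])
        elif tag == "a" and self._href is not None:
            # Links that appear before any title have nothing to attach to.
            if self._current_title is not None:
                self.meetings[self._current_title].append(
                    {"title": text, "href": self._href}
                )
            self._href = None


def group_links(html):
    """Return a dict mapping each meeting title to its list of links."""
    parser = SiblingLinkGrouper()
    parser.feed(html)
    parser.close()
    return parser.meetings
```

Because Python dictionaries preserve insertion order, the keys come back in the same order the titles appear on the page. A title with no links still gets an empty list, which makes gaps on the site easy to spot in tests.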

The website states that meetings start at 10:00 AM between March and October and at 10:30 AM otherwise. I found this to be mostly consistent for the 2021 meetings, but the 2022 meetings seem to deviate more often. The only way I see to get more accurate meeting times would be to parse the agenda PDFs.
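The seasonal rule can be expressed as a small helper. This is a sketch under the assumption that "between March and October" is inclusive of both months; the function name `default_start` is hypothetical, and the result is only the site's stated default, not a time confirmed by an agenda.

```python
from datetime import date, datetime, time


def default_start(meeting_date):
    """Return the stated default start datetime for a meeting date."""
    # Assumption: March through October inclusive use the 10:00 AM start.
    if 3 <= meeting_date.month <= 10:
        start_time = time(10, 0)
    else:
        start_time = time(10, 30)
    return datetime.combine(meeting_date, start_time)
```

For example, `default_start(date(2021, 11, 17))` gives a 10:30 AM start, while any date in March through October gives 10:00 AM.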

This is my first pull request on this project, so please let me know if there are any revisions I should make or suggestions I should consider.

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

[ ] Tests are implemented
[ ] All tests are passing
[ ] Style checks run (see documentation for more details)
[ ] Style checks are passing
[ ] Code comments from template removed

Questions

I didn't see any packages like PyPDF2 in the Pipfile, but I was wondering: is that something we might add in the future to support parsing information from PDFs?

Also, should our spiders focus on parsing information from the most current meetings? Or should they be able to parse any past information that might be available on the website?

City-Bureau / city-scrapers-fresno