This pull request introduces a new workflow for scraping and processing calendar events from the Amherst College website. The changes include adding a new management command, implementing the scraping and saving logic, and creating tests for the new functionality. The most important changes are:
New Workflow Command:
Added a new management command, calendar_workflow.py, that scrapes calendar events and saves them to the database. The command chains three steps: scraping, saving to JSON, and processing the events. (access_amherst_backend/access_amherst_algo/management/commands/calendar_workflow.py)
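As a rough illustration of how the three steps chain together, here is a minimal sketch. It is not the code in calendar_workflow.py: `run_calendar_workflow` and the `fake_*` helpers are hypothetical names, and the steps are passed in as plain callables so the example runs without Django or network access. In a real Django management command, a function like this would typically be called from the `handle()` method of a `BaseCommand` subclass.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical event shape, for illustration only


def run_calendar_workflow(
    scrape: Callable[[], List[Event]],
    save_json: Callable[[List[Event]], str],
    process: Callable[[str], int],
) -> int:
    """Run scrape -> save to JSON -> process and return the processing result."""
    events = scrape()              # step 1: collect events from the calendar
    json_path = save_json(events)  # step 2: persist the raw events, get the file path
    return process(json_path)      # step 3: load the file and store new events


# Stand-in steps that only record the order in which they are called.
calls: List[str] = []


def fake_scrape() -> List[Event]:
    calls.append("scrape")
    return [{"title": "Jazz Night", "start": "2024-11-01T19:00"}]


def fake_save_json(events: List[Event]) -> str:
    calls.append("save_json")
    return "calendar_events.json"  # pretend path; nothing is written here


def fake_process(json_path: str) -> int:
    calls.append("process")
    return 1  # pretend one event was saved


saved_count = run_calendar_workflow(fake_scrape, fake_save_json, fake_process)
print(saved_count, calls)
```

Passing the steps in as arguments is only a convenience for the sketch; it also makes the ordering easy to test in isolation.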
Scraping and Processing Logic:
Implemented calendar_parser.py to handle the scraping of events from the Amherst College calendar, including fetching pages, scraping event details, and saving events to JSON files. (access_amherst_backend/access_amherst_algo/calendar_scraper/calendar_parser.py)
Implemented calendar_saver.py to load the scraped JSON files, check for similar events, and save new events to the database. (access_amherst_backend/access_amherst_algo/calendar_scraper/calendar_saver.py)
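The load, compare, and save-only-new flow described above can be sketched as follows. The PR description does not say how similarity is measured, so this example assumes a title comparison with the standard library's `difflib.SequenceMatcher` plus an exact match on start time. The field names `title` and `start`, the 0.85 threshold, and the function names are all assumptions for illustration, and an in-memory list stands in for the database.

```python
import json
import tempfile
from difflib import SequenceMatcher
from pathlib import Path


def load_events(json_path):
    """Load a list of event dicts from a JSON file written by the scraper."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def is_similar(a, b, threshold=0.85):
    """Assumed rule: same start time and near-identical titles mean the same event."""
    if a["start"] != b["start"]:
        return False
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold


def select_new_events(scraped, existing, threshold=0.85):
    """Return scraped events that match neither existing nor already accepted events."""
    accepted = []
    for event in scraped:
        known = existing + accepted
        if not any(is_similar(event, other, threshold) for other in known):
            accepted.append(event)
    return accepted


# Demo: one near-duplicate of a stored event and one new event.
existing = [{"title": "Fall Jazz Concert", "start": "2024-11-01T19:00"}]
scraped = [
    {"title": "Fall Jazz Concert!", "start": "2024-11-01T19:00"},
    {"title": "Physics Colloquium", "start": "2024-11-02T16:00"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "calendar_events.json"
    path.write_text(json.dumps(scraped), encoding="utf-8")
    new_events = select_new_events(load_events(path), existing)

print([event["title"] for event in new_events])
```

In the actual saver, the `existing` list would presumably come from a database query, and the accepted events would be written back through the ORM.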
Testing:
Added unit tests for the scraping logic in test_calendar_parser.py, covering scenarios such as successful fetches, error handling, and saving to JSON. (access_amherst_backend/access_amherst_tests/test_calendar_parser.py)
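The three scenarios listed above might be tested along these lines. This is a sketch, not the contents of test_calendar_parser.py: `fetch_page` and `save_events_to_json` are hypothetical stand-ins for the parser's helpers, and the sketch assumes fetching goes through `urllib.request.urlopen` so that it can be mocked with the standard library alone. The real parser may use a different HTTP client.

```python
import json
import tempfile
import unittest
import urllib.error
import urllib.request
from pathlib import Path
from unittest import mock


def fetch_page(url):
    """Hypothetical fetch helper: return the page HTML, or None on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError:
        return None


def save_events_to_json(events, path):
    """Hypothetical save helper: write the event list to a JSON file."""
    Path(path).write_text(json.dumps(events, indent=2), encoding="utf-8")


class CalendarParserSketchTests(unittest.TestCase):
    @mock.patch("urllib.request.urlopen")
    def test_fetch_success(self, mock_urlopen):
        # The mocked response is used as a context manager, so configure __enter__.
        response = mock_urlopen.return_value.__enter__.return_value
        response.read.return_value = b"<html>ok</html>"
        self.assertEqual(fetch_page("https://example.invalid/calendar"), "<html>ok</html>")

    @mock.patch("urllib.request.urlopen")
    def test_fetch_error_returns_none(self, mock_urlopen):
        mock_urlopen.side_effect = urllib.error.URLError("connection refused")
        self.assertIsNone(fetch_page("https://example.invalid/calendar"))

    def test_save_to_json_round_trip(self):
        events = [{"title": "Jazz Night", "start": "2024-11-01T19:00"}]
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "events.json"
            save_events_to_json(events, path)
            self.assertEqual(json.loads(path.read_text(encoding="utf-8")), events)
```

Saved as a module, tests like these can be run with `python -m unittest`; mocking the network call keeps them fast and independent of the live calendar site.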
Workflow Configuration:
Updated the GitHub Actions workflow to include the new calendar_workflow management command. (.github/workflows/tasks.yml)