City-Bureau / city-scrapers

Scrape, standardize and share public meetings from local government websites
https://cityscrapers.org
MIT License

Spider: Chicago Special Service Area #4 95th Street #905

Closed pjsier closed 4 years ago

pjsier commented 4 years ago

URL: https://95thstreetba.org/events/category/board-meeting/
Spider Name: chi_ssa_4
Agency Name: Chicago Special Service Area #4 South Western Avenue

See the contribution guide for information on how to get started

julytn commented 4 years ago

I can take this one

pjsier commented 4 years ago

Sounds good!

jordan-zilch commented 4 years ago

is this still being worked on?

pjsier commented 4 years ago

@JordanLozinski since there haven't been any updates in over a month, it's all yours if you're interested

jordan-zilch commented 4 years ago

Sounds good. I'll start soon.

jordan-zilch commented 4 years ago

I could use some advice from somebody who's better at Scrapy than I am (this is my first time using it)

From chi_ssa_4.py

start_urls = [
    "https://95thstreetba.org/events/category/board-meeting/?"
    "tribe_paged=1&tribe_event_display=list&tribe-bar-date=2017-10-01"
]

def parse(self, response):
    """
    `parse` should always `yield` Meeting items.
    Change the `_parse_title`, `_parse_start`, etc methods to fit your scraping
    needs.
    """
    obj = response.css(".tribe-events-loop .tribe-events-category-board-meeting")
    for item in obj:
        # Only grab things of class tribe-events-category-board-meeting
        start, end = self._parse_time(item)
        meeting = Meeting(
            # ... Removed
        )

        meeting["status"] = self._get_status(meeting)
        meeting["id"] = self._get_id(meeting)

        yield meeting

    # Look for next page button
    next_page = response.css(
        "#tribe-events-footer .tribe-events-nav-pagination "
        ".tribe-events-sub-nav .tribe-events-nav-next a::attr(href)"
    ).extract_first()
    if next_page:
        yield scrapy.Request(next_page, callback=self.parse)

The start page lists only ten meetings. Once I finish parsing them, I look for the "next events" button at the bottom of the page and try to follow it (yield scrapy.Request(next_page, callback=self.parse)). I don't understand generators that well, so I'm not sure I'm doing this right; I based my approach on chi_ssa_42.py. When I run my test, I get an error saying that Request objects aren't subscriptable, raised on the lines where I subscript into parsed_items, whose elements should all be of type Meeting. So it seems like parse is yielding the Request itself instead of a Meeting.

The problem might just be that only the start page is saved in the tests/files/ dir. I'll try saving a second page to see if that fixes it. I just wanted some feedback on whether the Python itself is wrong.
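
For reference, the "not subscriptable" error described above can be reproduced without Scrapy. The following is only a minimal sketch: Meeting and Request here are hypothetical stand-ins for the project's Meeting item and scrapy.Request, and the page dict is made-up test data. It shows that a generator yielding both item types hands the Request back as just another element, so a test that indexes every element will fail on it unless the list is filtered first.

```python
class Meeting(dict):
    """Hypothetical stand-in for the project's Meeting item (dict-like)."""


class Request:
    """Hypothetical stand-in for scrapy.Request (has no __getitem__)."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


def parse(page):
    """Yield one Meeting per event, then a Request if a next page is linked."""
    for title in page["events"]:
        yield Meeting(title=title)
    if page.get("next_page"):
        yield Request(page["next_page"], callback=parse)


# Made-up page data, for illustration only.
page = {
    "events": ["Board Meeting A", "Board Meeting B"],
    "next_page": "https://example.com/events/page/2",
}

# Everything the generator yields ends up in one list, Request included.
parsed_items = list(parse(page))

# Separate the two kinds of output before asserting on meeting fields.
meetings = [item for item in parsed_items if isinstance(item, Meeting)]
requests = [item for item in parsed_items if isinstance(item, Request)]
```

Under these assumptions, parsed_items[-1]["title"] raises a TypeError because the last element is the Request, while every element of meetings can be subscripted safely.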

pjsier commented 4 years ago

@JordanLozinski I left some comments on #941 related to this, but in general it looks like cook_landbank could be a good template to follow here

jordan-zilch commented 4 years ago

Thank you. I'll start working on this

alexanderwoo commented 4 years ago

@pjsier Hi Patrick, I can work on this

pjsier commented 4 years ago

@alexanderwoo sounds good! Assigning you now

alexanderwoo commented 4 years ago

@pjsier I see that this one doesn't have any documents. Am I scraping just the time and date, or are there documents that I can't seem to find?

pjsier commented 4 years ago

@alexanderwoo It looks like some past meetings have had them. You can take a look at the last PR for this, #941, to see how they showed up.

alexanderwoo commented 4 years ago

@pjsier It looks like the code was left off at dynamically generating the URL. However, the URLs don't require the date as we're parsing with POST responses. The previous user suggested dynamically generating the start date for POST responses. I'm not too sure how to go about that, any suggestions? Or is there another way of parsing future posts you'd recommend?

pjsier commented 4 years ago

Sorry, I've been a bit out of the loop on this one, so let me know if I'm misunderstanding. After looking at this again, it seems like they have the WordPress JSON endpoint enabled here: https://95thstreetba.org/wp-json/tribe/events/v1/. There's some documentation in that response, but in general it might be more straightforward to filter events using the parameters documented there on this endpoint: https://95thstreetba.org/wp-json/tribe/events/v1/events
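
As a rough illustration of that suggestion, the sketch below builds a filtered URL for the events endpoint with a dynamically generated start date. The parameter names start_date, per_page and page are assumptions here and should be checked against the documentation returned by the root endpoint; build_events_url and its defaults are hypothetical names, not part of the project.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

EVENTS_ENDPOINT = "https://95thstreetba.org/wp-json/tribe/events/v1/events"


def build_events_url(today, days_back=30, per_page=50, page=1):
    """Build an events URL whose start date is relative to `today`.

    The query parameter names are assumed, not confirmed in this thread.
    """
    start = today - timedelta(days=days_back)
    params = {
        "start_date": start.isoformat(),  # e.g. "2020-05-02"
        "per_page": per_page,
        "page": page,
    }
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


# Passing the date in explicitly keeps the function easy to test.
url = build_events_url(date(2020, 6, 1))
```

In a spider, a URL like this could serve as the single entry in start_urls, with date.today() passed in place of the fixed date.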

ledaliang commented 4 years ago

Hi, can I claim this if it's no longer being worked on?

pjsier commented 4 years ago

@ledaliang sounds good! I'll assign you now

pjsier commented 4 years ago

Closed by #964, thanks @ledaliang!