pjsier closed this 4 years ago
I can take this one
Sounds good!
Is this still being worked on?
@JordanLozinski since there haven't been any updates in over a month it's all yours if you're interested
Sounds good. I'll start soon.
I could use some advice from somebody who's better at Scrapy than I am (this is my first time using it)
From chi_ssa_4.py:

start_urls = [
    "https://95thstreetba.org/events/category/board-meeting/?"
    "tribe_paged=1&tribe_event_display=list&tribe-bar-date=2017-10-01"
]

def parse(self, response):
    """
    `parse` should always `yield` Meeting items.
    Change the `_parse_title`, `_parse_start`, etc methods to fit your scraping
    needs.
    """
    obj = response.css(".tribe-events-loop .tribe-events-category-board-meeting")
    for item in obj:
        # Only grab things of class tribe-events-category-board-meeting
        start, end = self._parse_time(item)
        meeting = Meeting(
            # ... Removed
        )
        meeting["status"] = self._get_status(meeting)
        meeting["id"] = self._get_id(meeting)
        yield meeting
    # Look for next page button
    next_page = response.css(
        "#tribe-events-footer .tribe-events-nav-pagination "
        ".tribe-events-sub-nav .tribe-events-nav-next a::attr(href)"
    ).extract_first()
    if next_page:
        yield scrapy.Request(next_page, callback=self.parse)
The start page only lists ten meetings. Once I finish parsing it, I check for the "next events" button at the bottom and try to scrape the page it links to (yield scrapy.Request(next_page, callback=self.parse)). I don't understand generators that well, so I'm not sure if I'm doing this right. I based my approach on chi_ssa_42.py. When I run my test, I get an error saying that Request objects aren't subscriptable, on the lines where I subscript into parsed_items (which should all be of type Meeting). So it seems like parse is yielding the Request itself instead of a Meeting.
The problem might just be that only the start page is saved in the tests/files/ dir. I'll try to save a second page and see if that fixes it. I just wanted to get some feedback on whether it's the Python that's wrong or not.
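For anyone following along, here is a minimal sketch of why the test sees a Request. A generator like parse hands back everything it yields, in order, so a list built from it contains the meetings and then the follow-up request. The names below (FakeRequest, the page dict, the example URL) are stand-ins invented for illustration so the snippet runs without Scrapy; they are not the project's real classes or fixtures. One possible workaround, under that assumption, is to filter the results by type before indexing into them.

```python
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request so this sketch runs without Scrapy."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


def parse(page):
    """Yield one dict per meeting, then a request for the next page if there is one."""
    for title in page["titles"]:
        # In the real spider this would be a Meeting item.
        yield {"title": title}
    if page.get("next_url"):
        # The follow-up request comes out of the same generator as the meetings.
        yield FakeRequest(page["next_url"], callback=parse)


# Made-up page data, for illustration only.
page = {
    "titles": ["Board Meeting", "Board Meeting"],
    "next_url": "https://example.com/events/page/2",
}

results = list(parse(page))

# Separate the subscriptable meeting items from the follow-up requests.
parsed_items = [item for item in results if not isinstance(item, FakeRequest)]
follow_ups = [item for item in results if isinstance(item, FakeRequest)]
```

With this split, parsed_items[0]["title"] works, while results[-1]["title"] would raise a TypeError because the last element is the request object.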
@JordanLozinski I left some comments on #941 related to this, but in general it looks like cook_landbank could be a good template to follow here
Thank you. I'll start working on this
@pjsier Hi Patrick, I can work on this
@alexanderwoo sounds good! Assigning you now
@pjsier I see that this one doesn't have any documents, am I scraping for just the time and date? Or are there some documents that I cannot seem to find?
@alexanderwoo It looks like some in the past have, you can take a look at the last PR for this to see how they showed up in the past #941
@pjsier It looks like the previous code left off at dynamically generating the URL. However, the URLs don't need the date, since we're parsing the responses to POST requests. The previous contributor suggested dynamically generating the start date for those POST requests. I'm not too sure how to go about that. Any suggestions? Or is there another way of parsing future posts you'd recommend?
Sorry, I've been a bit out of the loop on this one so let me know if I'm misunderstanding, but after looking at this again it seems like they have the WordPress JSON endpoint enabled here https://95thstreetba.org/wp-json/tribe/events/v1/. There's some documentation in that response, but in general this might be more straightforward to filter using the parameters documented there on this endpoint https://95thstreetba.org/wp-json/tribe/events/v1/events
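A rough sketch of what filtering through that endpoint could look like, in case it helps. The parameter names (start_date, per_page, page) and the response keys (events, next_rest_url, title, start_date, url) are assumptions based on my reading of The Events Calendar REST API, so check them against the documentation returned by the endpoint itself. The helper names and the sample payload are made up for illustration.

```python
from urllib.parse import urlencode

EVENTS_ENDPOINT = "https://95thstreetba.org/wp-json/tribe/events/v1/events"


def build_events_url(start_date, per_page=50, page=1):
    """Build a query URL for the events endpoint (parameter names are assumed)."""
    params = {"start_date": start_date, "per_page": per_page, "page": page}
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


def extract_meetings(payload):
    """Pull a few fields out of a decoded JSON response (key names are assumed)."""
    meetings = [
        {
            "title": event.get("title"),
            "start": event.get("start_date"),
            "source": event.get("url"),
        }
        for event in payload.get("events", [])
    ]
    # If present, this is the URL of the next page of results.
    return meetings, payload.get("next_rest_url")


# Made-up payload in the assumed shape, for illustration only.
sample_payload = {
    "events": [
        {
            "title": "Board Meeting",
            "start_date": "2019-01-15 08:30:00",
            "url": "https://95thstreetba.org/event/example/",
        }
    ],
    "next_rest_url": None,
}

url = build_events_url("2017-10-01")
meetings, next_url = extract_meetings(sample_payload)
```

In a spider, the URL from build_events_url could go in start_urls, and parse could decode the response body as JSON, pass it to something like extract_meetings, and follow next_url when it is set.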
Hi, can I claim this if it's no longer being worked on?
@ledaliang sounds good! I'll assign you now
Closed by #964, thanks @ledaliang!
URL: https://95thstreetba.org/events/category/board-meeting/
Spider Name: chi_ssa_4
Agency Name: Chicago Special Service Area #4 South Western Avenue
See the contribution guide for information on how to get started