Add wayne_land_bank spider

City-Bureau / city-scrapers

Scrape, standardize and share public meetings from local government websites

https://cityscrapers.org

MIT License


Add wayne_land_bank spider #885

Closed radoslawkrolikowski closed 5 years ago

radoslawkrolikowski commented 5 years ago

This pull request adds the wayne_land_bank spider, its test file, and a saved copy of the target web page as an HTML file.

To access the Board of Directors tab on the following page, https://public-wclb.epropertyplus.com/landmgmtpub/app/base/customPage, the spider has to start with a POST request.
The desired data is embedded in <script> tags, so it is extracted with regular expressions (Python's re module).
The target site seems to be under development, so the spider might need an update in the future if the website layout changes or new information is added.
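As a rough illustration of the script-tag extraction described above, here is a minimal sketch using only the standard library. The sample markup, the date format, and the helper names (extract_script_bodies, find_meetings) are assumptions made for the example; the real page's structure and the spider's actual patterns are not shown in this thread.

```python
import re

# Hypothetical stand-in for the body returned by the POST request;
# the real page's markup is not shown in this thread.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
var panel = {"title": "Board of Directors", "html": "Next meeting: January 15, 2020 at 10:00 AM"};
</script>
<script>var other = 1;</script>
</body></html>
"""

# Capture everything between an opening and closing script tag.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)
# Assumed date format, e.g. "January 15, 2020 at 10:00 AM".
DATE_RE = re.compile(r"([A-Z][a-z]+ \d{1,2}, \d{4}) at (\d{1,2}:\d{2} [AP]M)")


def extract_script_bodies(html):
    """Return the text inside every <script> element."""
    return SCRIPT_RE.findall(html)


def find_meetings(html):
    """Return (date, time) string pairs found in any script body."""
    meetings = []
    for body in extract_script_bodies(html):
        meetings.extend(DATE_RE.findall(body))
    return meetings


print(find_meetings(SAMPLE_HTML))
```

Regular expressions are brittle against markup changes, which is one reason a spider like this may need updating when the site is revised.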

radoslawkrolikowski commented 5 years ago

All the changes you suggested have been made. I am only wondering about the location item. I used a default location name and address because it looks clearer than the one on the website. With the _validate_location method added, maybe it would be good to use the scraped location if the default isn't found.

# Split on the first ' at ' only, so an 'at' inside a word does not break the name.
loc = re.findall('The Board of Directors holds meetings at (.*?)(?=\\n)', self.response_data[0])[0]
name, address = loc.split(' at ', 1)
print('Name: {}, Address: {}'.format(name.title(), address))

Or is raising an error, as it does right now, the better option?
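The fallback asked about in this comment could be sketched roughly as follows. DEFAULT_LOCATION, choose_location, and parse_scraped_location are hypothetical names and values made up for illustration, and the sketch assumes that "default wasn't found" means the default address does not appear in the page text.

```python
import re

# Assumed default; the real name and address used by the spider
# are not shown in this thread.
DEFAULT_LOCATION = {"name": "Example Hall", "address": "123 Example St, Detroit, MI"}

# Same sentence as in the snippet above, stopping at a newline or end of text.
LOCATION_RE = re.compile(r"The Board of Directors holds meetings at (.*?)(?=\n|$)")


def parse_scraped_location(text):
    """Split 'NAME at ADDRESS' on the first ' at '; return None if it does not fit."""
    match = LOCATION_RE.search(text)
    if match is None:
        return None
    name, sep, address = match.group(1).partition(" at ")
    if not sep:
        return None
    return {"name": name.strip().title(), "address": address.strip()}


def choose_location(text):
    """Prefer the default when its address is on the page, else use the scraped one."""
    if DEFAULT_LOCATION["address"] in text:
        return DEFAULT_LOCATION
    return parse_scraped_location(text)
```

A caller would still need to decide what to do when choose_location returns None, which is where raising an error comes back in.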

pjsier commented 5 years ago

@radoslawkrolikowski thanks for these changes! For now, since the markup is so irregular, I think it's safest to throw the error if the location isn't found. Reviewing now, but I think these are good to go.
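A minimal sketch of the "throw the error" approach, assuming the check is a simple substring test against the page text. EXPECTED_ADDRESS and the standalone validate_location function are hypothetical; the spider's actual _validate_location method is not shown in this thread.

```python
# Hypothetical expected address; the real value is not shown in this thread.
EXPECTED_ADDRESS = "123 Example St"


def validate_location(page_text, expected_address=EXPECTED_ADDRESS):
    """Raise ValueError when the expected address is missing from the page."""
    if expected_address not in page_text:
        raise ValueError("Meeting location has changed")


# Passes silently when the address is present.
validate_location("Meetings are held at 123 Example St, Detroit")
```

Failing loudly means a changed address surfaces as a scraper error to investigate rather than as a meeting published with a stale or misparsed location.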