Closed radoslawkrolikowski closed 5 years ago
All the changes you suggested have been made. My only question is about the location item. I used a default location name and address because it looks clearer than the one on the website. Now that the _validate_location method has been added, maybe it would be good to fall back to the scraped location if the default isn't found:
loc = re.findall('The Board of Directors holds meetings at (.*?)(?=\\n)', self.response_data[0])[0]
name, address = loc.split(' at ', 1)  # split on the word "at" only, not on "at" inside other words
print('Name: {}, Address: {}'.format(name.title(), address))
Or is raising an error, as it does right now, the better option?
@radoslawkrolikowski thanks for these changes! For now, since the markup is so irregular I think it's safest to throw the error if the location isn't found. Reviewing now but I think these are good to go
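For reference, a minimal sketch of the "throw the error if the location isn't found" behaviour. It is written as a standalone function, not the spider's actual _validate_location method, and the DEFAULT_LOCATION values are hypothetical placeholders, not the real meeting address.

```python
# Hypothetical default location; the real name and address are not shown in this thread.
DEFAULT_LOCATION = {
    "name": "Example Board Room",
    "address": "123 Example St",
}


def validate_location(page_text, expected=DEFAULT_LOCATION["address"]):
    """Return the default location, or raise if its address is missing from the page."""
    if expected not in page_text:
        # Fail loudly so a changed location is noticed instead of silently
        # publishing a stale or badly parsed address.
        raise ValueError("Meeting location has changed")
    return DEFAULT_LOCATION
```

Raising here trades convenience for safety: a markup change stops the scrape until someone checks the page by hand.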
This pull request adds the wayne_land_bank spider, the test file for that spider, and the target web page as an HTML file.
To access the Board of Directors tab on the following page: https://public-wclb.epropertyplus.com/landmgmtpub/app/base/customPage we have to start with a POST request.
The desired data is included in <script> tags, so regular expressions were used to extract it. The target site seems to be under development, so the spider might require an update in the future if the website layout changes or new information is added.
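The script-tag extraction described above can be sketched roughly as follows. This is an illustration only, not the spider's code: SAMPLE_HTML, script_bodies and find_meeting_location are made-up names, and the sample markup is an assumption about what the page might contain.

```python
import re

# Hypothetical stand-in for the page; the real markup is not reproduced here.
SAMPLE_HTML = """<html><body>
<script>var a = 1;</script>
<script>
var page = "The Board of Directors holds meetings at Example Hall at 1 Example Ave
Next line";
</script>
</body></html>"""


def script_bodies(html):
    """Return the text content of every <script> element in the page."""
    return re.findall(
        r"<script[^>]*>(.*?)</script>", html, flags=re.DOTALL | re.IGNORECASE
    )


def find_meeting_location(html):
    """Return (name, address) from the first matching script body, or None."""
    pattern = re.compile(r"The Board of Directors holds meetings at (.*?)(?=\n)")
    for body in script_bodies(html):
        match = pattern.search(body)
        if match:
            # Split once on the standalone word "at" separating name and address.
            name, _, address = match.group(1).partition(" at ")
            return name.strip(), address.strip()
    return None
```

Returning None when nothing matches leaves it to the caller to decide whether to raise, which fits the choice made in the review to fail when the location cannot be confirmed.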