codefordayton / scrapers

Various web scrapers used to collect open data.
The Unlicense

Eligible field from the Dayton REAP scraper is not pulling the correct field #4

Closed: DavidEBest closed this issue 9 years ago

DavidEBest commented 9 years ago

The Treasurer's site is not consistent in the placement of the Eligible field data.

The scraper always pulls the first field, so the resulting data contains only the values 'Sold' or 'Eligible'. It should contain the values 'Sold', 'Yes', and 'No'.

The logic that pulls that data needs to be a bit more complex: if the state is not 'Sold', it should pull the yes/no value instead.
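A minimal sketch of that logic, assuming the scraper has already collected the candidate cell texts for a parcel into a list of strings. The function name `normalize_eligible` and the input shape are hypothetical and are not taken from the existing scraper code.

```python
def normalize_eligible(candidates):
    """Return 'Sold', 'Yes', or 'No' from candidate cell texts, or None.

    `candidates` is an assumed input: the raw strings found near the
    Eligible label on a parcel page, in page order.
    """
    # Drop empty cells and normalize case and whitespace.
    cleaned = [c.strip().lower() for c in candidates if c and c.strip()]

    # A sold parcel is reported as 'Sold' wherever the marker appears.
    if any(c.startswith("sold") for c in cleaned):
        return "Sold"

    # Otherwise skip label text such as 'Eligible' and take the first yes/no.
    for c in cleaned:
        if c in ("yes", "no"):
            return c.capitalize()

    # Nothing recognizable was found; let the caller decide how to log it.
    return None
```

Returning `None` for unrecognized pages keeps bad rows visible instead of silently writing 'Eligible' into the field.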

R72 01803 0074 (screenshot, 2014-11-26 16:40:35)

R72 09205 0004 (screenshot, 2014-11-26 16:40:47)

R72 07108 0026 (screenshot, 2014-11-26 16:40:58)

janmicohio commented 9 years ago

Well that should make the real "eligible" data set a lot smaller and more manageable!

Good job, Dave B! I wonder if the details on the "SOLD" are useful. I wonder what happens when the VALID date arrives?

DavidEBest commented 9 years ago

@janmicohio That cut the dataset down to ~11,000 records. It is still slower than I'd like, but more manageable.

Regarding the 'Valid' date...I don't know, but I can check that property tomorrow. :)

janmicohio commented 9 years ago

11K records sounds much closer to what I'd guess the eligibility number to be. Keep up the good work!

cimmone commented 9 years ago

If there are any more fields you guys would like populated, just let us know! It's not hard to add new fields from the data, so I'm all for aggregating more data if you think it's useful.
