StephenOTT / OttawaSIREScrape

A scrape that uses FMiner to collect content from the City of Ottawa's SIRE platform (Council voting and minutes)

Search Results Scrape duplicated each result 5 additional times. #31

Closed. StephenOTT closed this issue 11 years ago.

StephenOTT commented 11 years ago

Search Results Scrape duplicated each result 5 additional times.

Scrape issue.

Reviewing the original scrape.

StephenOTT commented 11 years ago

Pretty sure I found the issue. Started another scrape to see if it was the cause.

StephenOTT commented 11 years ago

The problem looks to be that I was using recursion to page through the 6+ pages of search results. My initial thought was that the pager reloaded the page, but it looks like the site loads all search results into memory and the JS hides the rows that are not on the current page.

So the scrape accesses the entire set of search results on every page, and each recursive step through the pager captured all of them again.
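To make the failure mode concrete, here is a small simulation of a client-side pager: every "page" contains all result rows, and only a `display:none` style decides which rows are visible. A scraper that reads the raw HTML ignores that styling, so recursing through the pager re-collects the full set once per page. This is a sketch under stated assumptions, not the actual SIRE markup or the FMiner project: the HTML, the `item-N` row IDs, the page size of 15, and the helper names are all hypothetical. Only the count of 86 results comes from this thread.

```python
from html.parser import HTMLParser

# Hypothetical data: 86 results (the count reported in this thread).
ALL_ROWS = [f"item-{i}" for i in range(1, 87)]
ROWS_PER_PAGE = 15  # hypothetical page size; 86 rows -> 6 pages


def render_page(page: int) -> str:
    """Emit ALL rows on every page; only the display:none styling changes."""
    start = page * ROWS_PER_PAGE
    cells = []
    for idx, row_id in enumerate(ALL_ROWS):
        hidden = not (start <= idx < start + ROWS_PER_PAGE)
        style = ' style="display:none"' if hidden else ""
        cells.append(f'<tr class="result"{style}><td>{row_id}</td></tr>')
    return "<table>" + "".join(cells) + "</table>"


class RowCollector(HTMLParser):
    """Collect the text of every <td>, ignoring CSS visibility like a raw scraper."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows.append(data)


def scrape(page: int) -> list:
    """Scrape a single page of HTML."""
    parser = RowCollector()
    parser.feed(render_page(page))
    return parser.rows


def scrape_with_paging(page: int = 0) -> list:
    """Recursive pager: scrape this page, then recurse into the next one."""
    num_pages = -(-len(ALL_ROWS) // ROWS_PER_PAGE)  # ceiling division
    if page >= num_pages:
        return []
    return scrape(page) + scrape_with_paging(page + 1)
```

With these assumptions, `len(scrape(0))` is 86 while `len(scrape_with_paging())` is 516: every row appears 6 times, i.e. 5 additional copies. That matches the pattern in the issue title. Scraping a single page is enough when the pager is purely client-side.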

Use of recursion: [Screenshot 2013-02-26 at 11:16:11 PM]

Without recursion, the search returned 86 rows in total, and the scrape picked up all 86 rows even without paging through: [Screenshot 2013-02-26 at 11:16:17 PM]

[Screenshot 2013-02-26 at 11:16:26 PM]
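One way to confirm this kind of symptom, before and after a fix, is to count how many times each row key appears in the scrape output. The following is a minimal sketch, assuming each scraped row can be reduced to a hashable key such as a record ID. The `item-N` keys and `duplication_report` are hypothetical, and the 6-copies input just mirrors the "5 additional times" pattern from the issue title.

```python
from collections import Counter


def duplication_report(rows):
    """Summarize how often each scraped row key occurs.

    Returns (unique_rows, copies) where `copies` is the set of distinct
    occurrence counts: {1} means no duplicates, {6} means every row was
    captured 6 times (the original plus 5 additional copies).
    """
    counts = Counter(rows)
    return len(counts), set(counts.values())


# Hypothetical keys: 86 results, each captured once per page of a 6-page pager.
scraped = [f"item-{i}" for i in range(1, 87)] * 6
print(duplication_report(scraped))
```

A uniform count such as `{6}` points to a systematic cause like re-reading the same full result set on every page. Scattered counts would point somewhere else instead.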

StephenOTT commented 11 years ago

Scrape completed.

The new scrape with the above adjustments returned 86 results, which is the expected and correct number.
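The fix in this thread was to stop paging. A separate safeguard, which is not what the author did, is to check the raw scrape against the expected row count (86 here) and for exact duplicates before import, and only then decide whether to drop repeats. The following is a minimal sketch, assuming rows are hashable (for example, tuples of field values). `check_scrape` and `dedupe_preserving_order` are hypothetical helper names.

```python
def dedupe_preserving_order(rows):
    """Keep the first occurrence of each row and drop later repeats."""
    # dict preserves insertion order (Python 3.7+), so scrape order is kept.
    return list(dict.fromkeys(rows))


def check_scrape(rows, expected):
    """Raise ValueError if the scrape has duplicates or the wrong row count."""
    unique = len(set(rows))
    if unique != len(rows):
        raise ValueError(f"{len(rows) - unique} duplicate rows")
    if len(rows) != expected:
        raise ValueError(f"expected {expected} rows, got {len(rows)}")


rows = dedupe_preserving_order(["item-2", "item-1", "item-2", "item-3"])
check_scrape(rows, expected=3)  # passes silently
```

Running `check_scrape` on the raw output first makes a regression like this one fail loudly. Deduplicating first would hide it.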

Closing

StephenOTT commented 11 years ago

SQL DB file (not modified for import yet): https://github.com/StephenOTT/OttawaSIREScrape/commit/c4f7f7f5fc7117669281da22e70d0c8ec87528d8

StephenOTT commented 11 years ago

Imported into MySQL and exported to Excel: https://github.com/StephenOTT/OttawaSIREScrape/commit/c0c64fe675ed0b8e417ce3927de3914fa1bb877f

Then imported into MySQL and exported a CSV for import: https://github.com/StephenOTT/OttawaSIREScrape/commit/cf6dd23992fbbaa8a2696f129edbaa9433b0db6a
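The last steps move the scraped data through MySQL and out to CSV. The actual schema and MySQL commands are not shown in the thread, so this is only a sketch of a table-to-CSV export step using Python's `csv` module. The standard-library `sqlite3` in-memory database stands in for MySQL, and the `results` table and its columns are hypothetical. The export function only uses DB-API cursor basics (`execute`, `description`, `fetchall`), so a similar approach should carry over to a MySQL driver's cursor.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table):
    """Dump every row of `table` (header row first) to a CSV string."""
    cur = conn.cursor()
    # The table name is interpolated directly, so it must be trusted input.
    cur.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur.fetchall())
    return buf.getvalue()


# Hypothetical schema standing in for the real scrape table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (item_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("item-1", "Motion, as amended"), ("item-2", "Report")],
)
csv_text = export_table_to_csv(conn, "results")
print(csv_text)
```

The `csv` writer quotes fields that contain commas (such as `"Motion, as amended"`), which hand-built comma joins would get wrong.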