BenWirus / ZombieVoters

Find dead people who are registered to vote.
MIT License

Results do not load after first 5000 entries #4

Closed: Jambre closed this issue 3 years ago

Jambre commented 3 years ago

Looks like Ancestry does not show more than 5000 entries; after that, the pages are blank. At 20 results per page, that ends on page 250. At 50 per page, it ends on page 100.
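For reference, the page numbers above follow directly from the cap. This is a minimal sketch of that arithmetic, assuming the 5000-entry limit observed here (it is not a documented value); `RESULT_CAP` and `last_visible_page` are hypothetical names used only for illustration.

```python
import math

# Assumption: the cap observed in this issue, not a documented limit.
RESULT_CAP = 5000


def last_visible_page(per_page: int, cap: int = RESULT_CAP) -> int:
    """Return the last page number that still contains results under the cap."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(cap / per_page)


if __name__ == "__main__":
    for per_page in (20, 50):
        print(per_page, "per page -> last page", last_visible_page(per_page))
```

Any search expected to return more than `RESULT_CAP` rows would need to be split into narrower queries to be fully retrievable.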

This is a huge roadblock on this project.

BenWirus commented 3 years ago

Agreed. I came across this last night: http://ssdmf.info/download.html. It has all deaths for the country. I haven't yet worked out how to make sense of some of the fields in the data.

BenWirus commented 3 years ago

Using the SSDMF from that download would also tremendously speed up the process.

BenWirus commented 3 years ago

Another option is:

The minimum query for this is a zip code, so in theory scraping could start with the zip code.

BenWirus commented 3 years ago

This will be resolved with #6. The solution I've come up with is to run more specific searches against myheritage.com's GraphQL API. They also have a limit, but I'm able to work around it by requesting the deaths for one zip code, one death year, and one birth year at a time, then looping through all of the zip code, death year, and birth year combinations.
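The loop described above could be sketched roughly as follows. This is not the code from #6 and it does not call the real myheritage.com API: `fetch` is a hypothetical callable standing in for one narrow search, and `limit` is an assumed per-query cap. The sketch only shows how to enumerate the combinations and flag any query that still hits the cap.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Tuple

Combo = Tuple[str, int, int]  # (zip_code, death_year, birth_year)


def search_combinations(
    zip_codes: Iterable[str],
    death_years: Iterable[int],
    birth_years: Iterable[int],
) -> Iterator[Combo]:
    """Yield every (zip, death year, birth year) triple that is logically possible."""
    for zip_code, death_year, birth_year in product(zip_codes, death_years, birth_years):
        # Nobody dies before they are born, so skip those combinations.
        if birth_year <= death_year:
            yield zip_code, death_year, birth_year


def collect_deaths(
    fetch: Callable[[str, int, int], List[dict]],  # hypothetical search function
    zip_codes: Iterable[str],
    death_years: Iterable[int],
    birth_years: Iterable[int],
    limit: int,  # assumed per-query result cap
) -> Tuple[List[dict], List[Combo]]:
    """Run one narrow search per combination; report combos that reached the cap."""
    records: List[dict] = []
    capped: List[Combo] = []
    for combo in search_combinations(zip_codes, death_years, birth_years):
        batch = fetch(*combo)
        if len(batch) >= limit:
            # This query may be truncated and would need to be narrowed further.
            capped.append(combo)
        records.extend(batch)
    return records, capped
```

Passing `fetch` in as a parameter keeps the enumeration logic testable without network access. Combinations returned in `capped` are candidates for an even narrower search.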

BenWirus commented 3 years ago

This is resolved as of #6. I've merged the fix into the main branch. The fix was to use myheritage.com's GraphQL API with more specific searches that return fewer results.