Closed. Krajstofer closed this 5 years ago.
Yes, google_maps is kinda hard to scrape. I am working on the scraper, but I am overloaded with work right now.
The google maps scraping is also vastly different from other search engines, because the process looks like this:
I checked the parse_async function in the GoogleMapsScraper class. I received HTML data, and that's OK. But I think that the parse_async and evaluate functions don't have access to the document element.
EDIT:
OK, I see that it's important to have scrape_in_detail: true in the config, but then I only get the first result out of 20.
[i] [se-scraper] started at [Thu, 11 Jul 2019 11:40:11 GMT] and scrapes google with 1 keywords on 1 pages each.
[i] Using startUrl: https://www.google.com/maps
[i] google scrapes keyword "fryzjer" on page 1
[i] Sleeping for 1s
Profiles to visit: 20
[ 'Reymonta 5, 60-791 Poznań',
'CV2Q+RQ Poznań',
'723 915 777',
'Dodaj witrynę' ]
Error: Node is detached from document
at ElementHandle._scrollIntoViewIfNeeded (/path/scraper/node_modules/puppeteer/lib/JSHandle.js:185:13)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at ElementHandle.<anonymous> (/path/scraper/node_modules/puppeteer/lib/helper.js:111:15)
at GoogleMapsScraper.visit_profile (/path/scraper/node_modules/se-scraper/src/modules/google.js:542:23)
at GoogleMapsScraper.parse_async (/path/scraper/node_modules/se-scraper/src/modules/google.js:523:54)
at process._tickCallback (internal/process/next_tick.js:68:7)
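For what it's worth, "Node is detached from document" is what Puppeteer throws when an ElementHandle points at a node that is no longer attached to the DOM. One possible cause here (an assumption, not confirmed against the se-scraper source) is that the 20 result handles are collected once, and the list is re-rendered after the first profile is visited, so the remaining handles go stale. A minimal sketch of the usual workaround is to re-query the handles on every iteration. The function name `visitAllProfiles`, the selector, and the `parseProfile` callback are hypothetical, and using `page.goBack()` to return to the result list is also an assumption about how Google Maps behaves:

```javascript
// Sketch: visit every result without holding on to stale element handles.
// `page` is expected to behave like a Puppeteer Page. The selector and the
// parseProfile callback are hypothetical placeholders, not se-scraper names.
async function visitAllProfiles(page, resultSelector, parseProfile) {
  const profiles = [];
  // Count the results once; the handles themselves are re-queried each time.
  const total = (await page.$$(resultSelector)).length;

  for (let i = 0; i < total; i++) {
    // Re-query: if the list was re-rendered after the previous visit,
    // handles obtained earlier may point at detached nodes.
    await page.waitForSelector(resultSelector);
    const handles = await page.$$(resultSelector);
    if (i >= handles.length) break; // list shrank; stop instead of throwing

    await handles[i].click();
    profiles.push(await parseProfile(page));
    // Assumption: going back in history returns to the result list.
    await page.goBack();
  }
  return profiles;
}
```

This trades one extra query per result for handles that are always fresh. It stays strictly sequential, so it does not help with the speed problem.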
There are many issues right now with google maps... When I find time, I will implement it properly.
Right now I am wondering whether it would be better to start a new project for "location search / small business search", because the logic is different.
The main problem is that scraping google maps takes two loops instead of one.
With normal search engines: loop over all keywords and all pages and parse the results. With google maps: loop over all search results, click on each result, parse the profile page, go to the next result.
We cannot parallelize this logic in se-scraper, therefore I am hesitant.
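To make the difference concrete, here is a rough sketch of the two loop shapes. This is not se-scraper's actual code: the function names and the `fetchAndParse`, `listResults`, and `openAndParseProfile` callbacks are all hypothetical stand-ins for the real browser work.

```javascript
// Shape 1: ordinary search engines. Every (keyword, page) pair is an
// independent job, so the jobs can be started together and awaited as a group.
async function scrapeSearchEngine(keywords, numPages, fetchAndParse) {
  const jobs = [];
  for (const keyword of keywords) {
    for (let pageNum = 1; pageNum <= numPages; pageNum++) {
      jobs.push(fetchAndParse(keyword, pageNum));
    }
  }
  return Promise.all(jobs);
}

// Shape 2: Google Maps. Each keyword yields a result list, and every result
// has to be opened and parsed in turn on the same page, so the inner loop
// awaits each profile before moving on to the next one.
async function scrapeMaps(keywords, listResults, openAndParseProfile) {
  const profiles = [];
  for (const keyword of keywords) {
    const results = await listResults(keyword);
    for (const result of results) {
      profiles.push(await openAndParseProfile(result));
    }
  }
  return profiles;
}
```

In the first shape the unit of work is a whole result page. In the second it is a single click inside a page whose state changes with every click, which is why it does not fit the existing keyword/page job model.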
I started with a basic configuration and always got the same result for my keyword. My code looks like this:
And these are my results from the terminal:
Do you have any idea what I can do to get the results data?