ecoron / SerpScrap

SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet and the type from search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in search results or supplied on your own. It's useful for SEO and business-related research tasks.
https://github.com/ecoron/SerpScrap
MIT License
257 stars 61 forks

Scraper standard example returns only YouTube results in Lithuanian search #61

Open MindaugasVaitkus2 opened 4 years ago

MindaugasVaitkus2 commented 4 years ago

I used the standard example to scrape related-keyword data and added my own configuration. The scraper returns, and writes to the CSV file, only the YouTube entries from the search results. The same data was written to serpscrap.db: only search results for videos.

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import pprint
import serpscrap

def scrape_to_csv(config, keywords):
    scrap = serpscrap.SerpScrap()
    scrap.init(config=config.get(), keywords=keywords)
    return scrap.as_csv('/Users/User/Desktop/stellar.csv')

def get_related(config, keywords, related):
    scrap = serpscrap.SerpScrap()
    scrap.init(config=config.get(), keywords=keywords)
    scrap.run()
    results = scrap.get_related()
    for keyword in results:
        if keyword['keyword'] not in related:
            related.append(keyword['keyword'])
    return related

config = serpscrap.Config()
config_new = {
   'cachedir': '/Users/User/Desktop/.serpscrap/',
   'clean_cache_after': 24,
   'database_name': '/Users/User/Desktop/serpscrap',
   'do_caching': True,
   'num_pages_for_keyword': 3,
   'scrape_urls': True,
   'search_engines': ['google'],
   'google_search_url': 'https://www.google.lt/search?',
   'executable_path': '/Users/User/bin/chrome_driver/chromedriver.exe',
}

# Apply the custom settings, then switch URL content scraping off again
config.apply(config_new)
config.set('scrape_urls', False)

keywords = ['stellar']

# Copy the list so appending related keywords does not also mutate `keywords`
related = list(keywords)
related = get_related(config, keywords, related)

scrape_to_csv(config, related)

pprint.pprint('********************')
pprint.pprint(related)
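
To check whether the CSV really contains only YouTube rows, the results can be tallied by domain. The following is a minimal sketch that uses only the standard library, not part of SerpScrap itself. The helper name `count_domains` is hypothetical, and the column name `serp_url` is an assumption about the CSV header; adjust it to whatever the header row of the file actually contains. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def count_domains(csv_file, url_column="serp_url"):
    """Tally result rows per domain from an open CSV file object.

    `url_column` is an assumed header name; change it to match the file.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        url = (row.get(url_column) or "").strip()
        if not url:
            continue
        # urlparse only fills netloc when the URL carries a scheme
        domain = urlparse(url).netloc.lower()
        if domain:
            counts[domain] += 1
    return counts


# Made-up sample rows standing in for the real CSV output
sample = io.StringIO(
    "query,serp_url\n"
    "stellar,https://www.youtube.com/watch?v=abc\n"
    "stellar,https://www.youtube.com/watch?v=def\n"
    "stellar,https://example.org/stellar\n"
)
sample_counts = count_domains(sample)
print(sample_counts.most_common())
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `count_domains`. If every row maps to a YouTube domain, the problem is in what was scraped rather than in how the CSV was written.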