NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0

Use scraping result directly in python #178

Open so3500 opened 7 years ago

so3500 commented 7 years ago

Please kindly help me with my issue. I'm testing GoogleScraper/Examples/basic.py and successfully got results.

This is my config:

config = {
    'use_own_ip': True,
    'keyword': 'how to make blabla',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
}

And I checked the results in the shell.

[figure 1: results in the shell (image omitted)]

And I can only use the data in Python as shown below.

[figure 2: using the data in Python (image omitted)]

I know that the results from figure 1 are stored in the database. I also know that I can add an 'output_filename' field to the config to save the results to a file, and then read that file back.

config = {
    'use_own_ip': True,
    'keyword': 'how to make blabla',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
    'output_filename': 'output.csv',
}
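
That file-based route can be sketched as follows. This is a minimal sketch assuming the exported CSV has a header row with columns such as `title`, `link`, and `snippet`; the exact column names depend on GoogleScraper's CSV writer, so check the header of your actual output file.

```python
import csv

def read_results(path):
    # Parse the CSV written via the 'output_filename' config option.
    # Field names such as 'title', 'link', 'snippet' are assumptions;
    # check the header row of your actual file.
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))

# Example usage (after a scrape that wrote output.csv):
# for row in read_results('output.csv'):
#     print(row['title'], row['link'])
```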

But,

I want to use the results from figure 1 directly in Python code (title, link etc)

Any Idea??

alon001 commented 7 years ago

Give me a few days and I will show you where in the code you can grab it and print it, or whatever you need. I have done it, but I am busy for the next few days.

alon001 commented 7 years ago

Add the below lines in the database.py file, in set_values_from_parser:

print("PARSED LINK IS: ", link['link'])
print("PARSED TITLE IS: ", link['title'])
print("PARSED SNIPPET IS: ", link['snippet'])

Add them after these lines:

Link(
    link=link['link'],
    snippet=link['snippet'],
    title=link['title'],
    visible_link=link['visible_link'],
    domain=parsed.netloc,
    rank=link['rank'],
    serp=self,
    link_type=key,
)

You can similarly print to the log:

logger.info("PARSED LINK IS: ")
logger.info(link['link'])

etc. I don't remember, but if the following import is missing at the top of the database.py file, add it (I think it is needed to enable logger printing):

import logging
logger = logging.getLogger(__name__)

I guess you can now send it to any specific log file you want, in the same way.
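
The module-level logger mentioned above is the standard Python logging pattern, independent of GoogleScraper. A minimal self-contained sketch (the `link` dict keys are assumptions based on the parsed results described in this thread):

```python
import logging

# Standard module-level logger; inside database.py the argument
# would be that module's own __name__.
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def log_link(link):
    # 'link' is a dict shaped like GoogleScraper's parsed results
    # (the keys here are assumptions, matching the comment above).
    logger.info("PARSED LINK IS: %s", link.get('link'))
    logger.info("PARSED TITLE IS: %s", link.get('title'))
    logger.info("PARSED SNIPPET IS: %s", link.get('snippet'))

log_link({'link': 'http://example.com', 'title': 'Example', 'snippet': '...'})
```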

so3500 commented 7 years ago

I really appreciate your kind and quick reply. I will test the answer you posted as soon as possible, share the results, and close the issue.

Sinyii commented 6 years ago

search = sqlalchemy_session.query(ScraperSearch).all()[-1]
for serp in search.serps:
    for link in serp.links:
        print("KW:%s" %(serp.query))
        print(link.snippet)

You can change snippet to the element you want to use. :)
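
Building on the snippet above, you could flatten the ORM objects into plain Python data structures. A sketch, assuming each serp has `.query` and `.links`, and each link exposes `.title`, `.link`, and `.snippet` attributes as used in this thread (the real objects come from `sqlalchemy_session.query(ScraperSearch)`):

```python
def serp_links_to_dicts(search):
    # Flatten a ScraperSearch-like object into a list of plain dicts.
    # Attribute names (.serps, .query, .links, .title, .link, .snippet)
    # are taken from the snippet above; verify against your models.
    results = []
    for serp in search.serps:
        for link in serp.links:
            results.append({
                'keyword': serp.query,
                'title': link.title,
                'link': link.link,
                'snippet': link.snippet,
            })
    return results

# Example usage:
# search = sqlalchemy_session.query(ScraperSearch).all()[-1]
# for row in serp_links_to_dicts(search):
#     print(row['title'], row['link'])
```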