ecoron / SerpScrap

SEO Python scraper to extract data from major search engine result pages. Extract data such as URL, title, snippet, rich snippet and result type from the search results for given keywords. Detect ads or take automated screenshots. You can also fetch the text content of URLs found in the search results or supplied by yourself. It's useful for SEO and business-related research tasks.
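A minimal usage sketch (hedged: it relies only on the Config, SerpScrap.init() and run() calls that also appear in the issue below, and prints each result as a raw dict rather than assuming specific field names):

import serpscrap

# keywords to query on the configured search engine
keywords = ['example keyword']

config = serpscrap.Config()
config.set('scrape_urls', False)  # SERP data only, skip fetching result page content

scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)

# run() returns a list of result dicts; as_csv() would write them to a file instead
for result in scrap.run():
    print(result)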
https://github.com/ecoron/SerpScrap
MIT License
257 stars 61 forks

serpscrap.SerpScrap() returns None for some keywords #52

Open GefenPuravida opened 5 years ago

GefenPuravida commented 5 years ago

Hi. Does anyone have an idea why I get data for some keywords but not for others?

For example, with the keyword "dog food":

import serpscrap

keywords = ['dog food']

config = serpscrap.Config()
config.set('scrape_urls', True)  # also fetch the text content of each result URL

scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
scrap.as_csv('/tmp/output')
2019-09-22 11:55:14,988 - root - INFO - 
                Going to scrape 2 keywords with 1
                proxies by using 1 threads.
2019-09-22 11:55:14,990 - scrapcore.scraping - INFO - 
        [+] SelScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google".
        Num keywords=1, num pages for keyword=[1]

2019-09-22 11:55:24,286 - scrapcore.scraper.selenium - INFO - https://www.google.com/search?
2019-09-22 11:55:55,364 - scrapcore.scraping - INFO - 
            [google]SelScrape localhost - Keyword: "dog food" with [1, 2] pages,
            slept 22 seconds before scraping. 1/1 already scraped

2019-09-22 11:55:56,767 - scrapcore.scraper.selenium - INFO - Requesting the next page
2/2 keywords processed.
2019-09-22 11:56:01,961 - root - INFO - Scraping URL: https://www.mypetneedsthat.com/best-dry-dog-foods-guide/
2019-09-22 11:56:02,681 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:56:02,686 - root - INFO - Scraping URL: https://www.akc.org/expert-advice/nutrition/best-dog-food-choosing-whats-right-for-your-dog/
2019-09-22 11:56:02,689 - root - INFO - Scraping URL: https://www.amazon.com/Best-Sellers-Pet-Supplies-Dry-Dog-Food/zgbs/pet-supplies/2975360011
2019-09-22 11:56:02,690 - root - INFO - Scraping URL: https://www.chewy.com/b/food-332
2019-09-22 11:56:26,122 - root - INFO - Scraping URL: https://www.petco.com/shop/en/petcostore/category/dog/dog-food
2019-09-22 11:56:26,123 - root - INFO - Scraping URL: https://www.petflow.com/dog/food
2019-09-22 11:56:26,843 - root - INFO - Scraping URL: https://www.dogfoodadvisor.com/
2019-09-22 11:56:27,735 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/dry-food/
2019-09-22 11:56:27,737 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/
2019-09-22 11:56:27,738 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:28,635 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=fBABfWqSN2I
2019-09-22 11:56:31,757 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=7P85BMCCboI
2019-09-22 11:56:36,807 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=az0ktsWYydw
2019-09-22 11:56:39,645 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=njJ99wPByy4
2019-09-22 11:56:42,571 - root - INFO - Scraping URL: https://nypost.com/video/homeless-man-and-his-dog-reuniting-is-pure-joy/
2019-09-22 11:56:45,156 - root - INFO - Scraping URL: /aclk?sa=l&ai=DChcSEwjRyYG5h-TkAhUM1WQKHSiFASYYABAAGgJwag&sig=AOD64_2IRYpCakgEzR3BK1oqeuLCVa3mjA&adurl=&rct=j&q=
2019-09-22 11:56:45,157 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:45,867 - root - INFO - Scraping URL: https://en.wikipedia.org/wiki/Dog_food
2019-09-22 11:56:45,872 - root - INFO - Scraping URL: https://www.hillspet.com/dog-food
2019-09-22 11:56:45,876 - root - INFO - Scraping URL: https://www.smithsfoodanddrug.com/pl/dog-food/11103
2019-09-22 11:57:10,321 - root - INFO - Scraping URL: https://www.canidae.com/dog-food/
2019-09-22 11:57:10,325 - root - INFO - Scraping URL: https://www.petcarerx.com/dog/food-nutrition
2019-09-22 11:57:11,222 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:57:11,223 - root - INFO - Scraping URL: https://www.tractorsupply.com/tsc/catalog/dog-food
2019-09-22 11:57:12,249 - root - INFO - Scraping URL: https://www.thehonestkitchen.com/dog-food
2019-09-22 11:57:12,253 - root - INFO - Scraping URL: https://www.boxed.com/products/category/418/dog-food
2019-09-22 11:57:13,171 - root - INFO - Scraping URL: https://lifesabundance.com/category/dogfood.aspx
2019-09-22 11:57:13,174 - root - INFO - Scraping URL: //www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwj5_NHFh-TkAhWTr-wKHSgSDVMYABAAGgJwag&ohost=www.google.com&cid=CAASEuRoai4G0R8MNbToVnZKzozmNA&sig=AOD64_10tA_ESFCwAHTPgPUTDsInBgYwEQ&adurl=&rct=j&q=
2019-09-22 11:57:13,178 - root - INFO - Scraping URL: https://freshpet.com/why-freshpet/
2019-09-22 11:57:13,901 - root - INFO - Scraping URL: https://pet-food.thecomparizone.com/?var1=82002114870&var2=381760664839&var4&var5=b&var7=1234567890&utm_source=google&utm_medium=cpc
None
Traceback (most recent call last):
  File "C:\Users\rot\Anaconda3\lib\site-packages\serpscrap\csv_writer.py", line 14, in write
    w.writerow(row)
  File "C:\Users\rot\Anaconda3\lib\csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\Users\rot\Anaconda3\lib\csv.py", line 151, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
     13                 for row in my_dict[0:]:
---> 14                     w.writerow(row)
     15         except Exception:

~\Anaconda3\lib\csv.py in writerow(self, rowdict)
    154     def writerow(self, rowdict):
--> 155         return self.writer.writerow(self._dict_to_list(rowdict))
    156 

~\Anaconda3\lib\csv.py in _dict_to_list(self, rowdict)
    150                 raise ValueError("dict contains fields not in fieldnames: "
--> 151                                  + ", ".join([repr(x) for x in wrong_fields]))
    152         return (rowdict.get(key, self.restval) for key in self.fieldnames)

ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-16-3f66e8511348> in <module>
      8 scrap = serpscrap.SerpScrap()
      9 scrap.init(config=config.get(), keywords=keywords)
---> 10 scrap.as_csv('/tmp/output')

~\Anaconda3\lib\site-packages\serpscrap\serpscrap.py in as_csv(self, file_path)
    146         writer = CsvWriter()
    147         self.results = self.run()
--> 148         writer.write(file_path + '.csv', self.results)
    149 
    150     def scrap_serps(self):

~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
     15         except Exception:
     16             print(traceback.print_exc())
---> 17             raise Exception

Exception: 
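For context, this ValueError is standard csv.DictWriter behaviour: writerow() raises it whenever a row dict contains keys that are not in the writer's fieldnames. The sketch below is a standalone reproduction with purely illustrative field names (not SerpScrap's actual keys), plus two possible workarounds; if the extra fields ('url', 'encoding', 'meta_robots', ...) come from the scrape_urls content rows while the header was built from a SERP-only row, either approach would avoid the crash.

import csv
import io

# two result rows with different key sets, loosely modelled on mixed
# SERP rows and fetched-content rows (field names are illustrative only)
rows = [
    {'keyword': 'dog food', 'serp_title': 'Best Dog Food', 'serp_url': 'https://example.com'},
    {'keyword': 'dog food', 'url': 'https://example.com', 'status': '200', 'text_raw': '...'},
]

# fieldnames taken from the first row only -> the second row raises ValueError
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
writer.writeheader()
try:
    for row in rows:
        writer.writerow(row)
except ValueError as err:
    print(err)  # dict contains fields not in fieldnames: 'url', 'status', 'text_raw' (order may vary)

# workaround A: silently drop keys that are not in the header
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), extrasaction='ignore')
writer.writeheader()
for row in rows:
    writer.writerow(row)

# workaround B: build the header from the union of all keys so no column is lost
all_fields = sorted({key for row in rows for key in row})
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=all_fields)
writer.writeheader()
for row in rows:
    writer.writerow(row)
print(buf.getvalue())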

Many thanks!!