Open · pixml27 opened this issue 4 years ago
Unfortunately I haven't tested with a large number of posts. Chrome does have a lot of memory issues. If anyone has a solution, feel free to comment.
Here are some options from Selenium, including how to run it headless, which should help:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.headless = True  # also works
driver = webdriver.Chrome(options=chrome_options)
```

Running it headless looks like a potential solution.
You should have `chrome_options.add_argument("--disable-gpu")` if you are running on Windows.

Source: https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver
Thanks for the advice, but I tried it all in different combinations. Here are all the options I found related to this problem, but it doesn't work:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

option = Options()
option.add_argument('--no-sandbox')
option.add_argument("--enable-automation")
# option.add_argument("start-maximized")
option.add_argument("--disable-extensions")
option.headless = True
option.add_argument('--disable-dev-shm-usage')
# Pass the argument 1 to allow and 2 to block
option.add_experimental_option("prefs", {
    "profile.default_content_setting_values.notifications": 1,
    "profile.managed_default_content_settings.images": 2,
    'disk-cache-size': 16000,
})
option.add_experimental_option("excludeSwitches", ["enable-automation"])
option.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=option)
```
P.S. I'm running on Linux.

P.P.S. I was trying to emulate scrolling down the page in Chrome using an RPA platform (basically just pressing the down key, but without human intervention). Facebook stopped sending new posts (or updating the page) somewhere after 300 scrolls (or Chrome stopped loading them), but Chrome did not crash. So the problem may be Facebook's protection, but then why does Chrome crash in the scraper?
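For reference, a rough equivalent of that scroll emulation in plain Selenium (the URL, the scroll count, and the delay below are only illustrative placeholders, not the exact RPA setup):

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://m.facebook.com/somepage")  # placeholder page URL

body = driver.find_element(By.TAG_NAME, "body")
for _ in range(300):          # roughly where new posts stopped appearing
    body.send_keys(Keys.END)  # jump to the bottom, like holding the down key
    time.sleep(2)             # give Facebook time to load the next batch of posts

driver.quit()
```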
First of all, thanks for this scraper! My problem is that when I download a large number of posts (> 4000) with 5-10 comments for each post, Chrome just crashes.
Initially, I got an error when expanding the collapsed comments (invalid session ID). Then I changed the code so the comments are opened during the scroll function, and the error started appearing there instead (invalid session ID again).
I read a lot of threads on Stack Overflow; they recommend adding some options to Chrome, and I tried them all. Many places also suggest giving Chrome more memory (if using Docker), but I just run the script directly. It also seems to me that this problem is related to memory: Chrome closes because of too many images, media, etc. Can you help me somehow? Have you run into this, and have you tested the script on large amounts of data?
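In case the crashes come from memory building up inside a single Chrome session, one workaround I'm considering is restarting the browser every few hundred posts. This is just a sketch, not tested: the batch size is a guess, and the extraction step is a placeholder rather than this scraper's actual code.

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def make_driver():
    # Assumed options; same idea as the ones listed above.
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    return webdriver.Chrome(options=options)

def scrape_in_batches(post_urls, batch_size=500):
    """Visit posts in batches, restarting Chrome between batches to free memory."""
    for start in range(0, len(post_urls), batch_size):
        driver = make_driver()
        try:
            for url in post_urls[start:start + batch_size]:
                driver.get(url)
                # ... extract the post and its comments here (placeholder) ...
        finally:
            driver.quit()  # release Chrome's memory before the next batch
```

That would at least keep any single Chrome process from growing without bound, even if it doesn't explain the invalid session ID errors.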