JasperMC / MarktplaatsScraper

Scrapes Marktplaats based on a search query and notifies the user.

docker issue #3

Open TommieNL opened 3 years ago

TommieNL commented 3 years ago

Hey JasperMC,

I used your scraper for a while (until the Pushover trial expired) and I'm now getting back into it.

I tried docker-compose up from this repo, but I'm getting some errors:

docker-compose up
Recreating marktplaatsscraper_marktplaatsscraper_1 ... done
Attaching to marktplaatsscraper_marktplaatsscraper_1
marktplaatsscraper_1  | Unknown option: -
marktplaatsscraper_1  | usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
marktplaatsscraper_1  | Try `python -h' for more information.

docker-compose.yaml

version: "3.3"
services:
  marktplaatsscraper:
    build: .
    volumes:
      - ./Listings:/srv/Listings
      - ./Queries:/srv/Queries
    environment:
      PUSHOVER_API_TOKEN: 
      PUSHOVER_USER: 
      SCANNING_INTERVAL: 400
      WEBDRIVER_PATH: /usr/bin/chromedriver
      PUID: 1000
      PGID: 100
    restart: unless-stopped

Git shows that I'm up to date. Am I missing something?

TommieNL commented 3 years ago

Hey @JasperMC, good to know that you're still around :D Could you maybe take a quick look at this one as well?

JasperMC commented 3 years ago

Sure! Looking at it real quick I feel it might be the Dockerfile. Let me have a look..

TommieNL commented 3 years ago

Cool! Thanks! Can I donate you a coffee or beer somewhere?

JasperMC commented 3 years ago

Hi @TommieNL,

By the looks of it, it's because I haven't merged the docker-MarktplaatsScraper image with this repository. I'll take some time to fix this ASAP. The docker-MarktplaatsScraper image is working better for me, but it gives me a 'FileWatcher not found' error when it actually runs the program.

I will first check that so you can have a working alternative. I think reorganizing this repository will take a bit longer than quickly fixing the other one.

Sorry for the inconvenience. Been caught up with a new job and my attention wasn't on here for a while.

Regarding the coffee - That would be awesome! I'd have to make some kind of donation link/platform :)

TommieNL commented 3 years ago

Ah yeah, I noticed that 'FileWatcher not found' error too. It seems FileWatcher is something from the gdesklets tools that come with Ubuntu? (If my Google skills are correct ;) )

The only pip3 FileWatcher package I could find is this one: https://pypi.org/project/filewatcher/ but that seems to be a different tool(?)

Cool that you want to take the time to look into this. And let me know if you have a donation link!

PS: best of luck at your new job. That must be a challenge during this remote-working time!

JasperMC commented 3 years ago

Definitely! Weird time to start a new job..

FileWatcher was actually a .py file I had created to watch the Queries directory for new query files. That way, you can drag and drop them into a share and the program will pick it up in the next scan.
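The idea of watching a directory for new query files can be sketched roughly as follows. This is not the original FileWatcher.py (its contents are not shown in this thread), just a minimal polling sketch. The function names `scan_queries` and `watch`, the `max_scans` parameter, and the assumption that query files end in `.json` are all hypothetical.

```python
import os
import time


def scan_queries(directory, seen):
    """Return query files in `directory` that are not yet in `seen`.

    `seen` is updated in place, so each file is reported only once.
    Assumption: query files are regular files ending in ".json".
    """
    new_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".json") and os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(directory, interval, on_new, max_scans=None):
    """Poll `directory` every `interval` seconds; call `on_new(path)` per new file.

    `max_scans=None` means poll forever; a number stops after that many scans.
    """
    seen = set()
    scans = 0
    while max_scans is None or scans < max_scans:
        for path in scan_queries(directory, seen):
            on_new(path)
        scans += 1
        if max_scans is None or scans < max_scans:
            time.sleep(interval)
```

Calling `watch("/srv/Queries", 400, print)` would print each new query file once, on the first scan after it appears. Polling keeps the sketch dependency-free and tends to behave predictably on network shares, where filesystem change notifications are not always delivered.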

In an effort to clean up the repo I accidentally deleted it. Just restored my local backup and I can now successfully pull and run docker-MarktplaatsScraper. Can you check if it works for you as well?

The docker-MarktplaatsScraper image works differently from the image in this repository. It uses S6 overlay, which means it has separate steps for installing, fixing folder permissions, and running the service. In my opinion it's nicer to split those things up instead of putting them all in one Dockerfile. I'll merge them soon :)

TommieNL commented 3 years ago

Hmm, I'm now getting this (tried with and without a query.json file):

Initial file scan found 0 queries.
Traceback (most recent call last):
  File "/config/Main.py", line 81, in <module>
    main(sys.argv[1:])
  File "/config/Main.py", line 28, in main
    scraper = Scraper(CONFIG['webdriverpath'])
  File "/config/Scraper.py", line 13, in __init__
    self.driver = webdriver.Chrome(webdriverpath, chrome_options=options)
  File "/usr/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/usr/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service /usr/bin/chromedriver unexpectedly exited. Status code was: 127

JasperMC commented 3 years ago

Hmm, strange. It looks like the Chrome webdriver crashed. I just pulled the image again on my homelab and I don't get this error. You are pulling docker-MarktplaatsScraper, correct?
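A note on the status code: on Linux, exit status 127 usually means the program, or a shared library it needs, could not be found or loaded, rather than a crash while running. A quick check that could be run inside the container is sketched below. The helper name `diagnose_binary` is hypothetical, and this is only a rough diagnostic, not part of the scraper.

```python
import os
import subprocess


def diagnose_binary(path):
    """Return a short description of whether `path` can be started."""
    if not os.path.exists(path):
        return "missing: no file at " + path
    if not os.access(path, os.X_OK):
        return "not executable: " + path
    try:
        # chromedriver accepts --version; other binaries may differ.
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "cannot execute: " + str(exc)
    if result.returncode == 127:
        # The dynamic loader typically names the missing library on stderr.
        return "exit 127: " + result.stderr.strip()
    if result.returncode != 0:
        return "exit %d: %s" % (result.returncode, result.stderr.strip())
    return "ok: " + result.stdout.strip()
```

Running `print(diagnose_binary("/usr/bin/chromedriver"))` in the container should show whether the driver is missing, not executable, or failing to load a library.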

I also noticed that on my setup it still prefers config.json over the command line arguments. I might have to fix that too.

I guess these are the "kinderziektes" (teething problems) of hacking this app together and just making it public real quick :)
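One common way to get that precedence right is to layer the sources: built-in defaults first, then config.json, then only those command line options that were actually passed. The sketch below is an assumption about how that could look, not the current Main.py. The `webdriverpath` key appears in the traceback above, while `--interval`, `DEFAULTS`, and `load_config` are hypothetical names.

```python
import argparse
import json

# Hypothetical defaults; the values mirror the compose file in this thread.
DEFAULTS = {"webdriverpath": "/usr/bin/chromedriver", "interval": 400}


def load_config(argv, config_text="{}"):
    """Merge defaults, config.json contents, and CLI options (CLI wins)."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not passed" apart from an explicit value.
    parser.add_argument("--webdriverpath", default=None)
    parser.add_argument("--interval", type=int, default=None)
    args = parser.parse_args(argv)

    config = dict(DEFAULTS)
    config.update(json.loads(config_text))  # config.json overrides defaults
    # Only options given on the command line override config.json.
    config.update({k: v for k, v in vars(args).items() if v is not None})
    return config
```

With this layering, `load_config(["--interval", "30"], '{"interval": 60}')` yields an interval of 30, while omitting the flag falls back to the 60 from config.json.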

I'm looking at all these different donation options and man there's a lot to choose from. If you'd like, you could use my PayPal which is jaspercardol@hotmail.com.

TommieNL commented 3 years ago

Hehe, "kinderziektes"

Yeah, I've pulled https://github.com/JasperMC/docker-MarktplaatsScraper.git

docker-compose.yaml:

version: "3.3"
services:
  marktplaatsscraper:
    build: .
    volumes:
      - /root/mpscraper/config:/config
    environment:
      PUSHOVER_API_TOKEN: xxxxxxxxxx
      PUSHOVER_USER: xxxxxxxxxxxxx
      SCANNING_INTERVAL: 400
      WEBDRIVER_PATH: /usr/bin/chromedriver
      PUID: 1000
      PGID: 100
    restart: unless-stopped

I removed everything, pulled again, and ran docker system prune.

JasperMC commented 3 years ago

Does the Chrome webdriver still crash? When I originally wrote the program I added some options to make it more compatible with a headless system, but maybe that isn't enough. Strange that the same image produces different results.
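For reference, the kind of options meant here can be sketched as below. The exact flags used in Scraper.py are not shown in this thread, so the list is an assumption based on flags commonly used for Chrome in containers, and `HEADLESS_FLAGS` and `build_driver` are hypothetical names. These flags address Chrome failing at startup on a headless system; they would not by themselves explain a driver that exits with status 127.

```python
# Assumed flag set; the real Scraper.py may use a different one.
HEADLESS_FLAGS = [
    "--headless",               # run without a display
    "--no-sandbox",             # the sandbox often fails when running as root in a container
    "--disable-dev-shm-usage",  # avoid the small default /dev/shm in Docker
    "--disable-gpu",            # no GPU is available in most containers
]


def build_driver(webdriverpath, flags=HEADLESS_FLAGS):
    """Create a Chrome driver with the given command line flags."""
    # Imported here so the flag list can be inspected without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in flags:
        options.add_argument(flag)
    # Same call shape as in the traceback above (Selenium 3 style);
    # Selenium 4 expects options=... and a Service object instead.
    return webdriver.Chrome(webdriverpath, chrome_options=options)
```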

I just checked my instance and it's been running fine without crashes for the past 30 minutes.

TommieNL commented 3 years ago

I'll try again from scratch with new mounts etc. I'll get back to you on that soon!

JasperMC commented 3 years ago

Let me know how it goes! Thank you for the beer. The description really cracked me up!