flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

"NoneType" in title_el.get("href") when scraping kleinanzeigen #515

Open alvarnydev opened 9 months ago

alvarnydev commented 9 months ago

Hi you lovely people!

I'm currently running into an issue when scraping kleinanzeigen: the bot sometimes fails to get the link from the listing it is parsing. It works for a while and eventually breaks. It looks like this for me:

[Screenshot 2024-01-17 at 13 06 53]

I looked through the existing and past issues and didn't find anything similar. Nevertheless, have you guys maybe seen this before? I run the Docker image of flathunter using docker compose on an Ubuntu 22 machine with the following config:

wghunter:
    build: .
    platform: linux/amd64
    command: python flathunt.py
    restart: always
    environment:
      - FLATHUNTER_TARGET_URLS=https://www.kleinanzeigen.de/s-frankfurt-am-main/wg/k0l4292;https://www.wg-gesucht.de/wg-zimmer-in-Frankfurt-am-Main.41.0.1.0.html?csrf_token=bf76f1e4c7392fd9aeadc109872a8fb14038151b&offer_filter=1&city_id=41&sort_order=0&noDeact=1&categories%5B%5D=0&rMax=1000&wgMxT=3&wgAge=28&wgSmo=2&exc=2&img_only=1
      - FLATHUNTER_DATABASE_LOCATION=./dbs/wgs/
      - FLATHUNTER_LOOP_PERIOD_SECONDS=120
      - FLATHUNTER_MESSAGE_FORMAT={title} \#CR# > Zimmer {rooms} \#CR# > Größe {size} \#CR# > Preis {price} \#CR# > Ort {address} \#CR# > Link {url}
      - FLATHUNTER_NOTIFIERS=telegram
      - FLATHUNTER_TELEGRAM_BOT_TOKEN=<...>
      - FLATHUNTER_TELEGRAM_RECEIVER_IDS=<...>
      - FLATHUNTER_HEADLESS_BROWSER=yes
    volumes:
      - ./:/usr/src/app

codders commented 9 months ago

Hi @alvarnydev,

I've not seen that before, no. It seems to pick out the title elements (at least on my crawls) without complaining. Looks like the search for the title_el fails. How does the HTML look there?
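For illustration, here is a minimal sketch of how a failed element search could be skipped instead of crashing the crawl. It is not flathunter's actual crawler code: the helper names `extract_url` and `extract_urls`, the `BASE_URL` prefix, and the sample HTML are all assumptions, and the class names `aditem` and `ellipsis` are borrowed from the parsing snippet posted later in this thread. The only library behavior it relies on is that BeautifulSoup's `find()` returns `None` when nothing matches.

```python
from typing import List, Optional

from bs4 import BeautifulSoup

# Assumed prefix for relative listing links (hypothetical, for illustration).
BASE_URL = "https://www.kleinanzeigen.de"

# Hypothetical sample markup: the second listing has no title link,
# which is the situation that would make find() return None.
SAMPLE_HTML = """
<article class="aditem"><a class="ellipsis" href="/s-anzeige/room-1/123">Room 1</a></article>
<article class="aditem"><span>placeholder without a link</span></article>
"""


def extract_url(expose) -> Optional[str]:
    """Return the absolute listing URL, or None if the title link is missing."""
    title_el = expose.find(class_="ellipsis")
    if title_el is None:
        # No title element in this listing: nothing to call .get() on.
        return None
    href = title_el.get("href")
    if href is None:
        # Title element exists but carries no href attribute.
        return None
    return BASE_URL + href


def extract_urls(html: str) -> List[str]:
    """Collect URLs from all listings, skipping those without a usable link."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for expose in soup.find_all("article", class_="aditem"):
        url = extract_url(expose)
        if url is None:
            continue  # skip the malformed listing instead of raising
        urls.append(url)
    return urls


print(extract_urls(SAMPLE_HTML))
```

Skipping the entry hides the symptom rather than explaining it, so logging the offending listing's HTML at the `continue` would probably still be worthwhile for answering the question above.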

alvarnydev commented 9 months ago

Hey, thanks for the comment. I haven't really looked into it much further because the docker compose config just restarts the container and it works fine from there, until it eventually crashes again, in perpetuum. When I have the time I'll look into it more.

PlanetDyna commented 7 months ago

Unfortunately, I've got the same problem.

zahnech commented 3 weeks ago

Same issue. [image]

jukoson commented 3 weeks ago

Here's how I parse Kleinanzeigen. Maybe it helps in providing a fix:

expose_ids = soup.find_all("article", class_="aditem")
for x, expose in enumerate(expose_ids):
    title = expose.find(class_="ellipsis")