Hi @pcace,
Having a crawler for Howoge would be pretty cool!
Getting the CSS selectors requires a bit of manual work. Copying the selector from the browser's developer tools is usually a good starting point. In general, use as many classes or IDs as possible and, if you can, don't depend on the position of an element within its parent.
In the example you gave, getting the price of a listing could look like this:
#immoobject-list .content .row .attributes .attributes-content.color-secondary
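Chrome's "Copy selector" output leans heavily on positions (:nth-child and the > child combinator). As a rough sketch of the advice above, a hypothetical helper, simplifySelector (not part of findmeaflat), can strip the positional parts and keep only the steps that carry a class or ID. It assumes > only appears as a combinator in the copied selector. It cannot add classes Chrome did not emit, such as .attributes in the selector above, so inspecting the element is still needed. The broader result may also match more elements than before, so it is worth checking in the console.

```javascript
// Hypothetical helper (not part of findmeaflat): makes a selector copied via
// Chrome's "Copy selector" less dependent on element positions.
function simplifySelector(copied) {
  return copied
    .split('>') // assumes ">" only appears as a child combinator
    .map(part => part.trim())
    .map(part => part.replace(/:nth-child\([^)]*\)/g, '')) // drop positional pseudo-classes
    .map(part => part.replace(/^[a-z][a-z0-9]*(?=[.#])/i, '')) // "div.content" -> ".content"
    .filter(part => /[.#]/.test(part)) // keep only steps with a class or ID; bare tags are dropped
    .join(' '); // descendant combinator instead of ">"
}

const copied =
  '#immoobject-list > div:nth-child(4) > div > div.content > div.row > ' +
  'div:nth-child(1) > div > div:nth-child(1) > div.attributes-content.color-secondary';
console.log(simplifySelector(copied));
// → #immoobject-list .content .row .attributes-content.color-secondary
```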
I like using the browser console to experiment and figure out the best selector:
> document.querySelector("#immoobject-list .content .row .attributes .attributes-content.color-secondary").textContent
"
691,60 €
"
The scraper is based on x-ray. It uses | to define filters: https://github.com/matthewmueller/x-ray#filters
removeNewline and trim are custom filters that are defined here: https://github.com/adriankumpf/findmeaflat/blob/master/lib/scraper.js#L5
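To illustrate the pipe: x-ray takes the text matched by the part before the first |, then passes it through each named filter from left to right. The sketch below uses guessed versions of removeNewline and trim (the real definitions are in lib/scraper.js and may differ) plus a hypothetical applyPipeline function that mimics the chaining without needing x-ray installed.

```javascript
// Guessed stand-ins for findmeaflat's custom filters; the real definitions
// live in lib/scraper.js and may differ.
const filters = {
  removeNewline: value => (typeof value === 'string' ? value.replace(/\n/g, '') : value),
  trim: value => (typeof value === 'string' ? value.trim() : value),
};

// With x-ray installed, custom filters are registered through its options:
//   const Xray = require('x-ray');
//   const x = Xray({ filters });

// Hypothetical helper, for illustration only: mimics how x-ray treats
// "selector | filterA | filterB", feeding the matched text through each
// filter from left to right. Filter arguments (e.g. "slice:2,3") are not handled.
function applyPipeline(spec, matchedText) {
  const [selector, ...names] = spec.split('|').map(part => part.trim());
  const value = names.reduce((current, name) => {
    const filter = filters[name];
    if (!filter) throw new Error(`unknown filter: ${name}`);
    return filter(current);
  }, matchedText);
  return { selector, value };
}

const spec =
  '#immoobject-list .content .row .attributes .attributes-content.color-secondary | removeNewline | trim';
console.log(applyPipeline(spec, '\n    691,60 €\n  ').value); // → 691,60 €
```

The order matters: each filter only sees what the previous one returned, so a filter that expects a string should come before anything that turns the value into something else.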
How to debug depends, of course, on where the problem is. Often a few console.log statements help to find out why, for example, the price is not read correctly.
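One way to add such console.log statements without touching the scraping logic is a pass-through filter. The sketch below defines a hypothetical log filter (not part of findmeaflat or x-ray) that prints the value it receives and returns it unchanged. Placing it before and after the existing filters in a selector string shows whether the selector or the filters are at fault. Registering it assumes the x-ray filters option described in the comment.

```javascript
// Hypothetical debugging filter (not part of findmeaflat or x-ray): prints the
// value it receives and returns it unchanged, so it can go anywhere in a pipe:
//   '.attributes-content.color-secondary | log | removeNewline | trim | log'
function log(value) {
  // JSON.stringify makes newlines and surrounding spaces visible.
  console.log('[scraper]', JSON.stringify(value));
  return value;
}

// Registering it alongside the existing filters assumes an x-ray setup like
//   Xray({ filters: { ...existingFilters, log } })
// where existingFilters is whatever object scraper.js already passes.

// Stand-alone demo: the raw value shows its newlines, the cleaned one does not.
log('\n691,60 €\n'); // prints: [scraper] "\n691,60 €\n"
log('\n691,60 €\n'.replace(/\n/g, '').trim()); // prints: [scraper] "691,60 €"
```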
I hope this helps!
Hi there, I am just starting with JavaScript and tried to understand how the crawler works, so I tried to write a Howoge crawler. But I am really not sure how to get started. Maybe you could help me with that?
Here is an example link:
https://www.howoge.de/wohnungen-gewerbe/wohnungssuche.html?tx_howsite_json_list%5Bpage%5D=1&tx_howsite_json_list%5Blimit%5D=12&tx_howsite_json_list%5Blang%5D=&tx_howsite_json_list%5Bkiez%5D%5B%5D=Marzahn&tx_howsite_json_list%5Bkiez%5D%5B%5D=99&tx_howsite_json_list%5Bkiez%5D%5B%5D=Buch&tx_howsite_json_list%5Bkiez%5D%5B%5D=Alt-Hohensch%C3%B6nhausen&tx_howsite_json_list%5Bkiez%5D%5B%5D=Neu-Hohensch%C3%B6nhausen&tx_howsite_json_list%5Bkiez%5D%5B%5D=Fennpfuhl&tx_howsite_json_list%5Bkiez%5D%5B%5D=Alt-Lichtenberg&tx_howsite_json_list%5Bkiez%5D%5B%5D=Friedrichsfelde&tx_howsite_json_list%5Bkiez%5D%5B%5D=Karlshorst&tx_howsite_json_list%5Bkiez%5D%5B%5D=Treptow-K%C3%B6penick&tx_howsite_json_list%5Bkiez%5D%5B%5D=Pankow&tx_howsite_json_list%5Brent%5D=900&tx_howsite_json_list%5Barea%5D=70&tx_howsite_json_list%5Brooms%5D=2&tx_howsite_json_list%5Bwbs%5D=all-offers
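Decoded, the query string of this link is a list of tx_howsite_json_list[...] parameters: page, limit, lang, one kiez[] entry per district, then rent, area, rooms and wbs. The sketch below rebuilds the same URL from readable values with the standard URL API. buildHowogeSearchUrl is a hypothetical name, and the parameter names are only read off this one link, not taken from any Howoge documentation.

```javascript
// Hypothetical helper: rebuilds a Howoge search URL from readable values.
// Parameter names are copied from the example link; accepted values are unknown.
function buildHowogeSearchUrl({ page = 1, limit = 12, kiez = [], rent, area, rooms, wbs = 'all-offers' }) {
  const url = new URL('https://www.howoge.de/wohnungen-gewerbe/wohnungssuche.html');
  const key = name => `tx_howsite_json_list[${name}]`;

  url.searchParams.append(key('page'), String(page));
  url.searchParams.append(key('limit'), String(limit));
  url.searchParams.append(key('lang'), '');
  // One "[kiez][]" entry per district, as in the example link.
  for (const district of kiez) url.searchParams.append(`${key('kiez')}[]`, district);
  // Optional filters are only added when given.
  for (const [name, value] of [['rent', rent], ['area', area], ['rooms', rooms]]) {
    if (value !== undefined) url.searchParams.append(key(name), String(value));
  }
  url.searchParams.append(key('wbs'), wbs);

  // URLSearchParams percent-encodes "[", "]" and "ö" (as %5B, %5D, %C3%B6).
  return url.toString();
}

const url = buildHowogeSearchUrl({
  kiez: ['Marzahn', '99', 'Buch', 'Alt-Hohenschönhausen', 'Neu-Hohenschönhausen', 'Fennpfuhl',
    'Alt-Lichtenberg', 'Friedrichsfelde', 'Karlshorst', 'Treptow-Köpenick', 'Pankow'],
  rent: 900,
  area: 70,
  rooms: 2,
});
console.log(url);
```

With these values the output should match the link above, and changing rent or the district list then yields a new search URL without hand-editing the encoded string.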
I then created a howoge.js in the sources folder with this howoge object:
I now don't really know how to find the correct selectors for the properties. My approach was to use "Copy selector" in Chrome:
So, for example, I came up with this selector in Chrome for 'price':
#immoobject-list > div:nth-child(4) > div > div.content > div.row > div:nth-child(1) > div > div:nth-child(1) > div.attributes-content.color-secondary
and converted it to the format used in the other crawlers:
'.div:nth-child(1) div div.content div.row div:nth-child(1) div div:nth-child(1) div.attributes-content.color-secondary | removeNewline | trim',
So here is what I don't understand:
Thank you so much in advance for your help, and sorry for asking such dumb questions; I am just starting to learn JavaScript.
Cheers