asmaier / ImmoSpider

Immospider is a crawler for the Immoscout24 website.

Issue? Copying url from Immoscout doesn't work #6

Closed DefJM closed 4 years ago

DefJM commented 4 years ago

(Hope it is fine if I post this as an issue. It is very likely I just don't understand the script and it's just a question.)

Hi @asmaier,

Your ImmoSpider script looks super useful and I am trying to get the basic scraping working. In your readme you provide a basic example url looking like this:

url=https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00

I can run the script with that url and I manage to generate the csv output. But once I specify my own search through the immoscout web-interface, the resulting url looks very different to the one you provide:

url=https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Hamburg;;;1276006001005;;Altona-Altstadt&numberofrooms=3.0-&price=-800000.0&livingspace=50.0-&geocoordinates=53.54883;9.9477;2.0&enteredFrom=one_step_search

Plugging such a URL into the scrapy crawl immoscout CLI command doesn't work. I tried to reconstruct a URL similar to yours, but that failed as well.

Could you explain how or where you get that url from so that it is compatible with your script?

Thanks a lot, mo


Follow-up: I noticed I can specify the URL within the spider file immoscout.py, constructing a URL that uses & to chain search filters such as &numberofrooms=4.0-.

url = "https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/eppendorf/wohnung-kaufen?numberofrooms=4.0-&sorting=2&pagenumber=0"
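A URL of this shape can also be assembled programmatically instead of by hand. This is a minimal sketch using only Python's standard urllib.parse; the names base_url and filters are illustrative and not part of ImmoSpider:

```python
from urllib.parse import urlencode

# Search path copied from the URL above; everything after "?" is a filter.
base_url = "https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/eppendorf/wohnung-kaufen"

# Hypothetical dict of filters; keys and values are taken from the URL above.
filters = {"numberofrooms": "4.0-", "sorting": 2, "pagenumber": 0}

# urlencode joins the key=value pairs with "&" and escapes unsafe characters.
url = base_url + "?" + urlencode(filters)
print(url)
```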

But it doesn't work using the CLI. So I am good, I can use the script. It would still be great to get the CLI command running, though. Am I missing something?

Thanks, Jan

asmaier commented 4 years ago

You say the scrapy crawl immoscout CLI command doesn't work with your URL. What is the error message, if there is one?

Just a guess: maybe the semicolons (;) in the URL are the cause. You can try encoding each ; as %3B (see https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters) and see if your URL works.
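To try this without editing the URL by hand, something like the following could be used. It is only a sketch built on Python's standard urllib.parse.quote; the choice of characters kept in safe is an assumption made so that the rest of the URL stays readable, and only the semicolons change:

```python
from urllib.parse import quote, unquote

url = (
    "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen"
    "?centerofsearchaddress=Hamburg;;;1276006001005;;Altona-Altstadt"
    "&numberofrooms=3.0-&price=-800000.0&livingspace=50.0-"
    "&geocoordinates=53.54883;9.9477;2.0&enteredFrom=one_step_search"
)

# ";" is deliberately left out of `safe`, so quote() turns each one into %3B.
# ":", "/", "?", "&" and "=" are kept as-is to preserve the URL structure.
encoded = quote(url, safe=":/?&=")
print(encoded)

# Decoding restores the original URL, so no information is lost.
print(unquote(encoded) == url)
```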

asmaier commented 4 years ago

I have now found time to debug the issue. The solution turned out to be much simpler: just enclose your URL in quotes, like this:

scrapy crawl immoscout -o apartments.csv -a url="https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Hamburg;;;1276006001005;;Altona-Altstadt&numberofrooms=3.0-&price=-800000.0&livingspace=50.0-&geocoordinates=53.54883;9.9477;2.0&enteredFrom=one_step_search" -L INFO
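
The quotes matter because a POSIX shell treats unquoted ; and & as command separators, so an unquoted URL is cut off before it reaches Scrapy. As a rough sketch, Python's standard shlex module (shlex.join needs Python 3.8 or later) can produce a safely quoted command line for any search URL; the argv list below simply mirrors the command above and is not something ImmoSpider provides:

```python
import shlex

url = (
    "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen"
    "?centerofsearchaddress=Hamburg;;;1276006001005;;Altona-Altstadt"
    "&numberofrooms=3.0-&price=-800000.0&livingspace=50.0-"
    "&geocoordinates=53.54883;9.9477;2.0&enteredFrom=one_step_search"
)

# One list element per argument, exactly as Scrapy should receive them.
argv = ["scrapy", "crawl", "immoscout", "-o", "apartments.csv",
        "-a", "url=" + url, "-L", "INFO"]

# shlex.join quotes every argument that contains shell metacharacters.
command = shlex.join(argv)
print(command)

# Splitting the quoted command again yields the original arguments.
print(shlex.split(command) == argv)
```

shlex.join wraps the whole url=... argument in single quotes rather than double quotes; both forms protect ; and & from the shell. Passing argv directly to subprocess.run would avoid the shell, and therefore the quoting question, altogether.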

DefJM commented 4 years ago

Thanks, this seems to work. Wow, why didn't I get this myself...