duccioa / python-seloger

A simple Seloger.com wrapper
MIT No Attribution

research executed but parsing of the result page crashing #1

Open juliensciascia opened 6 years ago

juliensciascia commented 6 years ago

Hello, and thank you for this library. I tried to run the example code, but unfortunately it crashes with the following error. The search request seems to work, but parsing the results page does not:

Valid response from http://www.seloger.com/list.htm?idtt=2&cp=75015&idtypebien=1&pxmax=500000&surfacemin=40&tri=d_dt_crea&nb_balconsmin=1
The search returned 43 results.
40 results in 2 pages will be processed.
Page 1 parsed
Traceback (most recent call last):
  File "C:/Users/julien/PycharmProjects/deployMe/Realestate.py", line 13, in <module>
    for result in results:
  File "c:\users\julien\pycharmprojects\deployme\venv\src\egg-name\SeLoger\__init__.py", line 208, in get_results
    params = self.get_current_parameters(False, page)
  File "c:\users\julien\pycharmprojects\deployme\venv\src\egg-name\SeLoger\__init__.py", line 131, in get_current_parameters
    json_str = re.search('({.});ava.', page_data_str_minified).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Process finished with exit code 1

To get this error, I ran the example code from your README:

from SeLoger import SeLogerAchat
from SeLoger import show_search_filters

# Check the syntax for the relevant search filters
show_search_filters()

search_criteria = {'cp': '75015', 'idtypebien': '1', 'pxmax': '500000', 'surfacemin': '40', 'tri': 'd_dt_crea', 'nb_balconsmin': '1'}

rent_paris = SeLogerAchat(search_criteria)

# get_results creates a generator that can be iterated and stored in a list
results = rent_paris.get_results(2, print_results=1)
ads = []
for result in results:
    ads.append(result)

Thank you

SogetiDataLab commented 6 years ago

Hello,

Same problem...

Did you solve it?

Thanks.

Regards,

duccioa commented 6 years ago

Hello, sorry for coming back so late, I didn't realise there was an open issue.

json_str = re.search('({.});ava.', page_data_str_minified).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

It seems the regular expression no longer finds the information it is looking for, so `re.search` returns `None`. They may have changed the design of the page. I wrote this last September, and sometimes small changes in the HTML break scrapers.
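One way to make this kind of failure easier to diagnose is to check the match before calling `.group(1)`. The following is only a minimal sketch: the helper `extract_json_blob`, the exception `PageLayoutError`, the pattern, and the sample page are all hypothetical and are not taken from the library's actual code or from the real Seloger page.

```python
import json
import re

# Illustrative pattern only: the real page layout and the library's regex may differ.
PAGE_DATA_PATTERN = re.compile(r"var page_data = (\{.*?\});", re.DOTALL)


class PageLayoutError(RuntimeError):
    """Raised when the expected data block is not found in the page."""


def extract_json_blob(page_text, pattern=PAGE_DATA_PATTERN):
    """Return the JSON object embedded in page_text, or raise PageLayoutError."""
    match = pattern.search(page_text)
    if match is None:
        # A pattern that does not match yields None; fail with a readable message
        # instead of an AttributeError on .group(1).
        raise PageLayoutError(
            "Embedded page data not found; the site layout may have changed."
        )
    return json.loads(match.group(1))


# Made-up sample page for demonstration.
sample_page = '<script>var page_data = {"nb_results": 43};</script>'
print(extract_json_blob(sample_page))
```

With a guard like this, a layout change would surface as an explicit error message rather than as `'NoneType' object has no attribute 'group'`.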

Unfortunately, I don't have time to maintain this library and I am not planning to do so in the near future. If you are planning to debug it, please feel free to contribute, I'd be grateful. We would have to go through the HTML code and check which field causes the break. As a first step, you could try commenting out the failing line to see whether that's the only thing that doesn't work.
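Going through the HTML to find the field that breaks could be partly automated. The sketch below assumes a hypothetical table of field names and patterns (`EXPECTED_FIELDS`) and a made-up HTML sample; the real patterns would have to be copied from the library's source, and none of these names exist in it.

```python
import re

# Hypothetical field names and patterns, for illustration only.
EXPECTED_FIELDS = {
    "result_count": r'class="result-count"',
    "page_data": r"var page_data = \{",
    "listing_card": r'class="listing-card"',
}


def find_missing_fields(html, expected=EXPECTED_FIELDS):
    """Return the names of expected fields whose pattern does not match html."""
    return [
        name
        for name, pattern in expected.items()
        if re.search(pattern, html) is None
    ]


# Made-up page fragment in which the embedded data block is absent.
sample_html = '<div class="result-count">43</div><div class="listing-card"></div>'
print(find_missing_fields(sample_html))
```

Running a check like this against a saved copy of a results page would list every pattern that no longer matches, which narrows down where the page changed.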

duccioa commented 6 years ago

By the way, what is the line that causes the error? Did you try to run just

search_criteria = {'cp': '75015', 'idtypebien': '1', 'pxmax': '500000', 'surfacemin': '40','tri': 'd_dt_crea', 'nb_balconsmin': '1'}
rent_paris = SeLogerAchat(search_criteria)

Without the show_search_filters()?

SogetiDataLab commented 6 years ago

Hello,

Thank you for your quick reply! French will do just as well 😉

It crashes here: json_str = re.search('({.});ava.', page_data_str_minified).group(1)

Indeed, it must be because the HTML tag has changed; the search returns nothing.

I didn't manage to update it myself ☹

I built a similar app for LeBonCoin, and they completely changed all their HTML tags a few weeks ago… That's the problem with not having an API directly supported by the content publisher!

Could you help me solve this problem?

Thank you!

Regards,


duccioa commented 6 years ago

I had a look but I couldn't find a quick solution. I am afraid I really have no time to fix this. I suggest you have a look at Scrapy, the Python framework for scraping. Although it looks a bit overcomplicated at the beginning, it is actually pretty easy to use and makes things much easier to fix when the pages change. It also takes care of a lot of other things you might need when you do serious scraping.