CovidTrackerFr / vitemadose

Detection of available vaccination slots for the ViteMaDose tool
GNU General Public License v3.0

Unhandled exception #550

Closed Olivier4477 closed 3 years ago

Olivier4477 commented 3 years ago

Hello,

Unless I'm mistaken, this is an unhandled exception:

2021-06-11 08:21:46,939 | [INFO] Found 3176 Doctolib centers (external scraper).
2021-06-11 08:21:47,028 | [WARNING] Exception lors du traitement du centre d241037 ERREUR DE SCRAPPING (Doctolib): Le centre est un doublon https://www.doctolib.fr/vaccination-covid-19/saint-malo-et-dinan/gh-rance-emeraude-vaccination-covid?pid=practice-163095
2021-06-11 08:21:47,028 | [INFO]  d241037 None             Erreur                           35    
Process Process-33:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/builds/ViteMaDose/vitemadose/scraper/scraper.py", line 118, in export_by_creneau
    exporter.export(q_iter(creneaux_q))
  File "/builds/ViteMaDose/vitemadose/scraper/export/export_v2.py", line 34, in export
    for creneau in creneaux:
  File "/builds/ViteMaDose/vitemadose/utils/vmd_utils.py", line 309, in get
    next_bulk = self.q.get()
  File "<string>", line 2, in get
  File "/usr/local/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
 ----- EXPORTER RUNNING IN PROCESS 80 ------
Traceback (most recent call last):
  File "scrape.py", line 7, in <module>
    main()
  File "/builds/ViteMaDose/vitemadose/scraper/main.py", line 26, in main
    scrape(platforms=platforms)
  File "/builds/ViteMaDose/vitemadose/scraper/scraper.py", line 76, in scrape
    centres_cherchés = get_last_scans(centres_cherchés)
  File "/builds/ViteMaDose/vitemadose/utils/vmd_utils.py", line 227, in get_last_scans
    for centre in centres:
  File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /resources/lieux-de-vaccination-contre-la-covid-19/20210611-100516/centres-vaccination.csv (Caused by None)
Cleaning up file based variables
00:01
ERROR: Job failed: exit code 1

Pipeline -> https://gitlab.com/ViteMaDose/vitemadose/-/jobs/1338627766

damien57 on Mattermost

Floby commented 3 years ago

Yes, but careful: the EOFError is a consequence of the real error:

requests.exceptions.ConnectionError: None: Max retries exceeded with url: /resources/lieux-de-vaccination-contre-la-covid-19/20210611-100516/centres-vaccination.csv (Caused by None)
Cleaning up file based variables

Since the main process fails and shuts down, it closes the communication queues to the various worker processes it started, which produces EOFError in those processes.
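A minimal stdlib illustration of that mechanism (not the project's code): once the writing end of a multiprocessing connection is closed, `recv()` on the other end raises EOFError as soon as the buffer is drained — which is exactly what the worker processes see when the main process dies.

```python
from multiprocessing import Pipe

# One end plays the main process, the other a worker reading from it.
writer, reader = Pipe()
writer.send("last bulk of creneaux")
writer.close()  # simulates the main process dying and closing its queues

print(reader.recv())   # data sent before the close still arrives
try:
    reader.recv()      # buffer drained and writer gone -> EOFError
except EOFError:
    print("EOFError: the writing end was closed")
```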

More generally, we should do a pass over the centre_iterator functions to make sure errors are handled properly.
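As a starting point, such a pass could wrap the CSV download in a retry loop so that a transient data.gouv outage does not kill the whole run. A minimal sketch under stated assumptions — the helper name `fetch_with_retries`, its signature, and the retry parameters are illustrative, not the project's actual API:

```python
import time
import urllib.error
import urllib.request

def fetch_with_retries(url, attempts=5, backoff=1.0,
                       opener=urllib.request.urlopen):
    """Retry transient network failures with exponential backoff.

    `opener` is injectable for testing; by default it is urllib's urlopen.
    (Hypothetical helper, not part of vitemadose.)
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except (urllib.error.URLError, ConnectionError) as exc:
            last_exc = exc
            time.sleep(backoff * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise last_exc
```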

grubounet commented 3 years ago

@aureliancnx it's the data.gouv CSV acting up again :(

Olivier4477 commented 3 years ago

Why not fetch it once a day, publish it on GitHub, and have the scraper load it from GitHub? Once a day, or every 2, 3, or 6 hours, for example.
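One hedged way to implement that idea: keep the last good copy of the CSV next to the scraper (or in a repo refreshed by a scheduled job) and fall back to it when data.gouv is unreachable. A sketch only — `fetch_csv_cached` and its parameters are made up for illustration and are not the project's code:

```python
import urllib.request
from pathlib import Path

def fetch_csv_cached(url, cache_path, fetch=urllib.request.urlopen):
    # Try the live URL first; on any failure, serve the last copy that
    # was downloaded successfully (stale data beats a crashed pipeline).
    cache = Path(cache_path)
    try:
        with fetch(url) as resp:
            data = resp.read()
    except Exception:
        if cache.exists():
            return cache.read_bytes()
        raise
    cache.write_bytes(data)  # refresh the mirror on success
    return data
```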


aureliancnx commented 3 years ago


Hello,

I've opened an issue related to this problem: https://github.com/CovidTrackerFr/vitemadose/issues/557