cumbucadev / cinemaempoa

A site that aggregates the films currently showing at several of Porto Alegre's many movie theaters.
https://cinemaempoa.com.br

Error scraping films from CineBancarios: image file name too long #82

Closed guites closed 2 weeks ago

guites commented 2 weeks ago

When running the scraper on this post: http://cinebancarios.blogspot.com/2024/10/animacao-infantil-placa-mae-e-longa-de.html

the following error occurs:

[2024-10-03 18:21:28,208] ERROR in app: Exception on /screening/scrap [POST]
Traceback (most recent call last):
  File "/app/flask_backend/service/screening.py", line 63, in save_image
    return upload_image_to_api(app, file)
  File "/app/flask_backend/service/upload.py", line 19, in upload_image_to_api
    res.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.imgbb.com/1/upload

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/flask_backend/service/upload.py", line 37, in upload_image_to_local_disk
    file.save(img_savepath)
AttributeError: '_io.BytesIO' object has no attribute 'save'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
  File "/app/flask_backend/routes/auth.py", line 102, in wrapped_view
    return view(**kwargs)
  File "/app/flask_backend/routes/screening.py", line 371, in runScrap
    created_features = import_scrapped_results(scrapped_results, current_app)
  File "/app/flask_backend/service/screening.py", line 162, in import_scrapped_results
    image_filename, image_width, image_height = save_image(
  File "/app/flask_backend/service/screening.py", line 67, in save_image
    return upload_image_to_local_disk(file, app, filename)
  File "/app/flask_backend/service/upload.py", line 39, in upload_image_to_local_disk
    with open(img_savepath, "wb") as f:
OSError: [Errno 36] File name too long: '/app/flask_backend/uploads/b05008fdbb87e9e1026538b1a49e2cfa.3permmsgidmsg-ar-1211843681118208125th1922ae48f7f42813viewfimgfuripszs0-l75-ftattbidANGjdJ_7ZZXwSJuZsqc6plLS-TJ0dCSawmvlf2rJ9tAcLemtW9_vXiVQxroxK6-xpSaXXDd81jah3dXnM7CUeH5x1OKv72i26aH1EF6GT10vkorBCDJcj-yo0t5vuaYdispembrealattidii_m1ibhg2v2'
172.19.0.4 - - [03/Oct/2024 18:21:28] "POST /screening/scrap HTTP/1.0" 500 -

I noticed that the images in the post were broken, so the scraper was picking up their src in this format:

https://mail.google.com/mail/u/0?ui=2&ik=996493b366&attid=0.0.3&permmsgid=msg-a:r-1211843681118208125&th=1922ae48f7f42813&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ_7ZZXwSJuZsqc6plLS-TJ0dCSawmvlf2rJ9tAcLemtW9_vXiVQxroxK6-xpSaXXDd81jah3dXnM7CUeH5x1OKv72i26aH1EF6GT10vkorBCDJcj-yo0t5vuaY&disp=emb&realattid=ii_m1ibhg2v2

This URL returns an HTTP 200, but its content is HTML. Since the imgbb API does not accept it (because it is not a valid image), we then try to save it locally on the server.
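One way to catch this case early would be to inspect the first bytes of the downloaded content before attempting any upload. The sketch below is only an illustration: `looks_like_image` is a hypothetical helper, not a function from this repo, and it only recognizes a handful of common formats by their file signatures.

```python
# Hypothetical helper (not part of cinemaempoa): reject downloads that are
# not images, e.g. an HTML page returned with HTTP 200.

# Well-known file signatures ("magic bytes") for common image formats.
_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 spec)
    b"GIF89a",             # GIF (1989 spec)
)


def looks_like_image(data: bytes) -> bool:
    """Return True if `data` starts with a known image signature."""
    if data.startswith(_SIGNATURES):
        return True
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    return data[:4] == b"RIFF" and data[8:12] == b"WEBP"
```

With a check like this, the scraper could skip the image (or fall back to a placeholder) instead of sending HTML to imgbb and then to the local disk.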

Then something goes wrong in the function that generates the file name (see download_image_from_url in flask_backend/service/screening.py), and the resulting name is too long, which produces the OSError: [Errno 36] shown above.
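Judging only from the logged path (not from the source of download_image_from_url), the part after the hash looks like everything that follows the last "." in the URL, with punctuation stripped. A minimal sketch of a bounded alternative follows; `build_image_filename`, `ALLOWED_EXTENSIONS`, and the `.jpg` default are all assumptions made for illustration, and this is not necessarily how #83 fixed it.

```python
import hashlib
import os
from urllib.parse import urlparse

# Hypothetical allow-list; adjust to whatever formats the site supports.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def build_image_filename(url: str, default_ext: str = ".jpg") -> str:
    """Build a short, predictable file name for an image URL."""
    # Look at the URL path only, so dots in the query string are ignored.
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        ext = default_ext
    # A fixed-length digest keeps the name well under file system limits.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:32]
    return digest + ext
```

For the Gmail URL above, the path is /mail/u/0, so no extension is found and the name is 32 hex characters plus the default extension.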

guites commented 2 weeks ago

Working on this

guites commented 2 weeks ago

Resolved via #83