Inervo / BedethequeKomga

A Metadata Provider for Komga using Bedetheque

Error when I launch python refreshMetadata.py #1

Closed kesm closed 1 year ago

kesm commented 1 year ago

Hello,

I just installed this script on Unraid and I have this error when I launch refreshMetadata.py using python 2.7

root@Fabrice:/mnt/user/appdata/BedethequeKomga# python refreshMetadata.py 
  File "refreshMetadata.py", line 21
    if not (serie_name := serie['metadata']['title']):
                       ^
SyntaxError: invalid syntax

Here is the content of my config.py

# Mandatory fields - Komga information
KOMGA_BASE_URL = "https://my-komga-dns"
KOMGA_EMAIL = "EMAIL@EMAIL.COM"
KOMGA_EMAIL_PASSWORD = "PASSWORD"

# Mandatory fields - Status of the series to update
# By default, 'ONGOING', 'HIATUS' to limit the requests to Bedetheque
KOMGA_STATUS = ['ONGOING', 'HIATUS', 'ABANDONED' , 'ENDED'] # Accept ONGOING HIATUS ABANDONED and ENDED

# Optional fields. If empty, will refresh all the books on komga
# Only one of these 2 fields can be completed. If both are, it will generate an error
KOMGA_LIBRARY_LIST = [] # retrieve library value from library URL in Komga
KOMGA_COLLECTION_LIST = [] # retrieve collection value from collection URL in Komga
KOMGA_SERIE_LIST = ['0ASNDRAYWD9WS']

I also tried without KOMGA_SERIE_LIST, with the content of the serie list moved to the collection list, and with nothing in the optional fields, but I get the same error every time.

Inervo commented 1 year ago

Hello,

Sorry, I should have been more explicit about the Python version: you need at least version 3.10. I've updated the README to add this information.

Keep me updated if that solves your problem :)

kesm commented 1 year ago

Hello again :)

Still not working with python3, but the error is different:

root@Fabrice:/mnt/user/appdata/BedethequeKomga# python3 refreshMetadata.py
Traceback (most recent call last):
  File "/mnt/user/appdata/BedethequeKomga/refreshMetadata.py", line 4, in <module>
    from bedethequeApi import bedethequeApiProxies, find_series_url, get_comic_series_metadata, \
  File "/mnt/user/appdata/BedethequeKomga/bedethequeApi.py", line 174
    match block.text.strip():
          ^
SyntaxError: invalid syntax

Inervo commented 1 year ago

Can you give me your full Python version please? `python --version`

kesm commented 1 year ago

Sorry, I didn't check my version. I currently have 3.9.10; I'll look into updating it.

kesm commented 1 year ago

Everything is fine after updating to 3.11.2, thanks for your work !
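
For context, both syntax errors above are version issues: the `:=` operator requires Python 3.8 or newer, and the `match` statement requires 3.10 or newer. A minimal sketch of an up-front version guard follows. It assumes a separate launcher file (hypothetical, not part of the project) that avoids newer syntax itself, because a `SyntaxError` is raised when a file is compiled, before any check inside that same file can run. The names `python_is_supported` and `require_supported_python` are illustrative only.

```python
import sys

# Minimum version stated by the maintainer in this thread.
MIN_PYTHON = (3, 10)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True when `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def require_supported_python():
    """Exit with a readable message instead of an obscure SyntaxError."""
    if not python_is_supported():
        # %-formatting keeps this file parseable by old interpreters too.
        sys.exit(
            "Python %d.%d or newer is required, found %d.%d"
            % (MIN_PYTHON + tuple(sys.version_info[:2]))
        )
```

A launcher would call `require_supported_python()` first and only then import the module that contains the `match` statement.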

kesm commented 1 year ago

Sorry, I spoke too soon. I tried with one serie and everything went fine. Then I tried on my whole collection (106 series) and got this error after some albums had been updated correctly:

PS C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga> python .\refreshMetadata.py
2023-02-27 22:23:13,168 - root - INFO - refreshMetadata.py - Starting refresh in all series mode
2023-02-27 22:23:13,668 - root - INFO - refreshMetadata.py - proxys retrieved
Traceback (most recent call last):
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 155, in <module>
    refresh_metadata()
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 41, in refresh_metadata
    bedetheque_metadata = get_comic_series_metadata(serie_url, proxy = proxy)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 134, in get_comic_series_metadata
    title = soup.find("div", class_="bandeau-info serie").h1.text.strip()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'h1'

Edit: the error first occurred while it was updating the manga Bleach.

Inervo commented 1 year ago

I must say, I designed it for BD rather than manga, so maybe there are some specific fields I didn't check. I'll have a look tomorrow if I can.

Inervo commented 1 year ago

It seems the script tried to retrieve the metadata directly without looking for the URL first, so the URL of the series must already be set in Komga. Can you give me the exact URL stored in Komga for this manga series please?

kesm commented 1 year ago

I just tried with BDs or comics in KOMGA_LIBRARY_LIST and the same thing happened. I wonder whether all the proxy URLs were skipped?

Url for this manga is https://www.bedetheque.com/serie-7624-BD-Bleach.html

kesm commented 1 year ago

Oh, I got my answer:

You are probably using a program that scans Bedetheque. Your IP has been blocked to preserve the server's resources, because this kind of script penalizes all users of the site. You can use [BDGest' Online](https://online.bdgest.com/) for free to manage your collection of BD, comics and manga. For more information, contact [info@bdgest.com](mailto:info@bdgest.com?subject=Bloquage%20de%20l%27IP%xxx).

This is what I get when I try to reach the URL (translated from the French notice).
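
Because the block notice comes back as an ordinary page, a scraper can only recognise it by its content. The sketch below matches on the French phrase `Votre IP a ete bloquee` ("Your IP has been blocked") from Bedetheque's notice. Matching on that wording is an assumption, since the site could change it at any time, and `looks_like_block_page` is a hypothetical helper, not a function from the project.

```python
import unicodedata

# Phrase from Bedetheque's block notice; assumed to stay stable.
BLOCK_MARKER = "votre ip a ete bloquee"


def _normalize(text):
    """Lowercase and strip accents so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.lower()


def looks_like_block_page(html):
    """Return True when the response body looks like the IP-block notice."""
    return BLOCK_MARKER in _normalize(html)
```

Checking for the notice before parsing lets a script report "IP blocked" instead of failing later with an unrelated `AttributeError`.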

Inervo commented 1 year ago

Strange. The script uses a list of proxies, and if there's an issue with a proxy, it skips it. It should never try direct access without asking for confirmation first!

So I don't understand why your IP is blocked, or why the script fails while using a proxy.

Inervo commented 1 year ago

Nonetheless, I will add a check so that the script no longer raises this traceback and tries to continue instead.

kesm commented 1 year ago

Thanks, I will try later to see if my IP is still banned.

Inervo commented 1 year ago

I'll give you an update once I've updated the script tomorrow :)

Inervo commented 1 year ago

Hi @kesm, I've made a small update to the script so that it "fails safely" when bedetheque returns incorrect content, and continues parsing the other series / comic books. I just tried on my side, and it works perfectly with the proxies retrieved automatically.
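
The earlier `AttributeError: 'NoneType' object has no attribute 'h1'` comes from chaining `.h1` onto a `find(...)` call that returned `None`. A minimal sketch of the "fail safely" idea follows. It is not the maintainer's actual patch. `extract_series_title` is a hypothetical helper, and `soup` is assumed to be a BeautifulSoup object (any object with a compatible `find` method works).

```python
from typing import Optional


def extract_series_title(soup) -> Optional[str]:
    """Return the series title, or None when the expected markup is missing."""
    # Same lookup as in the traceback, split so each step can be checked.
    banner = soup.find("div", class_="bandeau-info serie")
    if banner is None or banner.h1 is None:
        # Unexpected page (for example a block notice): let the caller skip it.
        return None
    return banner.h1.text.strip()
```

The caller can then log a warning and move on to the next series when the function returns `None`.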

kesm commented 1 year ago

Hi, I just tried the update. Every proxy failed (there were a lot of proxies), then I got this error message:

2023-02-28 22:03:20,404 - root - WARNING - refreshMetadata.py - Failed to get page with the current proxy : {'http': 'http://xxx'}, removing it and trying with the next one
Traceback (most recent call last):
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 97, in get_soup
    page = session.get(url, proxies=currentProxy, timeout=5)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 573, in request
    prep = self.prepare_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 484, in prepare_request
    p.prepare(
  File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 439, in prepare_url
    raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'None': No scheme supplied. Perhaps you meant https://None?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 155, in <module>
    refresh_metadata()
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 49, in refresh_metadata
    refresh_book_metadata(komga, serie_id, serie_url, proxy = proxy)
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 121, in refresh_book_metadata
    book_url = find_comic_url(book_name, book['metadata']['number'], serie_url, proxy = proxy)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 47, in find_comic_url
    soup = get_soup(serie_url, proxy = proxy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 101, in get_soup
    currentProxy = proxy.removeProxyAndGetNew(currentProxy)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 282, in removeProxyAndGetNew
    return self.proxies[self.proxyIndex]
IndexError: list index out of range

I'm still banned, so maybe it's linked, but since the script should be using a proxy I don't think so.

Inervo commented 1 year ago

It is very strange... I just pushed an update for when the list of proxies runs out.

But that doesn't explain why the script tried all the proxies and every one of them failed.

After updating the script, can you please add the following line to bedethequeApi.py, between lines 140 and 141, run refreshMetadata.py again, and give me the output after the first failure? `logger.error("proxy = %s, url = %s, soup = %s", proxy, url, soup)`
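
The `IndexError: list index out of range` in the traceback above is what happens when an index keeps pointing past the end of a list that shrinks as proxies are discarded. The sketch below shows one way to handle exhaustion explicitly. It is an illustration under assumptions, not the project's code: `ProxyPool`, `NoProxyLeftError`, and the method names are all hypothetical.

```python
class NoProxyLeftError(RuntimeError):
    """Raised when every candidate proxy has been discarded."""


class ProxyPool:
    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def _ensure_not_empty(self):
        if not self._proxies:
            raise NoProxyLeftError("all proxies have been removed")

    def current(self):
        """Return the proxy currently in use."""
        self._ensure_not_empty()
        return self._proxies[self._index]

    def advance(self):
        """Move to the next proxy without discarding the current one."""
        self._ensure_not_empty()
        self._index = (self._index + 1) % len(self._proxies)
        return self._proxies[self._index]

    def remove_and_get_next(self, failed):
        """Discard `failed` and return one of the remaining proxies."""
        if failed in self._proxies:
            self._proxies.remove(failed)
        self._ensure_not_empty()
        # The list just shrank: wrap the index so it stays in range.
        self._index %= len(self._proxies)
        return self._proxies[self._index]
```

Raising a dedicated exception lets the caller stop cleanly, or fetch a fresh proxy list, once the pool is empty.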

kesm commented 1 year ago

Here is the log:

2023-02-28 23:36:32,910 - root - INFO - refreshMetadata.py - proxys retrieved
2023-02-28 23:36:33,023 - root - ERROR - refreshMetadata.py - proxy = <bedethequeApi.bedethequeApiProxies object at 0x0000021553BA4650>, url = https://www.bedetheque.com/serie-59-BD-Asterix.html, soup = Vous utilisez sans doute un programme qui scanne la bedetheque. Votre IP a ete bloquee pour preserver les ressources du serveur, car ce genre de script penalise l'ensemble des utilisateurs du site.<br/> Vous pouvez utiliser gratuitement <a href="https://online.bdgest.com/">BDGest\' Online</a> pour gerer votre collection de BD, comics et mangas.<br/> Pour plus d'information, contactez <a href="mailto:info@bdgest.com?subject=Bloquage de l'IP : myip'">info@bdgest.com</a>.
2023-02-28 23:36:33,025 - root - ERROR - refreshMetadata.py - Error reading url https://www.bedetheque.com/serie-59-BD-Asterix.html
2023-02-28 23:36:33,025 - root - WARNING - refreshMetadata.py - Incorrect URL found for Astérix, trying to look for the URL
2023-02-28 23:36:33,025 - root - INFO - refreshMetadata.py - No url in komga for serie Astérix, searching bedetheque by name
2023-02-28 23:36:33,114 - root - WARNING - refreshMetadata.py - Astérix not found on bedetheque
2023-02-28 23:36:33,114 - root - WARNING - refreshMetadata.py - No URL found for Astérix, skipping metadata refresh for this serie
2023-02-28 23:36:34,385 - root - ERROR - refreshMetadata.py - Error reading url https://www.bedetheque.com/BD-Asterix-Tome-1-Asterix-le-gaulois-22940.html
2023-02-28 23:36:34,385 - root - WARNING - refreshMetadata.py - Incorrect URL found for Astérix le gaulois, trying to look for the URL
2023-02-28 23:36:34,385 - root - INFO - refreshMetadata.py - No url in komga for tome Astérix le gaulois, searching bedetheque by name
2023-02-28 23:36:34,386 - root - WARNING - refreshMetadata.py - Failed to get page with the current proxy : {'http': 'http://xxx'}, removing it and trying with the next one
2023-02-28 23:36:34,387 - root - WARNING - refreshMetadata.py - proxy {'http': 'http://xxx'} dooesn't work, removing it and trying the next one 2

Inervo commented 1 year ago

Okay, I think I found why.

Even though a proxy is given, the IP shown is your own and not the proxy's. I need to investigate on my side; I'll keep you updated.

Inervo commented 1 year ago

Okay, I found the error and fixed it. Please update bedethequeApi.py to the latest version.

Also, before launching the script, please run: pip install pysocks

Let me know if that works.

For information: many proxies will fail at the beginning; that is "normal". The script retrieves a list of supposedly active proxies and then filters them, so faulty proxies are identified and removed at the start. After this "filtering", the script should run smoothly.
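
The thread does not say what the fix was. One plausible reading of the logs, offered only as an assumption, is that the proxy mapping had just an `http` key (`{'http': 'http://xxx'}`) while every requested URL was `https://`. `requests` selects a proxy by the URL's scheme, so an `https` request with no `https` (or `all`) entry goes out directly. The sketch below builds a mapping that covers both schemes. `build_proxies` is a hypothetical helper, and the address in the usage note is a documentation-range placeholder.

```python
def build_proxies(host_port, scheme="socks5"):
    """Build a `proxies` mapping for requests that covers http and https URLs.

    `host_port` is "host:port". SOCKS schemes need the PySocks package,
    which is what `pip install pysocks` provides.
    """
    proxy_url = "%s://%s" % (scheme, host_port)
    # One entry per URL scheme; omitting "https" would leave https:// requests unproxied.
    return {"http": proxy_url, "https": proxy_url}
```

It would be used as `requests.get(url, proxies=build_proxies("203.0.113.7:1080"), timeout=5)`.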

kesm commented 1 year ago

It works now, thanks for your time! Glad to see my Komga with correct information :)