Hello,
Sorry, I should have been more explicit about the Python version: you need at least version 3.10. I've updated the README to add this information.
Keep me updated if that solves your problem :)
Hello again :)
Still not working with python3, but the error is different:
root@Fabrice:/mnt/user/appdata/BedethequeKomga# python3 refreshMetadata.py
Traceback (most recent call last):
File "/mnt/user/appdata/BedethequeKomga/refreshMetadata.py", line 4, in <module>
from bedethequeApi import bedethequeApiProxies, find_series_url, get_comic_series_metadata, \
File "/mnt/user/appdata/BedethequeKomga/bedethequeApi.py", line 174
match block.text.strip():
^
SyntaxError: invalid syntax
Can you give me your full python version please? python --version
Sorry, didn't check my version, I currently have 3.9.10, will see to update it
Everything is fine after updating to 3.11.2, thanks for your work !
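For anyone landing here with the same SyntaxError: the `match` statement on line 174 of bedethequeApi.py only parses on Python 3.10 and later, so older interpreters fail as soon as the module is imported. Below is a minimal sketch of a guard that could be called at the very top of the entry script, before importing any module that uses `match`. The names `python_is_recent_enough` and `require_python` are illustrative and not part of the repository.

```python
import sys

MINIMUM_PYTHON = (3, 10)  # the match/case statement was added in Python 3.10


def python_is_recent_enough(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the interpreter is at least `minimum` (hypothetical helper)."""
    return tuple(version_info[:2]) >= tuple(minimum)


def require_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Exit with a readable message instead of a SyntaxError deep inside an import."""
    if not python_is_recent_enough(version_info, minimum):
        found = ".".join(str(part) for part in version_info[:2])
        wanted = ".".join(str(part) for part in minimum)
        sys.exit("Python %s or newer is required, found %s" % (wanted, found))
```

Calling `require_python()` only helps if it runs before the import of the module containing `match`, because the SyntaxError is raised when that module is compiled.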
Sorry, I spoke too soon. I tried with one series and everything went fine. Then I tried on my whole collection (106 series) and got this error after some albums had been updated correctly:
PS C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga> python .\refreshMetadata.py
2023-02-27 22:23:13,168 - root - INFO - refreshMetadata.py - Starting refresh in all series mode
2023-02-27 22:23:13,668 - root - INFO - refreshMetadata.py - proxys retrieved
Traceback (most recent call last):
File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 155, in <module>
refresh_metadata()
File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 41, in refresh_metadata
bedetheque_metadata = get_comic_series_metadata(serie_url, proxy = proxy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 134, in get_comic_series_metadata
title = soup.find("div", class_="bandeau-info serie").h1.text.strip()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'h1'
Edit: the error first occurred while it was updating the manga Bleach
I must say, I didn't design it for manga but for BD, so maybe there are some specific fields I didn't check. I'll have a look tomorrow if I can
It seems the script tried directly to retrieve the metadata without looking for the url first, so the url of the series must already be set in komga. Can you give me the exact url stored in komga for this manga serie please?
I just tried with bds or comics in KOMGA_LIBRARY_LIST and the same thing happened. I wonder whether all the proxy URLs were skipped?
Url for this manga is https://www.bedetheque.com/serie-7624-BD-Bleach.html
Oh, got my answer :
You are probably using a program that scans bedetheque. Your IP has been blocked to preserve the server's resources, because this kind of script penalizes all users of the site. You can use [BDGest\' Online](https://online.bdgest.com/) for free to manage your collection of BD, comics and manga. For more information, contact [info@bdgest.com](mailto:info@bdgest.com?subject=Bloquage%20de%20l%27IP%xxx).
This is what I get when I try to reach the URL
Strange. The script uses a list of proxies, and if there's an issue with a proxy, it skips it. It should never try direct access without asking for confirmation first!
So I don't get why your IP is blocked, or why the script fails while using a proxy.
Nonetheless, I will add an additional check so that this traceback is no longer raised and the script can try to continue
Thanks, will try later to see if my ip is still banned
I'll give you an update once I've updated the script tomorrow :)
Hi @kesm, I've made a small update to the script so that it "fails safely" when bedetheque returns incorrect content, and continues parsing the other series / comic books. I just tried it on my side, and it works perfectly with the automatically retrieved proxies.
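For readers following along, a "fail safely" check of this kind can be sketched as below. This is not the code that was pushed, only a minimal illustration of the idea: test the result of `soup.find(...)` before touching `.h1`, and return `None` so the caller can skip the series. The helper name `extract_series_title` is hypothetical, and `soup` is assumed to be a BeautifulSoup document.

```python
def extract_series_title(soup):
    """Return the series title, or None when the expected banner is missing.

    `soup` is assumed to behave like a BeautifulSoup document (hypothetical helper).
    """
    banner = soup.find("div", class_="bandeau-info serie")
    if banner is None or banner.h1 is None:
        # Block pages and error pages have no banner: report it instead of crashing.
        return None
    return banner.h1.text.strip()
```

The caller can then log a warning and move on to the next series when the result is `None`, instead of stopping on an AttributeError.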
Hi, I just tried the update. Every proxy failed (there were a lot of proxies), then I got this error message: `2023-02-28 22:03:20,404 - root - WARNING - refreshMetadata.py - Failed to get page with the current proxy : {'http': 'http://xxx'}, removing it and trying with the next one
Traceback (most recent call last):
File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\bedethequeApi.py", line 97, in get_soup
page = session.get(url, proxies=currentProxy, timeout=5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 573, in request
prep = self.prepare_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 484, in prepare_request
p.prepare(
File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 368, in prepare
self.prepare_url(url, params)
File "C:\Users\kesm\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 439, in prepare_url
raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL 'None': No scheme supplied. Perhaps you meant https://None?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\kesm\OneDrive\Documents\GitHub\BedethequeKomga\refreshMetadata.py", line 155, in
IndexError: list index out of range`
I'm still banned, so it may be related, but since the script should be going through a proxy I don't think so
It is very strange... I just pushed an update for when the list of proxies runs out.
But that doesn't fix why the script tried all the proxies, and all failed..
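As an illustration of handling an exhausted proxy list without an IndexError, here is a minimal sketch. It is not the code from the repository: `fetch_with_proxy_rotation`, `NoWorkingProxyError` and the injected `fetch` callable are all hypothetical names chosen for the example.

```python
class NoWorkingProxyError(RuntimeError):
    """Raised when every candidate proxy has failed (hypothetical name)."""


def fetch_with_proxy_rotation(url, proxies, fetch):
    """Try each proxy in turn, discarding the ones that fail.

    `proxies` is a mutable list and `fetch(url, proxy)` is any callable that
    returns the page or raises on failure. Both are assumptions of this sketch.
    """
    while proxies:
        proxy = proxies[0]
        try:
            return fetch(url, proxy)
        except Exception:
            # Drop the faulty proxy and move on to the next one.
            proxies.pop(0)
    # The list is empty: raise an explicit error instead of indexing into it.
    raise NoWorkingProxyError("no working proxy left for %s" % url)
```

Because the loop condition checks the list before indexing, running out of proxies produces one clear error that the caller can catch and log.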
Could you please (after updating the script) add the following line to bedethequeApi.py, between lines 140 and 141, run refreshMetadata.py again, and give me the output after the first failure?
logger.error("proxy = %s, url = %s, soup = %s", proxy, url, soup)
Here is the log :
2023-02-28 23:36:32,910 - root - INFO - refreshMetadata.py - proxys retrieved
2023-02-28 23:36:33,023 - root - ERROR - refreshMetadata.py - proxy = <bedethequeApi.bedethequeApiProxies object at 0x0000021553BA4650>, url = https://www.bedetheque.com/serie-59-BD-Asterix.html, soup = Vous utilisez sans doute un programme qui scanne la bedetheque. Votre IP a ete bloquee pour preserver les ressources du serveur, car ce genre de script penalise l'ensemble des utilisateurs du site.<br/> Vous pouvez utiliser gratuitement <a href="https://online.bdgest.com/">BDGest\' Online</a> pour gerer votre collection de BD, comics et mangas.<br/> Pour plus d'information, contactez <a href="mailto:info@bdgest.com?subject=Bloquage de l'IP : myip'">info@bdgest.com</a>.
2023-02-28 23:36:33,025 - root - ERROR - refreshMetadata.py - Error reading url https://www.bedetheque.com/serie-59-BD-Asterix.html
2023-02-28 23:36:33,025 - root - WARNING - refreshMetadata.py - Incorrect URL found for Astérix, trying to look for the URL
2023-02-28 23:36:33,025 - root - INFO - refreshMetadata.py - No url in komga for serie Astérix, searching bedetheque by name
2023-02-28 23:36:33,114 - root - WARNING - refreshMetadata.py - Astérix not found on bedetheque
2023-02-28 23:36:33,114 - root - WARNING - refreshMetadata.py - No URL found for Astérix, skipping metadata refresh for this serie
2023-02-28 23:36:34,385 - root - ERROR - refreshMetadata.py - Error reading url https://www.bedetheque.com/BD-Asterix-Tome-1-Asterix-le-gaulois-22940.html
2023-02-28 23:36:34,385 - root - WARNING - refreshMetadata.py - Incorrect URL found for Astérix le gaulois, trying to look for the URL
2023-02-28 23:36:34,385 - root - INFO - refreshMetadata.py - No url in komga for tome Astérix le gaulois, searching bedetheque by name
2023-02-28 23:36:34,386 - root - WARNING - refreshMetadata.py - Failed to get page with the current proxy : {'http': 'http://xxx'}, removing it and trying with the next one
2023-02-28 23:36:34,387 - root - WARNING - refreshMetadata.py - proxy {'http': 'http://xxx'} dooesn't work, removing it and trying the next one 2
Okay, I think I found why.
Even though a proxy is given, the IP the site sees is your own, not the proxy's. I need to investigate on my side, I'll keep you updated
Okay, found the error, fixed it. Please update bedethequeApi.py to the last version.
Also, before launching the script, please run: pip install pysocks
Let me know if that works
For information: many proxies will fail at the beginning, and that is "normal". The script retrieves a list of supposedly active proxies and then filters them, so at the beginning faulty proxies are identified and removed. After this "filtering", the script should run smoothly
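The thread does not show what the actual fix was, so the following is only a guess at the kind of problem involved. With `requests`, the `proxies` mapping is keyed by URL scheme, so a mapping that only has an `'http'` entry, like the one printed in the logs above, is not applied to an `https://` URL, and the request can leave from your own IP. SOCKS proxy URLs additionally need the PySocks package, which would explain the `pip install pysocks` step. A minimal sketch, assuming a hypothetical helper `build_proxies` and a SOCKS5 proxy:

```python
def build_proxies(address, scheme="socks5"):
    """Return a requests-style proxies mapping covering both URL schemes.

    `address` is "host:port". The helper name and the default scheme are
    illustrative assumptions, not taken from the repository.
    """
    proxy_url = "%s://%s" % (scheme, address)
    # Both keys are needed: requests picks the entry matching the target URL's scheme.
    return {"http": proxy_url, "https": proxy_url}
```

The result can be passed as the `proxies` argument of `session.get(url, proxies=..., timeout=5)`, for example `build_proxies("203.0.113.5:1080")` with a placeholder address.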
It works now, thanks for your time! Glad to see my Komga with correct information :)
Hello,
I just installed this script on Unraid and I get this error when I launch refreshMetadata.py using Python 2.7
Here is the content of my config.py
I also tried without KOMGA_SERIE_LIST, with the content of the serie list in the collection list, and with nothing in the optional fields, but I get the same issue every time