Inervo / BedethequeKomga

A Metadata Provider for Komga using Bedetheque

[REQ] Timer instead of Proxies and Options #2

Closed LeXxPub closed 1 year ago

LeXxPub commented 1 year ago

Hi, thanks for the work, it's a really good script, but going through proxies fails a lot for me...

I think it would be nice to have an option to set a timer between requests (like the ComicRack scraper) to protect our IP from being banned. OK, it will take more time, but with a huge library it can be useful.

Another idea is to let the user choose to update only series and not books. Personally, I want to use your script for the status of my series; the rest of the metadata is already set!

The last feature I can think of for now would be a way to avoid rescanning books the script has already scanned. For example, use a database to skip the scan if the item was scanned less than 3 months ago, but keep an option to force a scan. It would help save time!

Thanks!

Inervo commented 1 year ago

Hi,

Proxies failing at first is the normal behavior. The script retrieves free proxies from an online source, then tries to retrieve the metadata with the next available proxy. If the timeout (5s from memory) is reached, the proxy is removed and the next one is tried. So after some trial and error only the reliable proxies remain, but it takes a bit of time.
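
The rotation described above can be sketched roughly as follows. This is only an illustration, not the script's actual code: the `ProxyPool` class and the caller-supplied `get(url, proxy, timeout)` function are hypothetical names, and the 5 second default simply mirrors the timeout quoted from memory.

```python
from collections import deque
from typing import Callable, Iterable


class ProxyPool:
    """Rotate through proxies, dropping any that fail or time out."""

    def __init__(self, proxies: Iterable[str]) -> None:
        self._proxies = deque(proxies)

    def __len__(self) -> int:
        return len(self._proxies)

    def fetch(
        self,
        url: str,
        get: Callable[[str, str, float], str],
        timeout: float = 5.0,  # assumed default, per the "5s from memory" above
    ) -> str:
        """Try proxies in turn until one returns a page body.

        `get` is a hypothetical caller-supplied function that returns the
        body or raises an exception (for example on timeout).
        """
        while self._proxies:
            proxy = self._proxies[0]
            try:
                body = get(url, proxy, timeout)
            except Exception:
                # Faulty proxy: remove it for good and try the next one.
                self._proxies.popleft()
                continue
            # Working proxy: move it to the back so requests are spread out.
            self._proxies.rotate(-1)
            return body
        raise RuntimeError("no working proxy left")
```

With a real HTTP client, `get` would wrap the actual request; keeping it as a parameter lets the pool be exercised without any network access.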

> I think it would be nice to have an option to set a timer between requests (like the ComicRack scraper) to protect our IP from being banned. OK, it will take more time, but with a huge library it can be useful.

That's a good idea. I'll try to implement this in the future (for now I'm busy with work and personal plans, so it won't be in the next few days). However, for a huge library it's better to use the proxies: it will be slow at first, but after a few minutes of removing the faulty proxies it will be way faster than waiting on a timer between each request.
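
For reference, a timer between requests could look something like the sketch below. The `Throttle` class and its parameters are hypothetical and not part of the script; the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last = now + delay
        return delay
```

Calling `wait()` right before each request keeps requests at least `min_interval` seconds apart, which is the trade-off discussed above: gentler on the site, but slower on a large library.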

> Another idea is to let the user choose to update only series and not books. Personally, I want to use your script for the status of my series; the rest of the metadata is already set!

I'll add this option later :)

> The last feature I can think of for now would be a way to avoid rescanning books the script has already scanned. For example, use a database to skip the scan if the item was scanned less than 3 months ago, but keep an option to force a scan. It would help save time!

I see the idea and the reasoning behind it, but this is a lot of work that I don't think I'll implement (not by myself at least).
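
For anyone who wants to experiment with the idea, a minimal sketch is shown below. It is not implemented in the script: `ScanCache`, the JSON file used in place of a database, and the 90 day threshold are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 90 * 24 * 3600  # roughly three months (assumed threshold)


class ScanCache:
    """Remember when each item was last scanned, persisted as a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._seen = json.loads(path.read_text()) if path.exists() else {}

    def should_scan(self, item_id: str, force: bool = False, now: Optional[float] = None) -> bool:
        """Return True if the item is new, stale, or a forced scan is requested."""
        if force:
            return True
        last = self._seen.get(item_id)
        current = time.time() if now is None else now
        return last is None or current - last >= MAX_AGE_SECONDS

    def mark_scanned(self, item_id: str, now: Optional[float] = None) -> None:
        """Record the scan time and write the cache back to disk."""
        self._seen[item_id] = time.time() if now is None else now
        self.path.write_text(json.dumps(self._seen))
```

The `force` flag corresponds to the requested force-scan option: it bypasses the cache without clearing it.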

Inervo commented 1 year ago

Well, I did find some time this evening. Features added, except for the last one, which I don't think I'll implement.