Casvt / Kapowarr

Kapowarr is software for building and managing a comic book library, fitting into the *arr suite of software.
https://casvt.github.io/Kapowarr/
GNU General Public License v3.0
410 stars 15 forks

Add emule/amule support #178

Open janaxhell opened 1 week ago

janaxhell commented 1 week ago

Despite the mainstream attention that torrents and usenet get, emule is still a BIG thing in Europe (and South America, AFAIK). With the spread of fiber, it's the most secure and a moderately quick tool to download anything. So alongside the usual movies and series, comics, especially in Italy, are basically released ONLY on emule (I mean 100% of comics), and then a limited part of them is cross-seeded to very short-lived torrents. On emule they live on and on, and if not, you can ask for a reshare of most of them on the main Italian hub, ddunlimited.net, which has existed since the days of edonkey, if you are old enough to remember.

The problem with emule is that you cannot automate it: you are forced either to check the release pages or to launch a manual search for each series. I've been using Mylar for US comics because those are the only ones I can find on torrent/usenet/getcomics, but I must manually download all the Italian (or French, Spanish, Dutch) ones from emule. If Kapowarr added emule/amule support, it would be a killer application for non-US users.

To give you an idea, each series is well sorted: [image]

and each issue has its verified ed2k link: [image]

It would be a matter of monitoring the desired pages for new links.

There is a catch: many non-US comics are not present on ComicVine. For those series there should be something like a blind-scan: the user sets the series to be monitored and Kapowarr just gets and adds it to the library as-is.
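
The monitoring idea could be sketched roughly as follows: remember every link already seen on a series page and report only the ones that are new on the next visit. This is a minimal illustration only; `find_new_links` and the `seen` set are hypothetical names, and how the links are fetched and where `seen` is stored are left open.

```python
from typing import Iterable, List, Set


def find_new_links(current_links: Iterable[str], seen: Set[str]) -> List[str]:
    """Return the links that were not seen before, in page order.

    `seen` is updated in place, so it can be reused on the next check.
    """
    new_links = []
    for link in current_links:
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return new_links


# Example: the second check only reports the link that was added since.
seen_links: Set[str] = set()
first = find_new_links(["link-1", "link-2"], seen_links)
second = find_new_links(["link-1", "link-2", "link-3"], seen_links)
print(first, second)
```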

Casvt commented 1 week ago

So it's a website similar to getcomics, but for international comics? And the links in the second image, are they direct download links (clicking them downloads them in the browser)? Is it possible to search on the website? Can I get a link to the website? I can build a scraper similar to the one used for getcomics...

janaxhell commented 1 week ago

You need to create a free account to access the links; the site is ddunlimited.net. Then go to the Biblioteca sub-forum, where Comics, Manga and Books are stored:

https://ddunlimited.net/viewforum.php?f=909&sid=4cc30e5e71ddf4a129b9d89bcc1d3831

The links don't download in the browser; they point to emule and use the ed2k protocol. When you click them, they are added to emule (or amule). They look like this:

ed2k://|file|Julia.Speciale.01.-..Il.caso.del.convegno.insanguinato.(SBE.2015-08).[c2c.found].cbr|43348457|C69A9D5FE0875BBC683BCA14AE8D1B20|h=KS7IEZR7LSRW7DX6PPHCPKV36OY7DSYH|/
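
For anyone unfamiliar with the format, a link like the one above can be split on the `|` separators into a filename, a size in bytes and a hash, optionally followed by an `h=` field (the AICH hash, as far as I know). The following is a minimal sketch under that assumption; `Ed2kFile` and `parse_ed2k_link` are hypothetical names, not part of Kapowarr or any ed2k library.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Ed2kFile(NamedTuple):
    name: str
    size: int                 # file size in bytes
    file_hash: str            # 32 hex characters identifying the content
    aich_hash: Optional[str]  # value of the optional "h=" field, if present


def parse_ed2k_link(link: str) -> Ed2kFile:
    """Split an ed2k file link into its pipe-separated fields."""
    prefix = "ed2k://|file|"
    if not link.startswith(prefix):
        raise ValueError("Not an ed2k file link")

    # After the prefix the fields are: name | size | hash | [extras] | /
    fields = link[len(prefix):].split("|")
    if len(fields) < 3:
        raise ValueError("Link is missing the name, size or hash")

    name, size, file_hash = fields[0], fields[1], fields[2]
    aich = next((f[2:] for f in fields[3:] if f.startswith("h=")), None)
    return Ed2kFile(unquote(name), int(size), file_hash.upper(), aich)


link = (
    "ed2k://|file|Julia.Speciale.01.-..Il.caso.del.convegno.insanguinato."
    "(SBE.2015-08).[c2c.found].cbr|43348457|C69A9D5FE0875BBC683BCA14AE8D1B20"
    "|h=KS7IEZR7LSRW7DX6PPHCPKV36OY7DSYH|/"
)
parsed = parse_ed2k_link(link)
print(parsed.name, parsed.size, parsed.file_hash)
```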

Casvt commented 1 week ago

So I need to require users to make an account, I need to make a web scraper, a new download client for a rare protocol, and all for international comics that aren't even guaranteed to be on CV?

Doesn't sound like it's worth the effort...

janaxhell commented 1 week ago

The protocol is not rare, it's just as old as torrent and very much used in Europe. While many non-US comics are not on ComicVine, most of the mainstream ones are. I haven't tried Kapowarr, but any other *arr uses external clients to download, like sab or qbit. Amule is open source, so what is the problem with adding it? Also, the account is obviously there to prevent the links from being directly linked on Google.

Casvt commented 5 days ago

Do you think this page is a good way to find series on the website?

https://ddunlimited.net/viewtopic.php?p=10000038

Then once we've found the right series, search on that page for the right download. This avoids the search rate limit that the website has. I also have the logging in working, so together with this we'd have the searching done. Then check if aMule has an API and we should have all the information we need to make this work.

Also, could you give a volume that is both on CV and has downloads on ddunlimited? Helps with testing stuff out.

janaxhell commented 5 days ago

Hi! The one you linked is only one of the 4 main libraries, check this pic

[image]

- Fumettografie = just a redundant library where characters or similar are collected from other libraries
- Fumetti = original Italian comics https://ddunlimited.net/viewtopic.php?t=3639202
- Comics = US comics translated into Italian https://ddunlimited.net/viewtopic.php?t=3639235
- Manga = manga translated into Italian https://ddunlimited.net/viewtopic.php?t=3648550
- Fumetti in Lingua Straniera = any comic in its original language, although it's a very small section https://ddunlimited.net/viewtopic.php?t=3827497
- Anime di Carta = not a library, it's the request section

So if you only search in Comics, only US comics translated into Italian will be found. You should link at least Fumetti/Comics/Manga. Each of them has a "Lista Comics/Fumetti/Manga in Sezione" page where all the series of that library are indexed.

Fumetti section: this series, like basically all Sergio Bonelli Editore comics, is listed on ComicVine:

https://ddunlimited.net/viewtopic.php?t=3821676
https://comicvine.gamespot.com/dampyr/4050-29465/

Manga section: this is the Akira manga translated into Italian:

https://ddunlimited.net/viewtopic.php?t=191929
https://comicvine.gamespot.com/akira/4050-131472/

About the Comics section, keep in mind that outside the US, US comics are often collected into anthology volumes with 2 or 3 series inside, so there is no direct relation to the original series or numbering. For instance, this Spider-Man series, depending on the period, also contained a second series (Hulk, Iron Man or something else). Anyway, it's indexed on CV:

https://ddunlimited.net/viewtopic.php?t=3420747
https://comicvine.gamespot.com/luomo-ragno/4050-47785/

Casvt commented 4 days ago

I've written a basic version of a ddunlimited client that works well and fast. It logs in, finds the series and extracts the ed2k download links. The only thing left is a client that interacts with the download client. The only clients I can find online are amule and emule. I cannot find anything about an API though, which I need to control the clients. Assuming you have one of them installed, can you tell me whether it has a web interface or whether it's a desktop application (something that you had to install on your PC using an installer)?

Also, do you happen to know why often the filenames include a number at the end? It's messing up the algorithm that extracts data from the filename. An example is: Absolute Moebius 07 - Il fallico folle (Panini Comics 2012) [c2c Lux73 pipulus] 2.0.cbr. What is that 2.0 doing there at the end? My algorithm now thinks that the issue number is 2.0 instead of 07.

Client Code

```python
from asyncio import create_task, gather, run
from time import time
from typing import Dict, List
from urllib.parse import unquote_plus

from aiohttp import ClientSession, FormData
from bs4 import BeautifulSoup

USERNAME = ""
PASSWORD = ""
SEARCH_TERM = "L'Uomo Ragno (Editoriale Corno)"

dd_base = 'https://ddunlimited.net'


class DDUnlimited:
    def __init__(self, session: ClientSession) -> None:
        self.session = session
        return

    async def login(self, username: str, password: str) -> None:
        form = FormData()
        form.add_field('username', username)
        form.add_field('password', password)
        form.add_field('redirect', 'index.php')
        form.add_field('login', 'Login')

        r = await self.session.post(
            f'{dd_base}/ucp.php',
            params={
                'mode': 'login'
            },
            data=form
        )
        if "/ucp.php" in str(r.url):
            raise ValueError("Login failed")
        return

    async def __fetch_topic(self, topic_id: str) -> str:
        async with self.session.get(
            f'{dd_base}/viewtopic.php?t={topic_id}'
        ) as response:
            return await response.text()

    async def _get_series_list(self) -> Dict[str, str]:
        if (
            not hasattr(self, 'series_list')
            or (self.series_fetch_time + 86400) < time()
        ):
            tasks = [
                create_task(self.__fetch_topic(t))
                for t in ('3639202', '3639235')
            ]
            pages = await gather(*tasks)
            self.series_list = {}
            for page in pages:
                soup = BeautifulSoup(page, 'html.parser')
                for r in soup.select('a.postlink-local'):
                    self.series_list[r.get_text()] = r["href"].split(  # type: ignore
                        't='
                    )[-1]
            self.series_fetch_time = time()

        return self.series_list

    async def search(self, query: str) -> List[str]:
        series_list = await self._get_series_list()

        # ==============
        # This matching should be improved
        # Currently requires exact match
        # Should instead be a matching algorithm
        # ==============
        if query not in series_list:
            return []

        series_output = BeautifulSoup(
            await self.__fetch_topic(series_list[query]),
            'html.parser'
        )
        results = [
            unquote_plus(a['href'])  # type: ignore
            for a in series_output.select('a[href^="ed2k://|file|"]')
        ]
        return results


async def get_ddunlimited_search_results(
    username: str,
    password: str,
    query: str
) -> List[str]:
    async with ClientSession(headers={'User-Agent': 'Kapowarr'}) as session:
        dd = DDUnlimited(session)
        await dd.login(username, password)
        return await dd.search(query)

results = run(get_ddunlimited_search_results(
    USERNAME, PASSWORD, SEARCH_TERM
))
for r in results:
    print()
    print(r)
```

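
The comment in `search` notes that the exact-match lookup should become a matching algorithm. One possible direction, shown here only as a sketch, is fuzzy matching with the standard library's `difflib`. The `find_series` helper, its `cutoff` default and the sample titles and topic IDs are hypothetical; this is not the matching Kapowarr actually uses.

```python
from difflib import get_close_matches
from typing import Dict, Optional


def _normalise(title: str) -> str:
    # Lower-case and collapse whitespace so trivial differences don't matter.
    return " ".join(title.lower().split())


def find_series(
    query: str,
    series_list: Dict[str, str],
    cutoff: float = 0.6
) -> Optional[str]:
    """Return the topic ID of the closest-matching series title, or None."""
    normalised = {
        _normalise(title): topic_id
        for title, topic_id in series_list.items()
    }
    matches = get_close_matches(
        _normalise(query), list(normalised), n=1, cutoff=cutoff
    )
    return normalised[matches[0]] if matches else None


# Made-up index: titles mapped to placeholder topic IDs.
example_index = {"Dampyr": "1", "Akira": "2"}
print(find_series("dampir", example_index))
```

A higher `cutoff` makes the match stricter; with many similarly named series it would probably need tuning, or a check of the publisher in parentheses.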
janaxhell commented 4 days ago

On my docker server I have amule, which is the *nix version, and on Windows I have emule. The one on Windows is obviously an .exe (with or without installer). What happens on Windows is that when I click an ed2k link, it opens directly in emule, OR I can copy and paste it into the client.

For the one on docker, this is my compose as an example

---
version: "2.1"
services:
  amule:
    image: ngosang/amule
    container_name: amule
    environment:
      - PUID=998
      - PGID=100
      - TZ=Europe/Rome
      # GUI_PWD and WEBUI_PWD are only used in initial setup
      - GUI_PWD=XXXXXXXXX
      - WEBUI_PWD=XXXXXXXXXXX
      - MOD_AUTO_RESTART_ENABLED=true
      - MOD_AUTO_RESTART_CRON=0 6 * * *
      - MOD_AUTO_SHARE_ENABLED=false
      - MOD_AUTO_SHARE_DIRECTORIES=/incoming;/my_movies
      - MOD_FIX_KAD_GRAPH_ENABLED=true
    ports:
      - "4711:4711" # web ui
      - "4712:4712" # remote gui, webserver, cmd ...
      - "4662:4662" # ed2k tcp
      - "4665:4665/udp" # ed2k global search udp (tcp port +3)
      - "4672:4672/udp" # ed2k udp
    volumes:
      - /srv/dev-disk-by-uuid-5b67514d-485e-4306-873e-b1cbb54ccf99/Config/aMule:/home/amule/.aMule
      - /srv/dev-disk-by-uuid-A870CA6B70CA3FB4/data/emule:/incoming
      - /srv/dev-disk-by-uuid-A870CA6B70CA3FB4/data/emule/incomplete:/temp
    restart: always

I don't know about an API, I think you should look for ED2K protocol, I don't know if this is useful: https://en.wikipedia.org/wiki/Ed2k_URI_scheme

About the numbers at the end of the filenames: unfortunately metadata tagging is not a thing in the EU, so most scanners put the info in the filename. In that case 2.0, 2.1, etc. is the revision number of the scan. Sometimes a user notices that a comic is missing a page or has some other problem, and the scanner fixes it and re-releases a new version.

The other numbers in the ed2k link are the size and the hash. With a hash, the file can be renamed in any way, but emule will always recognize it from the unique hash, no matter what. You must modify the file itself to change the hash (i.e. re-zipping it). This way a comic 2.0.cbr will never be confused with comic 2.2.cbr by emule/amule. If you already have that file, it will be ignored; if you have already downloaded it and removed it, emule will ask whether you want to re-download it, because it remembers every file you get (if you set it so).

To explain more clearly: when you download a torrent, rename it and after some time decide to reseed it, you must rename it exactly as it was at the beginning. With emule this is not needed: emule will check the hash, assign that file to the ID it knows and join the other clients that already seed that file. So you can reshare a file 10 years after it was released and all emule clients will recognize it.
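
Going back to the revision numbers: since a trailing 2.0 is a scan revision and not an issue number, it could be split off before the filename is parsed. The sketch below assumes the revision always has the form `major.minor` and directly follows a closing bracket, as in the example filename; that is a guess from one sample, and `split_revision` is a hypothetical helper, not Kapowarr's parser.

```python
import re
from os.path import splitext
from typing import Optional, Tuple

# A trailing "major.minor" number right after a closing bracket,
# e.g. "... [c2c Lux73 pipulus] 2.0"
_REVISION = re.compile(r"(?<=[\])])[ ._]*(\d+\.\d+)$")


def split_revision(filename: str) -> Tuple[str, Optional[str]]:
    """Return (filename without the revision, revision or None)."""
    stem, ext = splitext(filename)
    match = _REVISION.search(stem)
    if match is None:
        return filename, None
    return stem[:match.start()].rstrip() + ext, match.group(1)


cleaned, revision = split_revision(
    "Absolute Moebius 07 - Il fallico folle (Panini Comics 2012) "
    "[c2c Lux73 pipulus] 2.0.cbr"
)
print(cleaned, revision)
```

Requiring the closing bracket keeps a name like `Series 1.5.cbr` untouched, where the decimal number may really be the issue number.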

EDIT: I've found a couple of links that may be interesting:

https://github.com/a-pavlov/jed2k
https://www.reddit.com/r/PyMedusa/comments/1b0susc/can_i_give_support_to_emuleed2k_in_pymedusa/
https://www.freshports.org/net-p2p/py-ed2k-tools/
https://ed2k-tools.sourceforge.net/python.shtml
https://github.com/lightrabbit/emule-REST-API/blob/master/ED2KLink.cpp

Casvt commented 4 days ago

Well, you're in luck: it looks like amule has an API and web UI built in. The UI you use with your Docker container is just a skin over the standard UI that is included. I've spun up an instance myself and I can see the API calls being made, which means that I can make a client for it.

I know about the rest of the ed2k format and its meaning (hash, etc.). My question was purely about the apparent revision number.

It's going to be some time before I'm going to implement this into Kapowarr, but it looks like it is going to happen.
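
Separately from the web UI calls mentioned above, aMule also ships a command-line client, `amulecmd`, that can pass a link to a running daemon. The sketch below is not how Kapowarr talks to aMule; it only illustrates the idea. It assumes `amulecmd` is installed, that the daemon accepts external connections on port 4712 (the "cmd" port in the compose file above) and that `password` is the external-connections password. `build_amulecmd_args` and `add_link` are hypothetical helpers.

```python
import subprocess
from typing import List


def build_amulecmd_args(
    ed2k_link: str,
    host: str = "localhost",
    port: int = 4712,
    password: str = ""
) -> List[str]:
    """Build the argument list for amulecmd's "add" command."""
    return [
        "amulecmd",
        "-h", host,
        "-p", str(port),
        "-P", password,
        "-c", f"add {ed2k_link}",
    ]


def add_link(ed2k_link: str, **connection) -> subprocess.CompletedProcess:
    """Run amulecmd and return the finished process for inspection."""
    # Passed as a list (no shell), so the '|' characters need no escaping.
    return subprocess.run(
        build_amulecmd_args(ed2k_link, **connection),
        capture_output=True, text=True, timeout=30, check=False
    )


args = build_amulecmd_args("ed2k://|file|example.cbr|1|0123|/", password="secret")
print(args)
```

Note that a password given on the command line is visible to other local users in the process list, so this would only suit a trusted machine.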

janaxhell commented 4 days ago

Super! That is GREAT news!! Thank you!!