Hamuko / cum

comic updater, mangafied
Apache License 2.0

Feature Request: Search for a manga via APIs/GET #62

Open LaserEyess opened 5 years ago

LaserEyess commented 5 years ago

As far as I can tell, to follow a manga you currently have to go to madokami/mangadex/etc. and manually grab the series URL in order to run cum follow $url. It would be nice to be able to search the download sites (where applicable) in order to find series to follow. This might be a niche use case, as it really isn't that hard to open a browser and search, but if this is a command line tool modeled after package managers, then I think search would be within scope, considering that many package managers can search for packages.

As an idea of what I'm thinking of:

$ cum search "Dragonball" 
==> Searching for "Dragonball"
Found: 
(1) /Manga/D/DR/DRAG/Dragon Ball
(2) /Raws/Dragonball
(3) /Manga/D/DR/DRAG/Dragon Ball Super
(4) /Manga/_Doujinshi/Dragonball
(5) /Manga/_Doujinshi/Dragonball/Dragon Ball Z - Legendary Vegeta (Doujinshi)

Maybe, for example, you cut it off at 5 or so results, and the user can refine the query to be more specific if they can't find what they're looking for (or ultimately just search directly/google it). You could even potentially have them select which ones to follow based on the number.
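To make the cutoff and the pick-by-number idea concrete, here is a minimal sketch. The names MAX_RESULTS, format_results and parse_selection are made up for illustration and are not part of cum; the limit of 5 is only the "5 or so" figure from above.

```python
MAX_RESULTS = 5  # assumed cutoff, taken from the "5 or so" suggestion


def format_results(results, limit=MAX_RESULTS):
    """Return numbered lines for at most `limit` results."""
    shown = results[:limit]
    return ["({}) {}".format(i, r) for i, r in enumerate(shown, start=1)]


def parse_selection(text, shown):
    """Turn input such as "1 3" or "1,3" into zero-based indexes.

    `shown` is how many results were listed. Raises ValueError for
    anything that is not one of the listed numbers.
    """
    picks = []
    for token in text.replace(",", " ").split():
        number = int(token)  # raises ValueError on non-numeric input
        if not 1 <= number <= shown:
            raise ValueError("no such result: {}".format(number))
        if number - 1 not in picks:  # ignore repeated picks
            picks.append(number - 1)
    return picks
```

The indexes returned by parse_selection could then be mapped back to series URLs and handed to whatever the follow command already does.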

I'm sure this would be a large amount of work, but perhaps a way to implement it would be a new scraper class, BaseSearch, that would handle both requesting and parsing search parameters, either through an API (if it exists) or just a simple GET with a generated URI for search. And then you'd have to update all the scrapers with search parsing. Not sure if that's the best way to do it, but from looking at the code a bit I think it would be feasible to do it this way.
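A rough sketch of what such a class could look like, assuming a plain GET with a generated query string. Everything here is hypothetical: cum has no BaseSearch, the example.invalid URL and the tab-separated response format are invented, and the fetch callable stands in for whatever HTTP session the real scrapers use.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlencode


class BaseSearch(ABC):
    """Hypothetical base class; nothing like it exists in cum today."""

    search_url = None   # set by each site-specific subclass
    query_param = "q"   # name of the GET parameter, site dependent

    def build_url(self, term):
        """Generate the search URI for a plain GET request."""
        query = urlencode({self.query_param: term})
        return "{}?{}".format(self.search_url, query)

    @abstractmethod
    def parse(self, body):
        """Turn a response body into a list of (title, url) tuples."""

    def search(self, term, fetch):
        """Run a search; `fetch` is any callable mapping a URL to text."""
        return self.parse(fetch(self.build_url(term)))


class ExampleSearch(BaseSearch):
    """Made-up site that returns one "title<TAB>url" pair per line."""

    search_url = "https://example.invalid/search"

    def parse(self, body):
        return [tuple(line.split("\t", 1))
                for line in body.splitlines() if line]
```

Passing fetch in as a parameter keeps the request logic separate from the parsing, so each scraper would only need to supply its own URL, parameter name and parse method.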

Hamuko commented 5 years ago

This was actually something that I planned during the early drafts of cum, but I eventually gave up on it. I tried to do it with Batoto, but when you searched for something short and simple like "Saki", you got hit with a wall of search results in no relevance order and you had to keep scraping page after page in order to get all of them. In the end I decided to abandon the idea since I didn't find it that important.

LaserEyess commented 5 years ago

For most cases, I think it would be the responsibility of the user to refine their own searches if they get a bunch of garbage. For simpler cases like you mentioned, that's a limitation of the search capabilities provided by the sites, and it wouldn't be cum's fault. I think those simple cases are in the minority but I could be proven wrong.

Hamuko commented 5 years ago

> For most cases, I think it would be the responsibility of the user to refine their own searches if they get a bunch of garbage.

The problem with this is that I was looking for "Saki" the manga, but I was getting everything and the kitchen sink. In order to get good search results out of what I got out of Batoto during my testing, I would have had to fetch all of the results from all of the pages and perform some sort of string matching logic in order to rank them.
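For illustration, the "string matching logic" could be as simple as the sketch below, which uses the standard library's difflib. This is only one possible approach and not what was tried against Batoto; rank_by_similarity is a made-up name, and it assumes all result titles have already been fetched.

```python
from difflib import SequenceMatcher


def rank_by_similarity(term, titles):
    """Sort titles so the closest match to `term` comes first."""
    wanted = term.casefold()

    def score(title):
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the two strings have less in common.
        return SequenceMatcher(None, wanted, title.casefold()).ratio()

    return sorted(titles, key=score, reverse=True)
```

With this, an exact title match such as "Saki" would sort ahead of longer titles that merely contain the word, but it still requires scraping every page of results first.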

Of course, there's no Batoto support anymore and I haven't really played around with the idea with any of the current scrapers.

LaserEyess commented 5 years ago

I understand what you meant by searching for "Saki". A manga whose title is simply a common name or a common word would only be searchable if the site itself returned reasonable results for that search term. As for the currently supported sites, searching "Saki" returns a bunch of garbage, but that really is their search's fault. Places like mangadex have advanced parameters like 'author' that could be used to refine the search, although that makes implementing search more complicated.

I never thought of cum itself taking the results and ranking or filtering them, but that could be a potential workaround in certain circumstances. For example, searching "Saki" on madokami returns some results that don't even have the word "Saki" in the name, so you could throw those away. Of course, even after removing those entries, the actual manga "Saki" is like the 40th result. But my suggestion isn't to make cum the search engine, but rather to use the sites themselves as the search engine, parse the results that they give, simply list them on the command line, and give the option to follow one or more of those results. I know it will give garbage results in simple cases, but I think those are rare.
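As a sketch of the "throw those away" workaround, assuming the site's results are already available as a list of titles (drop_unrelated is a made-up name, not something in cum):

```python
def drop_unrelated(term, titles):
    """Keep only titles that contain the search term, ignoring case."""
    needle = term.casefold()
    return [title for title in titles if needle in title.casefold()]
```

This keeps the site's own ordering, so it only trims noise; a title that merely contains the term inside a longer word would still be kept, and the wanted series could still be far down the list.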