Difegue / LANraragi

Web application for archival and reading of manga/doujinshi. Lightweight and Docker-ready for NAS/servers.
https://lrr.tvc-16.science
MIT License
2.17k stars, 153 forks

Add "Meta-Archives" collection of IDs #519

Open: Difegue opened this issue 2 years ago

Difegue commented 2 years ago

Unlike categories, which loosely regroup IDs, Meta-Archives should hide the IDs they contain and expose themselves instead.

Adding this new element raises a ton of questions relative to the current architecture:

CirnoT commented 2 years ago

(and maybe artist)

I'd hate it if that meant they couldn't inherit the artist from their children, as it would make them that much harder to use for grouping magazine one-shots.

Search should be modified to use a precalculated table containing "all archives", which would be a mix of real and meta-IDs. (Instead of the current behavior where it grabs all IDs off the Redis root)

I assume this means that it would show meta archives only and not their children, is that correct?

AbyssalMonkey commented 2 years ago

Unlike categories, which loosely regroup IDs, Meta-Archives should hide the IDs they contain and expose themselves instead.

I'm fine with this as long as there is a way to expose the child IDs, and as long as a child ID can be part of multiple meta-archives. Certain magazines release weekly (looking at you, Kairakuten), and it would be a pain in the ass to have duplicate IDs for each of them, so one ID should be able to sit in both the weekly and the monthly aggregate. The same goes for artists who re-release their works in multiple magazines.

AbyssalMonkey commented 2 years ago

How to integrate them to the Search Engine?

For filtering purposes, could it be possible to search for only meta-archives, and by extension, IDs not included in them (negative search)?

Difegue commented 1 year ago

This is officially in progress thanks to #818. 🎊
Meta-Archives are now Tankoubons, and the creation/management APIs are now available in nightlies.

Remaining work to consider this issue closed:

EfronC commented 1 year ago

Thinking a little about the first 2 points: are they really necessary? I think they will probably be very problematic to implement, because:

Tanks, while made mainly for tankoubons, can be used for many kinds of groupings, including unrelated works. For example, imagine someone has a few tanks called "Pending to read", "Favorites", "Recommended works", etc. In that case the same file could be in every one of them. Now take the problem to the extreme: for whatever reason (the current backend allows it), someone has the same file in 100 tanks. What happens when you search for that file in the search index? Do you show all 100 tanks (which would clutter the view), or do you pick one of them and show it? The latter would not work as expected: if you have a "Favorites" tank and a related-works tank, you would expect the second one to be shown, but if the first one happens to be chosen, the feature misses its goal.

What I believe could be done here is to not merge the search API with tanks, and instead keep them as separate features. Tanks already have an endpoint (EP) that returns the tanks a file is in, so it could be used to add a button to the search index's context menu that calls it. A second view (or a modal) could then show the list of tanks (either with one thumbnail each or as a list of names), which you could click to open. Something like this:

[Screenshot 2023-07-27 163215]

Or, as a different approach, an intermediate view between clicking a thumbnail in the index view and opening the reader, like the one in Ichaival, could be implemented. At the bottom of that view you could put a section (maybe even a carousel) called "Works featured in:" listing the tanks that the endpoint returns.

I'm not sure; I just think that merging the search API with the tank API could break something very easily, so I'm leaving those 2 ideas here just in case.

Difegue commented 1 year ago

My thoughts were that for use cases like pending to read/favorites, you would use categories instead. (which should be expanded to contain tank IDs as mentioned)

If you have the same id in 100 tanks and search for it you would indeed get 100 entries, but that'd be working as intended for me. We could block IDs from being in more than one tank, but I don't think this is a common enough scenario to warrant bothering with. (also for people who'd want to sort stuff like weekly/monthly kairakutens as mentioned above, ID duplication is something they might actually want)

Difegue commented 1 year ago

In my opinion, tanks don't really have much reason to exist if all they do is act as "categories but ordered"; the goal is more to declutter large archive indexes by letting users view things like manga series with multiple chapters as one item.

Mixing them into search results also helps third-party clients (and the web reader) with their reading implementation, as there's far less API to implement: I believe they'd only have to implement the call that gets a tank's IDs to have a working reader. (Or I could implement a shim in the extract API to handle tanks 100% transparently; we'll see, I guess.)

The current tank management endpoints will remain as they currently are if they want to add support for those later down the line, ofc.

CirnoT commented 1 year ago

(also for people who'd want to sort stuff like weekly/monthly kairakutens as mentioned above, ID duplication is something they might actually want)

^ Exactly this

the goal is more to declutter large archive indexes by allowing users to view stuff like manga series with multiple chapters as one item.

I'd consider adding an option to disable tank grouping in search, in case someone wants to quickly search for a very specific entry. It would also be very useful when searching by the source namespace, which includes the exact E-H link.

For example, my little script that checks whether an E-H archive has been added to LRR would work better if it could request a search without tank grouping, as it could then link directly to the specific entry in LRR instead of navigating to the first page of the tank.

AbyssalMonkey commented 1 year ago


This all, exactly. I want it to organize all the little archives I have into books, and then I can shove those books into categories. Ordered categories might be useful at that point because I could then order those books in the category by release for things like "kairakuten", "dascomi", "beast", etc. But hey, I could probably make meta-meta archives for this too if the system is robust enough.

If it doesn't have ID duplication, the feature is a bust, because then I'd need to duplicate my archives anyway: once for the serial release (anthology), and once for the artist compilation. I want it in two places because it goes in two places. This isn't a library, it's a database, and we should be treating it as such.

EfronC commented 1 year ago

I'd consider adding an option to disable tank grouping in search, in case someone wants to quickly search for a very specific entry. It would also be very useful when searching by the source namespace, which includes the exact E-H link.

For example, my little script that checks whether an E-H archive has been added to LRR would work better if it could request a search without tank grouping, as it could then link directly to the specific entry in LRR instead of navigating to the first page of the tank.

Yes please. I think this would be the best middle ground: if you're someone who likes to create groups for everything, you'd get an option to have the search API show the individual files instead of replacing them with the tank.

Searching for a specific archive and then not being able to find it easily because it sits inside another one was another problem I had been thinking about; I just didn't mention it. If grouping archives into tanks were a toggle in the search index, I think that would be the best way to satisfy both sides.

Difegue commented 1 year ago

There is a case to be made for being able to quickly fish a specific ID by a source tag or similar, I hadn't necessarily thought about that. (although for source: in particular I believe you could use the bundled source finder plugin for that, it'd fish directly in URLMAP which wouldn't be affected)

Being able to disable tank grouping in searches is possible, but that'd overcomplexify the feature a bit if I stick with the approach of integrating tanks directly into the search/tag database indexes.. I'll have to give this a bit more thought.

CirnoT commented 1 year ago

Being able to disable tank grouping in searches is possible, but that'd overcomplexify the feature a bit if I stick with the approach of integrating tanks directly into the search/tag database indexes.. I'll have to give this a bit more thought.

Unrelated, but I think LRR is once again very much limited by its choice of a NoSQL store like Redis here. We'd really benefit from switching to SQLite for storing metadata about archive relationships! Of course, this can also be done in Redis with a JSON blob representing the relationship status of each archive, but that makes for very complex server-side search logic and is slow, since updating any archive requires rebuilding the entire blob. With a simple SQL schema we could easily run fast, efficient queries that include or exclude archives that are in tanks, as well as JOIN to the actual tanks, which could be just a view to simplify the queries that fill in tags.

(Also, the entire search logic could easily use the FTS5 extension, greatly reducing server-side complexity at the cost of having to update a virtual table in addition to the normal one. We already do something similar for the search cache with Redis, and this would be even simpler, so overall it would be a win-win in my opinion.)
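A minimal sketch of the kind of relational layout described above, using Python's built-in sqlite3 module. The table names (archive, tank, tank_archive) and the search() helper are hypothetical and are not LANraragi's actual storage, which is Redis. The link table lets one archive sit in several tanks, and the same data answers both the grouped and the ungrouped query.

```python
import sqlite3

# Hypothetical schema: a many-to-many link table lets one archive be in several tanks
# (e.g. both a weekly and a monthly aggregate).
SCHEMA = """
CREATE TABLE archive (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tank (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tank_archive (
    tank_id    TEXT NOT NULL REFERENCES tank(id),
    archive_id TEXT NOT NULL REFERENCES archive(id),
    position   INTEGER NOT NULL,
    PRIMARY KEY (tank_id, archive_id)
);
"""

# Grouped: archives that belong to no tank, plus each tank as a single entry.
GROUPED = """
SELECT a.id, a.title FROM archive a
WHERE NOT EXISTS (SELECT 1 FROM tank_archive ta WHERE ta.archive_id = a.id)
UNION ALL
SELECT t.id, t.name FROM tank t
ORDER BY 1
"""

# Ungrouped: every real archive, ignoring tanks entirely.
UNGROUPED = "SELECT id, title FROM archive ORDER BY id"


def search(conn, group_tanks=True):
    """Return (id, title) rows, either grouped by tank or as individual archives."""
    return conn.execute(GROUPED if group_tanks else UNGROUPED).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO archive VALUES (?, ?)",
                 [("a1", "Ch. 1"), ("a2", "Ch. 2"), ("a3", "One-shot")])
conn.execute("INSERT INTO tank VALUES ('t1', 'Series')")
conn.executemany("INSERT INTO tank_archive VALUES ('t1', ?, ?)", [("a1", 1), ("a2", 2)])

print(search(conn))                     # [('a3', 'One-shot'), ('t1', 'Series')]
print(search(conn, group_tanks=False))  # [('a1', 'Ch. 1'), ('a2', 'Ch. 2'), ('a3', 'One-shot')]
```

Here tank is a plain table for brevity; as suggested above, the tank entries exposed to search could instead be a view over the link table.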

Being able to disable tank grouping in searches is possible, but that'd overcomplexify the feature a bit if I stick with the approach of integrating tanks directly into the search/tag database indexes.. I'll have to give this a bit more thought.

Like others who commented here, not having this functionality would make me avoid tanks entirely, as I often rely on being able to navigate directly to a specific archive. Of course, I would also love to be able to group archives into a virtual tank when necessary, but I consider being able to quickly find a specific thing more important than cleaner search results/tank organization.

EfronC commented 1 year ago

I was thinking about this topic yesterday, and I think I have a proposal that could help.

So, as I understand it, what you want to do in the Search API is get a page of results, then run a foreach over that list calling get_tankoubons_by_file() to get each file's tanks, and use those tanks to replace the results you return to the front end.
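A minimal sketch of that post-search pass, assuming an in-memory dict as a stand-in for the tank store. get_tankoubons_by_file() mirrors the name used in the thread, but its body here is hypothetical, as are the TANK_* IDs and replace_with_tanks(). It shows both concerns raised so far: one lookup per result (the performance cost), and a file in two tanks pulling in both tanks.

```python
from typing import Dict, List

# Hypothetical stand-in for the tank store: tank ID -> ordered archive IDs.
TANKS: Dict[str, List[str]] = {
    "TANK_1": ["a1", "a2"],
    "TANK_2": ["a2", "a4"],
}


def get_tankoubons_by_file(arcid: str) -> List[str]:
    """Return every tank containing arcid (illustrative body, not LRR's code)."""
    return [tank_id for tank_id, members in TANKS.items() if arcid in members]


def replace_with_tanks(page: List[str]) -> List[str]:
    """Swap each archive in a result page for its tank(s), skipping duplicates."""
    out: List[str] = []
    seen = set()
    for arcid in page:
        # One lookup per result; archives in no tank are kept as-is.
        for item in get_tankoubons_by_file(arcid) or [arcid]:
            if item not in seen:
                seen.add(item)
                out.append(item)
    return out


print(replace_with_tanks(["a1", "a2", "a3"]))  # ['TANK_1', 'TANK_2', 'a3']
```

Because replacement happens after pagination, a page of 3 archives can come back with a different number of entries, which is one more reason to prefer doing the replacement when the index is built.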

If that's correct, here is what I propose: I can add another feature to tanks and reserve score -1 to store a flag, 'R' or 'NR', where R stands for replace (NR would be the default). I could then create a new function, get_replace_tankoubons_by_file(), which would work just like get_tankoubons_by_file() but would also check this flag and filter out tanks marked NR. This way you wouldn't need a toggle to turn off replacement in the search index: the tank method would filter out non-replaceable tanks for you, and it would be up to users to decide which tanks are used to hide files from the results. For you it would be transparent, since you'd only need to call the new function instead of the old one, and you'd get only the tanks the user expects to alter the search index. Additionally, I could make this function return only the first result it finds, which would tackle 2 of the issues mentioned earlier: if a file is in many tanks, it would be replaced by only 1 in the results, and the method would be much faster than searching everything only to use one result.
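A minimal sketch of the proposed flag, assuming each tank is modeled as Redis-style sorted-set entries of (score, member) and that score -1 holds "R" or "NR" as described. get_replace_tankoubons_by_file() and replace_flag() are hypothetical illustrations of the proposal, not existing LANraragi functions, and the tank IDs are made up.

```python
from typing import Dict, List, Optional, Tuple

# A tank as (score, member) pairs; score -1 is reserved for the "R"/"NR" flag (assumption).
Tank = List[Tuple[int, str]]

TANKS: Dict[str, Tank] = {
    "TANK_fav": [(-1, "NR"), (1, "a1"), (2, "a2")],
    "TANK_series": [(-1, "R"), (1, "a1"), (2, "a3")],
}


def replace_flag(tank: Tank) -> str:
    """Read the reserved score -1 entry; default to "NR" when it is absent."""
    for score, member in tank:
        if score == -1:
            return member
    return "NR"


def get_replace_tankoubons_by_file(arcid: str) -> Optional[str]:
    """Hypothetical: return the first "R" tank containing arcid, or None."""
    for tank_id, tank in TANKS.items():
        if replace_flag(tank) != "R":
            continue  # tanks marked NR never hide files from search
        if any(score != -1 and member == arcid for score, member in tank):
            return tank_id  # stop at the first match, as proposed
    return None


print(get_replace_tankoubons_by_file("a1"))  # TANK_series
print(get_replace_tankoubons_by_file("a2"))  # None (only in an NR tank)
```

Returning the first match makes the result depend on iteration order when a file is in several "R" tanks, which is the same ambiguity raised earlier about a file sitting in both a "Favorites" and a related-works tank; the flag only removes it when at most one of those tanks is marked R.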

If this sounds right, let me know and I'll make the change and create a PR with it. (I could also take this chance to remove the demo view that I see got merged, which probably should have been removed before, since it's not a feature.)

Difegue commented 1 year ago

My idea was more to insert tank IDs directly in the indexes that are used by search, replacing the IDs. See this todo:
https://github.com/Difegue/LANraragi/blob/9767f4f08ddad266f8afb7fba6720104d9600796/lib/LANraragi/Model/Stats.pm#L78

Doing a foreach post-search would work too and I guess it'd make the grouping easier to toggle, but performance would kinda suck 🤔

Leaving it up to the user isn't a bad idea, although I wonder if we really need to make it a per-tank preference: How about just having a new toggle in server settings?
[Image: mockup of the proposed toggle under Global Settings]
This is more viable IMO since it means we can just regen the search indexes when that option is changed.
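A minimal sketch of the index-time approach, assuming the precomputed "all archives" list is rebuilt from scratch whenever the server setting changes. build_search_index() and the TANK_* IDs are hypothetical; the real indexes live in Redis (see the Stats.pm todo linked above). Searching then stays a plain lookup, and the cost moves to the rebuild.

```python
from typing import Dict, List


def build_search_index(archive_ids: List[str],
                       tanks: Dict[str, List[str]],
                       group_tanks: bool) -> List[str]:
    """Hypothetical precomputed "all archives" list, regenerated when group_tanks changes.

    With grouping on, archives that belong to any tank are hidden and each tank ID
    is listed once instead; with grouping off, only the real archive IDs are listed.
    """
    if not group_tanks:
        return sorted(archive_ids)
    in_a_tank = {arcid for members in tanks.values() for arcid in members}
    loose = [arcid for arcid in archive_ids if arcid not in in_a_tank]
    return sorted(loose) + sorted(tanks)


archives = ["a1", "a2", "a3", "a4"]
tanks = {"TANK_1": ["a1", "a2"], "TANK_2": ["a2"]}
print(build_search_index(archives, tanks, group_tanks=True))   # ['a3', 'a4', 'TANK_1', 'TANK_2']
print(build_search_index(archives, tanks, group_tanks=False))  # ['a1', 'a2', 'a3', 'a4']
```

Note that a2 being in two tanks makes both tanks appear, which matches the "100 tanks, 100 entries" behavior described earlier as intended.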

(I've cleaned up the demo views from the repo btw, thanks for reminding me)

EfronC commented 1 year ago

Yeah, it's a little different from what I thought, but I think my proposal might still work here. This time I could make a function that returns all the tanks marked 'R' (or you could check the score -1 entry directly), so that you only remove IDs belonging to tanks marked 'R'. That basically just adds an extra if to your original idea.

Adding the toggle in the server settings works too, but just as a note: it might lead people to always leave it off, possibly because of a single tank they don't want to behave this way. For example, you have 99 tanks you want to use to declutter the UI and only one for "Favorites", but because of that one, which you don't want hiding stuff in the search index, you keep the setting off. It doesn't sound too problematic, but it would be a little sad to work on a feature that no one uses because of a tiny detail.

CirnoT commented 1 year ago

I feel like this won't really solve anything. Ideally I'd want to choose whether I see tanks in search when performing the actual search query, not through a per-tank or server-wide setting. Making it server-wide means I'll just leave it OFF permanently, while making it per-tank means I'll always mark all of them as OFF.

Also, please consider that switching this setting as shown in your screenshot would cause the search index to be rebuilt, yet it sits under the Global Settings category while all other search-index-related options are under Archive Files. It cannot be overstated how complex, unintuitive and inflexible the LRR search index is, and how often I have to manually run Clean Database or Reset Search Cache after removing just a few archives and replacing them with newer versions; otherwise LRR keeps showing "Showing X or Y (out of Z)". Maybe 5k archives is too much for it; after all, Shinobu already takes an entire minute to perform the initial scan (on a local RAID5 volume backed by a NAND cache, mind you!), and I'd hate to think how poorly it will scale to 10k.

Difegue commented 1 year ago

I think it's better to keep this simple for now, there's a bunch of other stuff left to do to get this into people's hands already. 🤔

We can explore dynamic grouping in search in a later release and remove the server setting at that point, imo. (Agree on the setting being in the wrong section, I kinda botched my mockup there.)

I still think that for stuff like favorites users can just use categories, they're more suited for that than tanks.

CirnoT commented 1 year ago

I still think that for stuff like favorites users can just use categories, they're more suited for that than tanks.

My use case is distinct from favorites: I want to be able to look up whether an archive with a specific E-H link exists and link to it directly. Merging individual archives into tanks would break that, so I'd simply have to not use the feature.

Difegue commented 1 month ago

More progress on this a year later: I've incorporated tanks into search results as planned, but made it optional following the discussion above (which I've minimized to keep this issue readable).

You can add group_tanks=false to a search API query to disable tank grouping. The setting defaults to true if not present, ofc.
We're now at "bare minimum support for third-party clients" stage, there's no real web UI or Reader support for all this yet.
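A minimal client-side sketch for the use case above (looking up one specific archive without tank grouping). It assumes the server is reachable at localhost:3000 and that the search endpoint is /api/search with a filter parameter; search_url() is a hypothetical helper and the source value is a placeholder. It only builds the URL; depending on server settings, the actual request may also need the API key.

```python
from urllib.parse import urlencode


def search_url(base: str, filter_text: str, group_tanks: bool = True) -> str:
    """Build a search API URL; group_tanks is omitted when left at its default (true)."""
    params = {"filter": filter_text}
    if not group_tanks:
        params["group_tanks"] = "false"  # return individual archives, not tanks
    return f"{base.rstrip('/')}/api/search?{urlencode(params)}"


print(search_url("http://localhost:3000", "source:example.org/g/1", group_tanks=False))
# http://localhost:3000/api/search?filter=source%3Aexample.org%2Fg%2F1&group_tanks=false
```

Leaving the parameter out entirely, rather than sending group_tanks=true, keeps requests from older clients unchanged.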