bpepple opened 9 months ago
I've gone back through, and I believe we'd left it at https://github.com/bpepple/metron/discussions/164#discussioncomment-5917592
I think in short term bandwidth won't be an issue, but it's worth giving some thought for future growth.
Apologies if I misunderstood or didn't catch something afterwards.
I did a dirty PR for using the hash which was rejected as larger work was required: https://github.com/comictagger/comictagger/pull/491 As is the way, a shiny new thing took me away so I do need to get back to it.
How much investigation did you do? I'm interested in whether you've also gotten a lot more new accounts (as one aim was wider use of sources other than Comic Vine) or whether a few "power" users are responsible for the vast majority of the usage.
I'm surprised you don't have a Patreon or similar for Metron btw?
If you want the cover matching switched off just say the word, it will obviously require a new release so will depend on people updating.
I did try to get GCD to add cover hashes as well, so maybe another bump on that too. https://github.com/GrandComicsDatabase/gcd-django/discussions/615
Apologies if I misunderstood or didn't catch something afterwards.
No worries, we probably just got our wires crossed.
I did a dirty PR for using the hash which was rejected as larger work was required: comictagger/comictagger#491 As is the way, a shiny new thing took me away so I do need to get back to it.
How much investigation did you do? I'm interested in whether you've also gotten a lot more new accounts (as one aim was wider use of sources other than Comic Vine) or whether a few "power" users are responsible for the vast majority of the usage.
I'm planning to parse the server logs to get a breakdown of the different clients that access the server and get user counts, since I figured you'd be interested in seeing just how many people are using your plugin. It's a pretty low-priority task, though, so I'm not sure when that will happen.
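A minimal sketch of that breakdown, assuming every log line has the same shape as the Metron access-log excerpts quoted in this thread (user, timestamp, request line, status, size, referer, User-Agent) and that the first User-Agent token (e.g. comictagger or Metron-Tagger) names the client. The regex and the function names are illustrative, not Metron's actual tooling.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpts in this thread:
# user [timestamp] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<user>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def client_name(user_agent: str) -> str:
    """Return the first product token of a User-Agent, e.g. 'comictagger'."""
    first = user_agent.split(" ", 1)[0]
    return first.split("/", 1)[0] or "unknown"


def summarize(lines):
    """Count requests and distinct users per client."""
    requests = Counter()
    users = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in any other format
        client = client_name(m["agent"])
        requests[client] += 1
        users.setdefault(client, set()).add(m["user"])
    return requests, {client: len(names) for client, names in users.items()}
```

Counting distinct users per client, rather than raw requests, separates "many users" from "a few heavy users", which is the question raised above.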
On average we get 2-3 new accounts per day and maybe 1 of them will be using comictagger. Fairly often it seems when they initially start using it, they will re-tag their whole collection, which depending on the size can take several days, and then settle in and just tag new issues each week.
I'm surprised you don't have a Patreon or similar for Metron btw?
Setting up Open Collective for future funding is on the roadmap for 2024, but I think I need to get 100 stars on the repo before qualifying, so I'll need to encourage folks to star it, since most of the time they never visit the GitHub page.
If you want the cover matching switched off just say the word, it will obviously require a new release so will depend on people updating.
I think it would be worthwhile to do that since out of all the clients that access the API, comictagger is by far the most resource hungry.
On average, CT makes 2-5 times the number of requests (depending on the series) to identify an issue compared to other clients, and CT is the only one that downloads an image for every issue.
I'd suggest using the cover_hash info from the API; I'm using it with metron-tagger and have had very good results. For example, attempting to tag three issues without much information in the filenames:
```
bpepple@frodo ~/Storage/test $ metron-tagger -or .
Starting online search and tagging:
----------------------------------
Using 'Batman #1 (2011)' metadata for 'Batman #001 (1).cbz'.
Using 'Batman #1 (2016)' metadata for 'Batman #001 (2).cbz'.
Using 'Batman #1 (1940)' metadata for 'Batman #001.cbz'.
Successful matches:
------------------
Batman #001 (1).cbz
Batman #001 (2).cbz
Batman #001.cbz
Starting comic archive renaming:
-------------------------------
renamed 'Batman #001 (1).cbz' -> 'Batman v2 #001 (2011).cbz'
renamed 'Batman #001 (2).cbz' -> 'Batman v3 #001 (2016).cbz'
renamed 'Batman #001.cbz' -> 'Batman v1 #001 (1940).cbz'
```
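A minimal sketch of choosing a match from cover_hash values without downloading any images. It assumes (not confirmed in this thread) that the API returns cover_hash as a hex-encoded perceptual hash, that the client hashes the local cover with the same algorithm, and that a small Hamming distance means the same cover. MAX_DISTANCE and best_cover_match are hypothetical names.

```python
# Hypothetical cut-off; tune it against real data before relying on it.
MAX_DISTANCE = 10


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes of equal length."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def best_cover_match(local_hash: str, candidates: dict):
    """Return (issue_id, distance) for the closest cover_hash, or None.

    candidates maps an issue id to the cover_hash string the API returned
    for it. Issues without a hash are skipped, and nothing is downloaded.
    """
    scored = [
        (issue_id, hamming_distance(local_hash, cover_hash))
        for issue_id, cover_hash in candidates.items()
        if cover_hash
    ]
    if not scored:
        return None
    issue_id, distance = min(scored, key=lambda pair: pair[1])
    return (issue_id, distance) if distance <= MAX_DISTANCE else None
```

The only bandwidth cost here is the search response that already carries the hashes; the cover images themselves never leave the server.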
I'm planning to parse the server logs to get a breakdown of the different clients that access the server and get user counts, since I figured you'd be interested in seeing just how many people are using your plugin. It's a pretty low-priority task, though, so I'm not sure when that will happen.
No worries, yeah, just a general interest in the numbers. Driving people to Metron to help support it is one of the goals, so it's nice to know if it's working.
On average we get 2-3 new accounts per day and maybe 1 of them will be using comictagger. Fairly often it seems when they initially start using it, they will re-tag their whole collection, which depending on the size can take several days, and then settle in and just tag new issues each week.
Hmm... yeah, re-tagging everything, that's a bit of a pain. I guess they like your data better :)
I'm surprised you don't have a Patreon or similar for Metron btw?
Setting up Open Collective for future funding is on the roadmap for 2024, but I think I need to get 100 stars on the repo before qualifying, so I'll need to encourage folks to star it, since most of the time they never visit the GitHub page.
Completely unsolicited suggestion: the reason I mention Patreon is that most people who support things will already have an account there, and removing any friction might help. You could add a goal as well, e.g. if Metron gets x dollars for x months, the rate limit will be relaxed, and the like. It looks like Open Collective is for normal users as well, though? Going for the users of the data will give a wider audience of backers; most devs will probably not be making commercial apps.
Also, do let me know when it goes live, and I'm happy to change the about message to include the link to the page etc.
If you want the cover matching switched off just say the word, it will obviously require a new release so will depend on people updating.
I think it would be worthwhile to do that since out of all the clients that access the API, comictagger is by far the most resource hungry.
Sure thing. Are you okay with still loading the images in the GUI, as that requires people clicking and should be a much lower volume?
On average, CT makes 2-5 times the number of requests (depending on the series) to identify an issue compared to other clients, and CT is the only one that downloads an image for every issue.
We try to keep the requests to a minimum, so I wonder where the difference in the number of calls comes from. (Rhetorical question, but I guess metron-tagger is one? If you want to throw some app names my way, they're most welcome.)
I'd suggest using the cover_hash info from the API; I'm using it with metron-tagger and have had very good results.
That is the plan: use the hash and then fall back to downloading the image. Some larger structural changes are needed, but now it's more to the forefront because of this :)
Completely unsolicited suggestion: the reason I mention Patreon is that most people who support things will already have an account there, and removing any friction might help. You could add a goal as well, e.g. if Metron gets x dollars for x months, the rate limit will be relaxed, and the like. It looks like Open Collective is for normal users as well, though? Going for the users of the data will give a wider audience of backers; most devs will probably not be making commercial apps.
In my experience, most open source projects tend to use Open Collective (Komga, Kavita, Solus, Discord.py, journa.host, etc) over Patreon, since it has some benefits that Patreon doesn't offer like fiscal hosting.
Could offer both as options, but I'm not sure it would be worth the extra work involved.
Sure thing. Are you okay with loading the images in the GUI still as that requires people clicking and should be a much lower volume?
Not sure, since I've never used the GUI. When I get some free time I'll give it a look, so I understand how it's implemented.
We try to keep the requests to a minimum, so I wonder where the difference in the number of calls comes from. (Rhetorical question, but I guess metron-tagger is one? If you want to throw some app names my way, they're most welcome.)
Let's say you are searching for a file like Batman #001.cbz. Metron-Tagger makes the following requests to the API:
```
bpepple [09/Feb/2024:08:26:27 -0500] "GET /api/issue/?series_name=Batman&number=1 HTTP/1.0" 200 34279 "-" "Metron-Tagger/2.0.1 Mokkari/3.0.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:26:28 -0500] "GET /api/issue/?number=1&page=2&series_name=Batman HTTP/1.0" 200 34361 "-" "Metron-Tagger/2.0.1 Mokkari/3.0.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:26:29 -0500] "GET /api/issue/?number=1&page=3&series_name=Batman HTTP/1.0" 200 8213 "-" "Metron-Tagger/2.0.1 Mokkari/3.0.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:26:31 -0500] "GET /api/issue/34067/ HTTP/1.0" 200 2682 "-" "Metron-Tagger/2.0.1 Mokkari/3.0.0 (Linux; 6.7.3-200.fc39.x86_64)"
```
Using the same file with the ComicTagger CLI, the following requests are made to the API:
```
bpepple [09/Feb/2024:08:32:28 -0500] "GET /api/series/?name=Batman HTTP/1.0" 200 13871 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:29 -0500] "GET /api/series/?name=Batman&page=2 HTTP/1.0" 200 13882 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:30 -0500] "GET /api/series/?name=Batman&page=3 HTTP/1.0" 200 3674 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:31 -0500] "GET /api/issue/?series_id=2481&number=1 HTTP/1.0" 200 338 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:33 -0500] "GET /api/series/2481/ HTTP/1.0" 200 513 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:35 -0500] "GET /api/issue/?series_id=763&number=1 HTTP/1.0" 200 340 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:37 -0500] "GET /api/series/763/ HTTP/1.0" 200 425 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:40 -0500] "GET /api/issue/?series_id=93&number=1 HTTP/1.0" 200 336 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:42 -0500] "GET /api/issue/?series_id=93&number=1 HTTP/1.0" 200 336 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:43 -0500] "GET /api/series/93/ HTTP/1.0" 200 555 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
bpepple [09/Feb/2024:08:32:46 -0500] "GET /api/issue/34067/ HTTP/1.0" 200 2682 "-" "comictagger/1.6.0a11.dev8 Mokkari/2.6.0 (Linux; 6.7.3-200.fc39.x86_64)"
```
The primary reason CT makes almost 3x the number of API requests (11 versus 4 here) is that it uses the Series endpoint and then the Issue endpoint, instead of using only the Issue endpoint with the series_name parameter.
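A sketch of the single-endpoint approach: one Issue query filtered by series_name and number, following pagination the way the page=2/page=3 requests in the Metron-Tagger log do. The response shape {"next": ..., "results": [...]} is an assumption inferred from that pagination, and fetch_json is a hypothetical injected callable standing in for whatever HTTP client (e.g. Mokkari) is actually used.

```python
from urllib.parse import urlencode

# Assumption: base URL taken from the request paths in the logs; the
# paginated response shape is inferred, not confirmed from the API docs.
BASE_URL = "https://metron.cloud/api"


def issue_search_url(series_name: str, number: str) -> str:
    """Build one Issue-endpoint query that filters by series name and number."""
    query = urlencode({"series_name": series_name, "number": number})
    return f"{BASE_URL}/issue/?{query}"


def search_issues(fetch_json, series_name: str, number: str) -> list:
    """Collect every page of results by following 'next' links.

    fetch_json is any callable that takes a URL and returns the decoded JSON
    body, so the HTTP client (and its rate limiting) stays pluggable.
    """
    results = []
    url = issue_search_url(series_name, number)
    while url:
        page = fetch_json(url)
        results.extend(page.get("results", []))
        url = page.get("next")
    return results
```

With this shape, a lookup costs one request per results page plus one detail request for the chosen issue, with no separate Series round trips.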
In my experience, most open source projects tend to use Open Collective (Komga, Kavita, Solus, Discord.py, journa.host, etc) over Patreon, since it has some benefits that Patreon doesn't offer like fiscal hosting.
Could offer both as options, but I'm not sure it would be worth the extra work involved.
I would like, if at all possible, to put a message along the lines of: "The image matching feature has been removed due to the large amount of data it consumes on the Metron servers. If you are able to help with the server costs, please consider donating here (link)."
Sure thing. Are you okay with loading the images in the GUI still as that requires people clicking and should be a much lower volume?
Not sure, since I've never used the GUI. When I get some free time I'll give it a look, so I understand how it's implemented.
It shows a list of issues, and when a user clicks on a different issue it loads the data (and fetches the images).
Let's say you are searching for a file like Batman #001.cbz. Metron-Tagger makes the following requests to the API:
Thanks for that. I will look into making changes to do the same.
I would like, if at all possible, to put a message along the lines of: "The image matching feature has been removed due to the large amount of data it consumes on the Metron servers. If you are able to help with the server costs, please consider donating here (link)."
I don't see any problem with that.
It shows a list of issues, and when a user clicks on a different issue it loads the data (and fetches the images).
Did some testing of the GUI today. One thing I noticed: the AutoTag window appears to load the cover from the user's comic on the left side and an image downloaded from Metron on the right side, which seems to be a waste of time/space since it immediately clears it when it moves on to the next issue to be identified.
Another thing I noticed is that ComicTagger was only able to identify about 75% of the issues that Metron-Tagger was able to, I'm not sure why there was such a discrepancy (and I don't really feel like digging thru the various logs to find out), but here's a list of issues that failed if you're interested.
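Results of testing 149 issues:
* Metron-Tagger
  * 149 / 149 issues identified
  * 11m36.771s elapsed
* ComicTagger using AutoTagging
  * 112 / 149 issues identified (I didn't save on low confidence)
  * 22m17.088s elapsed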
By the way, which window is the screenshot you added from? In the 30 minutes or so I spent testing CT, I never came across it.
Oh, and it might be worthwhile to convert this to a discussion.
I would like, if at all possible, to put a message along the lines of: "The image matching feature has been removed due to the large amount of data it consumes on the Metron servers. If you are able to help with the server costs, please consider donating here (link)."
I don't see any problem with that.
Only there is no link to send them to :) Maybe I can say something along the lines of "Donations accepted soon, check the website."?
Did some testing of the GUI today. One thing I noticed: the AutoTag window appears to load the cover from the user's comic on the left side and an image downloaded from Metron on the right side, which seems to be a waste of time/space since it immediately clears it when it moves on to the next issue to be identified.
As it has to download the image to generate the hash anyway, I think it's just a way to show that something is happening. IIRC, that same window is also used for cover comparisons on low-confidence matches.
Another thing I noticed is that ComicTagger was only able to identify about 75% of the issues that Metron-Tagger was able to, I'm not sure why there was such a discrepancy (and I don't really feel like digging thru the various logs to find out), but here's a list of issues that failed if you're interested.
Results of testing 149 issues:
* Metron-Tagger
  * 149 / 149 issues identified
  * 11m36.771s elapsed
* ComicTagger using AutoTagging
  * 112 / 149 issues identified (I didn't save on low confidence)
  * 22m17.088s elapsed
Thanks for the info, something to look into. It may be related to different covers etc.
By the way, which window is the screenshot you added from? In the 30 minutes or so I spent testing CT, I never came across it.
That's via Search Online; it's the manual way of tagging an issue.
Oh, and it might be worthwhile to convert this to a discussion.
Don't have them on atm.
Only there is no link to send them to :) Maybe I can say something along the lines of "Donations accepted soon, check the website."?
That would be fine, tho I don't have an ETA when I'll have the donation option available.
Thanks for the info, something to look into. It may be related to different covers etc.
Took a quick glance at the files that couldn't be matched, and my guess is metron_talker isn't sanitizing the series name, in particular the ' - ' (note the spaces surrounding the hyphen) in the user's comic filenames, which is commonly used to replace ':' on Windows filesystems. My guess is that stripping that from the series name should greatly increase your matches.
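One literal reading of that suggestion, as a sketch: drop the spaced hyphen from the series name before it is sent as a query. sanitize_series_name is a hypothetical helper, and whether Metron's series_name filter then matches the colon-titled series is an assumption worth checking against the server logs.

```python
import re


def sanitize_series_name(name: str) -> str:
    """Drop the ' - ' separator often used in place of ':' in Windows filenames.

    'Batman - The Dark Knight' becomes 'Batman The Dark Knight'. A hyphen
    without surrounding spaces (e.g. 'Spider-Man') is left alone.
    """
    cleaned = name.replace(" - ", " ")
    # Collapse any doubled whitespace left behind and trim the ends.
    return re.sub(r"\s+", " ", cleaned).strip()
```

Only the spaced form is touched, because hyphens inside names such as Spider-Man or X-Men are part of the real title.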
Btw, if you need to know what Metron is receiving for your requests while working on this, just give me a shout and I can look at the server logs, tho it might be best to contact me on Matrix (if you use it) for a quick response.
That would be fine, tho I don't have an ETA when I'll have the donation option available.
I have a PR ready to go (#18), and I've changed the about text to:
[Metron](https://metron.cloud/) is a community-based site whose goal is to build an open database with a REST API for comic books. NOTE: An account on [Metron](https://metron.cloud/) is required to use its API. NOTE: Automatic image comparisons are not available due to the extra bandwidth required. Donations will be accepted soon, check the website.
If that is okay by you?
Took a quick glance at the files that couldn't be matched, and my guess is metron_talker isn't sanitizing the series name, in particular the ' - ' (note the spaces surrounding the hyphen) in the user's comic filenames, which is commonly used to replace ':' on Windows filesystems. My guess is that stripping that from the series name should greatly increase your matches.
Thanks for having a look. Sanitation can be a messy business (sorry, couldn't resist), whether it helps or hinders.
Btw, if you need to know what Metron is receiving for your requests while working on this, just give me a shout and I can look at the server logs, tho it might be best to contact me on Matrix (if you use it) for a quick response.
I think the problem is that the IssueIdentifier first searches for a series and then for issues (with the correct issue number (and year)) in the series it has found. The IssueIdentifier is the thing that needs a rewrite (so it can use the hashes too), so I'm going to look at other APIs to see if they can also search for series and issues in one query. That might be an extra path that can be added to the new IssueIdentifier.
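A hedged sketch of the "use the hash and then fall back to downloading the image" flow mentioned earlier in the thread, as it might slot into a rewritten identifier. Everything here is hypothetical: the candidate dicts with 'id' and 'cover_hash' keys, the download_and_hash callable, the max_distance of 10, and the assumption that hashes are hex-encoded and comparable by Hamming distance.

```python
def identify(local_hash, candidates, download_and_hash, max_distance=10):
    """Return the id of the best-matching candidate, or None.

    candidates: list of dicts with 'id' and 'cover_hash' keys (hypothetical
    shape). download_and_hash: callable(issue_id) -> hex hash; it is only
    called for candidates with no cover_hash, and only if no supplied hash
    matched, so images are fetched as a last resort.
    """
    def distance(a, b):
        return bin(int(a, 16) ^ int(b, 16)).count("1")

    best_id, best_dist = None, max_distance + 1
    # First pass: compare against hashes the API already supplies.
    for cand in candidates:
        supplied = cand.get("cover_hash")
        if supplied:
            d = distance(local_hash, supplied)
            if d < best_dist:
                best_id, best_dist = cand["id"], d
    if best_id is not None:
        return best_id
    # Fallback: download covers only for candidates lacking a hash.
    for cand in candidates:
        if not cand.get("cover_hash"):
            d = distance(local_hash, download_and_hash(cand["id"]))
            if d < best_dist:
                best_id, best_dist = cand["id"], d
    return best_id
```

Keeping the download behind an injected callable makes it easy to count, or disable entirely, image fetches per identification.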
Were you okay with the covers being used in the issue list? (The user has to manually click on each issue after the first.)
I have a PR ready to go (#18), and I've changed the about text to:
[Metron](https://metron.cloud/) is a community-based site whose goal is to build an open database with a REST API for comic books. NOTE: An account on [Metron](https://metron.cloud/) is required to use its API. NOTE: Automatic image comparisons are not available due to the extra bandwidth required. Donations will be accepted soon, check the website.
If that is okay by you?
That looks fine to me.
Were you okay with the covers being used in the issue list? (The user has to manually click on each issue after the first.)
Haven't had a chance to look at it yet, but I guess my questions would be:
1. How often is this window used? Is it used more often compared to the other search option?
It's a manual process, so I would imagine people only use it now and then, but no metrics are sent from CT to know for sure. Unless someone has an impressive actions-per-minute rate, I would imagine usage is low.
2. What is the reason for using a downloaded cover compared to just using the cover from the user's comic? Is there a particularly compelling reason?
This would be for comparison reasons, I would guess. CT was created a long time ago, so I can't say for sure. It was also probably to show the same information as the website. It can always be taken out at a later date if its use proves to be a problem. What do other people (if they do) use it for? I know I asked for it to be included, but maybe remove the link from the API until the donation system is in place and covers the expenses?
As far as I'm aware, you're the only one using the image URL from the /api/issue endpoint; most users who need it get it from the /api/issue/id/ endpoint. Anyway, removing it could cause all kinds of headaches, since it would be an API-breaking change, so I'm not inclined to do that.
I only asked why it was needed, since in most of the projects I've been involved with, we tended to write fairly detailed user stories on the workflow, information displayed, etc. before implementing it, and I still tend to think that way when looking at UIs.
Anyway, if it's a low-impact window I don't see an issue, but I'd say give it a look to see if it's really needed or not.
BTW, one other thing I thought of: it would make sense to change the user-agent on the plugin to something like Metron-Talker/{version} instead of comictagger/{version}, since that would make it easier to identify the users who don't upgrade the plugin. Using the comictagger version doesn't tell us what version of metron_talker they're using.
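A sketch of building such a User-Agent from the plugin's own installed version, read via importlib.metadata. The distribution name "metron-talker" is a hypothetical placeholder for whatever the plugin is actually published under, and the string layout only mirrors the format seen in the server logs; how the string is handed to Mokkari is deliberately left out.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Hypothetical distribution name; substitute the real PyPI name.
DIST_NAME = "metron-talker"


def plugin_version(dist_name: str = DIST_NAME) -> str:
    """Installed version of the plugin, or 'unknown' if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "unknown"


def user_agent(plugin_ver: str, ct_ver: str) -> str:
    """Lead with the plugin token so server logs identify the plugin version.

    Produces e.g. 'Metron-Talker/1.0.0 comictagger/1.6.0 (Linux; <kernel>)'.
    """
    return (
        f"Metron-Talker/{plugin_ver} comictagger/{ct_ver} "
        f"({platform.system()}; {platform.release()})"
    )
```

Keeping the comictagger token after the plugin token preserves the host version while letting a first-token log breakdown group requests by plugin.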
As far as I'm aware, you're the only one using the image URL from the /api/issue endpoint; most users who need it get it from the /api/issue/id/ endpoint. Anyway, removing it could cause all kinds of headaches, since it would be an API-breaking change, so I'm not inclined to do that.
Empty string is an option.
I only asked why it was needed, since in most of the projects I've been involved with, we tended to write fairly detailed user stories on the workflow, information displayed, etc. before implementing it, and I still tend to think that way when looking at UIs.
I don't know, but I'm pretty confident that didn't happen :)
Anyway, if it's a low-impact window I don't see an issue, but I'd say give it a look to see if it's really needed or not.
Visuals are always appealing. I do wonder if it should also show the local cover too...
BTW, one other thing I thought of: it would make sense to change the user-agent on the plugin to something like Metron-Talker/{version} instead of comictagger/{version}, since that would make it easier to identify the users who don't upgrade the plugin. Using the comictagger version doesn't tell us what version of metron_talker they're using.
Talkers don't have version numbers because they were built with CT (that might need to change now plugins can be loaded from a dir). There will need to be a new version of CT for Metron talker to be replaced (1.6.0a11 as it stands).
2. What is the reason for using a downloaded cover compared to just using the cover from the user's comic? Is there a particularly compelling reason?
That manual issue identification window is showing search results. The user can use the downloaded image to verify (through manual visual inspection) that the result matches the local comic's cover. If it's the wrong series (or volume) the user can back out and select another series.
Also, back in the day it was not uncommon for issue numbers in filenames to be wrong, so being able to easily search through the issues in a series for the correct issue (and cover) was handy.
The original UI for CT was modeled on the Comic Vine Scraper add-on for ComicRack and predates the automatic visual hash matching.
I guess that's why I'm asking if it's necessary; showing the image while also checking the hash seems a bit like using a belt while wearing suspenders.
Truthfully, I think it would be worthwhile to look at the GUI as a whole, since there seem to be a lot of ways to improve it, performance-wise and also on the usability front.
Talkers don't have version numbers because they were built with CT (that might need to change now plugins can be loaded from a dir). There will need to be a new version of CT for Metron talker to be replaced (1.6.0a11 as it stands).
Well, you're publishing to PyPI, so they do have a version number; you're just not using it in the build tools. Truthfully, a version number for the plugin would be way more useful than knowing the CT version on my end.
I guess that's why I'm asking if it's necessary; showing the image while also checking the hash seems a bit like using a belt while wearing suspenders.
It's not needed while checking the hash, but that's a different window that runs during the GUI auto-tagging process. That was definitely more a "watch the computer at work" bell-and-whistle, where the local cover is on the left, and the potential matches are cycled through on the right while the auto-tagging is at work.
Since CV allowed requests for thumbnails, it was relatively cheap to use. But yes, if it's a burden on the provider, the right side cover display should be disabled in that specific window.
But the auto-tagging process, while very effective, is not bullet-proof, and can fail to match an issue for any number of reasons (a poorly named file, a cover image that doesn't show up in the right order in the archive, etc.). When the automatic process fails, the manual process becomes necessary. That involves one of the windows like the one @mizaki shared above. In this case, there is no hash matching going on, just human eyeballs at work. I'd guess that the cover downloads for the manual issue identification window are not much more frequent than if someone were browsing the website to verify that a cover is correct.
So, yes, belt-and-suspenders in general for the application, as ComicTagger was intended to be a tool for doing both automatic and manual tagging.
When the automatic process fails, the manual process becomes necessary. That involves one of the windows like the one @mizaki shared above. In this case, there is no hash matching going on, just human eyeballs at work. I'd guess that the cover downloads for the manual issue identification window are not much more frequent than if someone were browsing the website to verify that a cover is correct.
In that scenario, I can definitely see downloading the cover since I'm guessing it's a fairly rare occurrence.
Also, sometime down the line, it might be worthwhile for Metron to offer thumbnail images.
I received January's bill for Metron, and noticed the site's bandwidth usage has jumped fairly significantly which coincides with the increased usage of ComicTagger.
Did some investigating of the server logs and then tested CT locally and noticed you are downloading the cover to assist in matching, I thought we discussed not enabling that since unlike CV the server expenses are currently being covered by me and not a corporation. That was part of the reason why the Cover Hash info was added to Metron API to avoid unnecessary bandwidth usage for something that really shouldn't be needed.
I'm fine covering any additional cost for a while, but if you want to continue using that feature, we should probably talk about how to offset it.