Output debug files (suggestion)

chalkos commented 3 years ago

It could be useful to have some information about duplicates so users can try to figure out why the number of charms in the exported list differs from the number ingame. In my case I now have 533 exported with version 1.2 and 537 ingame. Do I have duplicate charms ingame or are some charms not being recognized?

Duplicates could be exported to a different file, maybe even with more information such as:

Water Resistance,1,Master Mounter,1,2,1,0 found in frames\frame540.png, frames\frame560.png
Windproof,2,Defense Boost,1,3,2,0 found in frames\frame540.png, frames\frame560.png

Additionally, there could also be a list of frames that weren't recognized as an actual charm, so the user can look at the images and submit a new issue if the charm from the image is not in charms.encoded.txt

chpoit commented 3 years ago

I definitely agree with your point, knowing what charms are duplicates would be a great feature to have.

You do run into some issues when you have charms popping up multiple times, either caused by going over a charm multiple times in a non-sequential order (for example when you wrap around to the first charm on a row), or when they are actual duplicates. On my end, I extract 453 "unique" frames but only 352 charms, because I was really "aggressive" in how I went through charms, trying to cram as many charms as possible per clip. I would have over 100 "duplicates". From #7 I saw you had a much more "reasonable" amount of charms per clip, which leads me to believe you didn't "double-scan" charms, or had very few of them. In cases like yours it would actually make sense to show duplicates. In your case where you have over 500 charms, I think it's far from impossible for you to have four true duplicates.

Currently, the way "unique" frames are found is only done by comparing to the previous "unique" frame. I see two possible ways to fix this:

1. Find "where" a charm is, meaning row, column and page, and keep only one frame per position
   - Genuinely seems like a pain to implement in a non-hacky/hardcoded manner
2. Compare the initial set of unique frames to find the true unique frames
   - Might be doable easily by hashing the frames in a "fuzzy" enough way
   - Would have to be done in a fuzzy way since the "breathing" effect on the "selection yellow-gold" could change a frame enough for it to be "different"
   - Thresholding/grayscaling might fix the "breathing" issues
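As a rough illustration of the "fuzzy" hashing idea, here is a minimal sketch of an average hash: shrink each grayscale frame to a small grid, set one bit per cell depending on whether it is brighter than the mean, and treat two frames as the same when only a few bits differ. This is not the tool's code. The frame format (nested lists of 0-255 values), the function names, and the `max_bits` tolerance are all assumptions made for the example, and frames are assumed to be at least `size` pixels in each dimension.

```python
from typing import List

# Hypothetical frame format: rows of 0-255 grayscale values.
Gray = List[List[int]]


def average_hash(img: Gray, size: int = 8) -> int:
    """Block-average the image down to size x size, then emit one bit per cell
    (1 if the cell is brighter than the mean of all cells)."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def looks_same(a: Gray, b: Gray, max_bits: int = 4) -> bool:
    """Fuzzy equality: hashes may differ by up to max_bits bits (assumed tolerance)."""
    return hamming(average_hash(a), average_hash(b)) <= max_bits
```

Because the hash only records which regions are brighter than average, a uniform brightness change (such as a glow pulsing) tends to leave it unchanged, while a different layout flips many bits. Whether that is enough for the actual "breathing" effect would need testing on real frames.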

I need to implement one of these optimisations before I look at logging duplicates. On the plus side, doing so would also cut the time spent identifying charms with Tesseract.

Additionally, there could also be a list of frames that weren't recognized as an actual charm, so the user can look at the images and submit a new issue if the charm from the image is not in charms.encoded.txt

I'm not sure I get that part, do you mean cases where Tesseract/something errors out, or the 800-850 frames per clip that are not "unique"?

In any case, could you upload your clips either to this issue or to any file hosting site? I could use more data to debug parts of the code.

chalkos commented 3 years ago

Comparing the frames does indeed seem complex because of the "breathing" effect. But what if when a charm is detected, you just store the frame that originated it?

I was thinking just building a map like this while processing the frames:

{
    "Water Resistance,1,Master Mounter,1,2,1,0" => [ "frames\frame540.png", "frames\frame560.png" ],
    "Windproof,2,Defense Boost,1,3,2,0" => [ "frames\frame5480.png", "frames\frame7210.png" ]
}

As a first approach, it would be up to the user to check whether 2 frames that have the same charm are the same or not, but then again the only real reason to check for duplicates is to debug or to sell the duplicates.
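A minimal sketch of that map, assuming the processing step can yield `(charm_line, frame_path)` pairs as it goes. The function names and the pair format are hypothetical; the report line mirrors the "found in" format suggested at the top of this issue.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_frames_by_charm(detections: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each exported charm line to every frame it was detected in.

    detections: (charm_line, frame_path) pairs in processing order
    (an assumed shape, not the tool's actual data structure).
    """
    frames_by_charm = defaultdict(list)
    for charm_line, frame_path in detections:
        frames_by_charm[charm_line].append(frame_path)
    return dict(frames_by_charm)


def duplicate_report(frames_by_charm: Dict[str, List[str]]) -> List[str]:
    """One line per charm that was seen in more than one frame."""
    return [
        f"{charm} found in {', '.join(frames)}"
        for charm, frames in frames_by_charm.items()
        if len(frames) > 1
    ]
```

Writing `duplicate_report(...)` to a separate text file would give the user the list of frame pairs to check by hand.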

After that maybe try to have the comparison be done with Tesseract. Hardcoded does sound like a pain... to check if certain pixels are yellow enough or something. I guess it needs to be a balance of fuzzy/threshold/color correction in order to compare frames like that.

You mentioned grayscale, so I was trying to see if there was some effect that could potentially help differentiate the frames. I loaded up one frame in https://pixlr.com/x/, selected Effect -> too old -> agnes, turned it all the way up to 100 and it turned this:

[image: frame440]

into this:

[image: frame440_tooold]

I don't know much about image processing but it looks easier to work with, since it removes most of the background and it seems that the "breathing" effect becomes minimal. You would still need to adapt "too old -> agnes (100)" to your code and test, but it looks like a usable start.

Additionally, there could also be a list of frames that weren't recognized as an actual charm, so the user can look at the images and submit a new issue if the charm from the image is not in charms.encoded.txt

I'm not sure I get that part, do you mean cases where Tesseract/something errors out, or the 800-850 frames per clip that are not "unique"?

I meant the non-"unique" frames, but that makes no sense now that I understand a bit more about how the processing works. I've never had a case of the tool asking me to fill in what a specific charm is (as you describe in the readme), so maybe I was thinking of that case, which is already being handled.

In any case, could you upload your clips either to this issue or to any file hosting site? I could use more data to debug parts of the code.

Sure thing. These are the last ones I did. I tried to make them as fail-safe as possible to make it easier to detect every charm. It doesn't look like GitHub did any compression on these, so they should be good to use as inputs.

https://user-images.githubusercontent.com/98429/118030506-4ec81d00-b35d-11eb-9baf-a46c704abcc4.mp4

https://user-images.githubusercontent.com/98429/118030550-5e476600-b35d-11eb-9db7-47cafa1358ff.mp4

https://user-images.githubusercontent.com/98429/118030585-6a332800-b35d-11eb-881e-1e900c6c2d02.mp4

https://user-images.githubusercontent.com/98429/118030853-b9795880-b35d-11eb-93de-9209b77984f9.mp4

https://user-images.githubusercontent.com/98429/118030881-c39b5700-b35d-11eb-8651-48621b086978.mp4

https://user-images.githubusercontent.com/98429/118030898-cbf39200-b35d-11eb-8373-7e0ee62c28ee.mp4

chpoit commented 3 years ago

Turns out hashing images is a pain :)

I ended up tweaking the values I use for thresholding and I was able to reduce the "unique" frames on my side from 453 to 363, and "only" keep 539 frames on your side.
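For readers unfamiliar with thresholding, a minimal sketch of the general idea: turn each grayscale frame into a black/white mask, then measure what fraction of pixels differ between two masks. The threshold value of 128, the nested-list frame format, and the function names are assumptions for illustration and are not the values or code the tool uses.

```python
from typing import List

# Hypothetical frame format: rows of 0-255 grayscale values.
Gray = List[List[int]]


def binarize(img: Gray, threshold: int = 128) -> List[List[bool]]:
    """True where a pixel is at or above the threshold, False otherwise."""
    return [[px >= threshold for px in row] for row in img]


def differing_fraction(a: Gray, b: Gray, threshold: int = 128) -> float:
    """Fraction of pixels whose thresholded value differs between two
    same-sized frames (0.0 means the masks are identical)."""
    mask_a, mask_b = binarize(a, threshold), binarize(b, threshold)
    total = sum(len(row) for row in mask_a)
    diff = sum(
        pa != pb
        for row_a, row_b in zip(mask_a, mask_b)
        for pa, pb in zip(row_a, row_b)
    )
    return diff / total
```

Small brightness shifts that stay on the same side of the threshold vanish in the mask, which is why tuning the threshold changes how many frames count as "unique".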

I also added some "brute-force" comparison of the first batch of unique frames to achieve that reduction, and from my checking of both our framesets, no charm is lost.
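A minimal sketch of the difference between comparing each frame only to the previous "unique" frame and brute-force comparing against every frame kept so far. The `same` predicate stands in for whatever frame comparison is used, and both function names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def dedupe_sequential(frames: Sequence[T], same: Callable[[T, T], bool]) -> List[T]:
    """Keep a frame only if it differs from the last kept frame.
    A charm revisited later (e.g. after wrapping around a row) is kept again."""
    kept: List[T] = []
    for frame in frames:
        if not kept or not same(kept[-1], frame):
            kept.append(frame)
    return kept


def dedupe_all_pairs(frames: Sequence[T], same: Callable[[T, T], bool]) -> List[T]:
    """Keep a frame only if it differs from every frame kept so far.
    Quadratic in the number of frames, so best run on the already reduced set."""
    kept: List[T] = []
    for frame in frames:
        if not any(same(k, frame) for k in kept):
            kept.append(frame)
    return kept
```

Note that the all-pairs pass cannot tell a revisited charm from a true ingame duplicate if the frames look identical, so it only helps when something in the frame (such as the selected position) differs between real duplicates.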

The "problematic" cases that remain are similar to this one, where you have both extremes of the breathing effect, or cases where you change to the last page of charms and end up on a spot that has no charm.

[images: frame4441, frame4413]

I found two cases with the breathing extremes in the frames; removing them would drop the 539 to the 537 charms you have, which would mean 4 "real" dupes. This is similar to my case, but I have 6 edge cases, dropping to 357 charms and 5 dupes. I still need to actually validate that, but I think it's likely.
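The arithmetic behind those numbers, as a small sketch. It assumes the exported counts mentioned earlier in the thread (533 and 352) still apply; the function name is hypothetical.

```python
def true_duplicates(kept_frames: int, edge_case_frames: int, exported_charms: int):
    """Return (charms left after dropping edge-case frames, implied real duplicates)."""
    charms = kept_frames - edge_case_frames
    return charms, charms - exported_charms


print(true_duplicates(539, 2, 533))  # chalkos: (537, 4)
print(true_duplicates(363, 6, 352))  # chpoit: (357, 5)
```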

Regarding the grayscale, I was already using grayscale images in most "processing" steps, and I mentioned it as both an explanation and a note to myself.

I'll look into what the "best" way to store the frame in the charm is; I might just throw it in the Charm class. Your map is giving me an idea on how to do that by playing with sets, but I'll see. I want to improve the run time of the Tesseract step before doing that.

chalkos commented 3 years ago

Turns out hashing images is a pain

Because they would need to be exactly the same image to produce the same hash, right? That sounds hard to achieve.

I ended up tweaking the values I use for thresholding and I was able to reduce the "unique" frames on my side from 453 to 363, and "only" keep 539 frames on your side.

That's really good. It is pretty manageable to manually compare 10 pairs of frames, so I wouldn't mind that.

I'll look into what the "best" way to store the frame in the charm is; I might just throw it in the Charm class. Your map is giving me an idea on how to do that by playing with sets, but I'll see. I want to improve the run time of the Tesseract step before doing that.

Sounds good, looking forward to the improvements. Being able to export my whole charm collection to the builder is awesome!

chpoit commented 3 years ago

I just pushed a new version here with this change along with a major speedup on the Tesseract step.

chpoit / utsushis-charm

Output debug files (suggestion) #9