any2cards / worldhaven

All of the JSON data files and Asset image files that the WAV utilizes

Ability card level #3

Closed Zachu closed 2 years ago

Zachu commented 2 years ago

Would ability card levels and perhaps even the card number fit into the scope of this project?

What I was thinking is that the https://github.com/any2cards/worldhaven/blob/master/data/character-ability-cards.js file would contain something like

   {
     "name": "frigid apparition",
     "points": 247,
     "expansion": "Gloomhaven",
     "image": "character-ability-cards/gloomhaven/MT/gh-frigid-apparition.png",
     "xws": "frigidapparition",
+    "data": {
+      "level": "X",
+      "number": 158
+    }
   },

I think these could be extracted with tesseract or something similar, if they are not already available somewhere else in a machine-readable format.

any2cards commented 2 years ago

It certainly could be added. I am simply curious as to how it would be used. In other words, the purpose of this repository is displaying card images. Are you using the repo for other purposes? If I were to implement it, I would be tempted to do something like:

   {
     "name": "frigid apparition",
     "points": 247,
     "expansion": "Gloomhaven",
     "level": "X",
     "card number": 158,
     "image": "character-ability-cards/gloomhaven/MT/gh-frigid-apparition.png",
     "xws": "frigidapparition"
   },

The reason for this format would be to standardize it as something to be used (specifically card number) across all types of cards.

Zachu commented 2 years ago

It could definitely be something like that too. I thought the "data" section would be a unified place to store data that's actually printed on the card. Basically the name would then fit into that section too. But this was just an initial idea and not something I put a lot of thought into.

My actual use case was simply "show me the Mind Thief's level 1 (and X) cards", so that I would not get spoiled on the rest of the abilities I would unlock later on.
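The use case above can be sketched in a few lines of Python. This is only an illustration under assumptions: it supposes a flat, string-valued "level" key (the final key names had not been agreed at this point in the thread), it derives the class code from the folder name in the image path, and the second and third sample entries use made-up names and xws values.

```python
import json

# Hypothetical sample data. Only the frigid apparition entry is taken from
# this thread; the "level" key and the other two entries are assumptions.
CARDS_JSON = """
[
  {"name": "frigid apparition",
   "image": "character-ability-cards/gloomhaven/MT/gh-frigid-apparition.png",
   "level": "X", "xws": "frigidapparition"},
  {"name": "hypothetical level five card",
   "image": "character-ability-cards/gloomhaven/MT/gh-hypothetical-level-five-card.png",
   "level": "5", "xws": "hypotheticallevelfivecard"},
  {"name": "tinkerers tools",
   "image": "character-ability-cards/gloomhaven/TI/gh-tinkerers-tools.png",
   "level": "3", "xws": "tinkererstools"}
]
"""

def class_code(card):
    """Return the class folder from the image path, e.g. "MT"."""
    return card["image"].split("/")[-2]

def starting_cards(cards, code, levels=("1", "X")):
    """Keep only the cards of one class whose level is in `levels`."""
    return [c for c in cards if class_code(c) == code and c.get("level") in levels]

cards = json.loads(CARDS_JSON)
visible = starting_cards(cards, "MT")
for card in visible:
    print(card["name"], card["level"])
```

Filtering on the data rather than on file names is what keeps higher-level cards from being shown by accident.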

But considering the README's first sentence:

> An easy-to-use collection of data and images...

it doesn't actually state what this can and should be used for. And that's great! I think this project has an opportunity to lay a foundation for lots of projects beyond just showing you the images.

I also immediately thought that data/character-mats.js could, for example, contain numbers stating the hand size and max HP for each level of the character, because that is printed on the mat. One could then use this as an authoritative machine-readable data source for *Haven facts, as well as the place where the graphics can be found. But I guess this is quite an ambition =)

Zachu commented 2 years ago

Btw I would gladly do the work myself and send a pull request :) I can try something tomorrow now that I know you're open to the idea

JaMeZ-B commented 2 years ago

I would also like to have more data (in particular the level of character cards) in the JSON file. For ability cards the initiative would also be a sensible datum to add here.

Here is an outline of a project I would do, if this data was available: Just last weekend I wanted to create "character overviews" with all the ability cards of each character as a pdf (to annotate it with my thoughts on the cards using a tablet). My idea was to produce the pdf with LaTeX and write a small Python program to generate the .tex code based on the JSON data. For such an overview sorting by level is the only sensible ordering.
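A rough sketch of such a generator, assuming each entry carries a string "level" key as discussed in this thread. The ordering rule (level 1, then "X", then 2 and up), the function names, and the image width are assumptions made for illustration; the three image paths and levels are the ones quoted elsewhere in this thread.

```python
# Sample entries; the "level" key follows the format discussed in this thread.
cards = [
    {"image": "character-ability-cards/gloomhaven/TI/gh-tinkerers-tools.png", "level": "3"},
    {"image": "character-ability-cards/gloomhaven/MT/gh-frigid-apparition.png", "level": "X"},
    {"image": "character-ability-cards/gloomhaven/EL/gh-ice-spikes.png", "level": "1"},
]

def level_key(card):
    """Sort level 1 first, then "X", then 2 and up (an assumed ordering)."""
    level = card["level"]
    if level == "X":
        return (1, 1)
    return (int(level), 0)

def to_tex(cards, width="0.3\\textwidth"):
    """Build LaTeX body lines with one \\includegraphics per card, by level."""
    lines = []
    for card in sorted(cards, key=level_key):
        lines.append(f"% level {card['level']}")
        lines.append(f"\\includegraphics[width={width}]{{{card['image']}}}")
    return "\n".join(lines)

tex_body = to_tex(cards)
print(tex_body)
```

The output is only the body; it would still need a preamble that loads the graphicx package.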

> Btw I would gladly do the work myself and send a pull request :) I can try something tomorrow now that I know you're open to the idea

I could also help adding the data. BTW: the index.html has listings by level -- one could try to extract them. Another source of machine readable card data could also be Gloomhaven Digital (I haven't checked so far).

Zachu commented 2 years ago

> For ability cards the initiative would also be a sensible datum to add here.

I completely agree. In fact, when doing a proof of concept for extracting the data from the images, I already tried to scan the initiative value as well.

> I could also help adding the data. BTW: the index.html has listings by level -- one could try to extract them. Another source of machine readable card data could also be Gloomhaven Digital (I haven't checked so far).

Nice! I had not noticed that. I guess I should still try the tesseract way if we want other values to be read from the images, unless someone can find a better source for the card data. Another source for, say, the initiative value could be the TTS Gloomhaven Fantasy Setup mod, which has "sort hand by initiative". But I don't know if that's open source either.

I'm currently trying to teach a tesseract model the fonts used in Gloomhaven, because it wasn't that accurate with the default model. Levels could be extracted with good accuracy, but the initiative value often failed. It took me a while to learn how this tesseract thing works, but let's see if I get something useful out of it =)

any2cards commented 2 years ago

I want to make something clear ... I am not opposed to making modifications to the JSON to provide more information about any of the cards that it currently supports, including the Character Ability Cards UNLESS such modifications would prevent the WAV from functioning properly based on its original purpose.

To that end, I will have to vet all pull requests, and do thorough testing BEFORE anything is rolled into production.

Zachu commented 2 years ago

I currently have a pretty much working setup, with only a few values that have to be entered manually because the OCR is unable to read them. I'll see a bit later when I can do a PR for this, and then refine the manually entered values there as well.

My output currently looks something like this, which of course has to be combined into the data sources then:

{"image":"character-ability-cards/gloomhaven/TI/gh-tinkerers-tools.png","initiative":26,"level":"3","number":47}

If something fails to read correctly, it says "null". The level field is currently the string "null" in that case, but anyway =) It should be quite easy to go through.

{"image":"character-ability-cards/gloomhaven/EL/gh-ice-spikes.png","initiative":null,"level":"1","number":455}
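Going through the unread values could look roughly like the sketch below. It uses only the two sample lines quoted above and treats both JSON null and the string "null" as "not read"; the function names and the one-object-per-line input format are assumptions for illustration, not the author's actual tooling.

```python
import json

# The two sample records quoted above, one JSON object per line.
OCR_OUTPUT = """\
{"image":"character-ability-cards/gloomhaven/TI/gh-tinkerers-tools.png","initiative":26,"level":"3","number":47}
{"image":"character-ability-cards/gloomhaven/EL/gh-ice-spikes.png","initiative":null,"level":"1","number":455}
"""

FIELDS = ("initiative", "level", "number")

def missing_fields(record):
    """Return the fields the OCR could not read (JSON null or the string "null")."""
    return [f for f in FIELDS if record.get(f) in (None, "null")]

def needs_manual_entry(text):
    """Map image path -> unread fields, for records with at least one gap."""
    todo = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        gaps = missing_fields(record)
        if gaps:
            todo[record["image"]] = gaps
    return todo

todo = needs_manual_entry(OCR_OUTPUT)
for image, gaps in todo.items():
    print(image, "->", ", ".join(gaps))
```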

What I noticed during this is that the "BS" class has different padding in its cards, so I have to do something about that as well =) And so far I've only tried the Gloomhaven ability cards.

Zachu commented 2 years ago

Still not pull-request worthy, but here's an update at least. All the Gloomhaven cards have now been parsed in my fork of this repository, Zachu/worldhaven, but I've not yet filled in the ones for which the OCR reader couldn't find the correct values. That's something I could use some help with, but maybe a bit later, once I'm otherwise done fiddling with it by scripts. Anyway, examples can be seen here: https://github.com/Zachu/worldhaven/blob/master/data/character-ability-cards.js#L4559-L4568.

I plan to read through the Crimson Scales, Forgotten Circles, and Jaws of the Lion cards as well before considering the automation part done. I intend to leave the Frosthaven ones out because they seem to be... a bit different. I just need to fix some positioning and stuff for the other games.

A few questions though, mostly for @any2cards perhaps, but others can also comment.

  1. It seems that the tool/method I'm using to combine the output of the OCR script and the original data now sorts the data file by the image key. Meaning that the first item in the data file would be aa-back instead of be-back. Do you see this as an issue?
  2. What should the final keys for the initiative, level, and number be, and should we put them under some key telling us that the values are read from the actual image?

Here are my personal opinions on these, but I consider @any2cards to have the final call.

  1. For me the new order would be more intuitive but I don't know if something depends on it.
  2. I initially proposed a data key, but thinking about it more I'd actually choose something like values, where I'd put the initiative, level, and number. We could follow a similar pattern of values in other data files as well if we end up reading data out of those too.

any2cards commented 2 years ago

To be clear, there are reasons for the very specific order that data is within each of the JSON files (in terms of what shows first, second, etc.). I do not want to change that order. This is true for the order of individual entries, as well as for the subentries for each name entry.

Ideally, each new key would be entered just above the "xws" key which is the last key for each entry. All things being equal, each additional item you want information for would be its own key. So for example, the Brute's "Balanced Measure" could go from:

   {
     "name": "balanced measure",
     "points": 31,
     "expansion": "Gloomhaven",
     "image": "character-ability-cards/gloomhaven/BR/gh-balanced-measure.png",
     "xws": "balancedmeasure"
   },

To:

   {
     "name": "balanced measure",
     "points": 31,
     "expansion": "Gloomhaven",
     "image": "character-ability-cards/gloomhaven/BR/gh-balanced-measure.png",
     "initiative": "77",
     "level": "X",
     "cardno": "012",
     "xws": "balancedmeasure"
   },

I know it may seem like I am being a bit of a pain in the (*& when it comes to specifying how it would look, but there are very good reasons, as there are a whole bunch of programs and tools I would have to change if we alter the format too much. So, from a very selfish standpoint, this is what would work best for me.

Zachu commented 2 years ago

> I know it may seem like I am being a bit of a pain in the (*& when it comes to specifying how it would look, but there are very good reasons, as there are a whole bunch of programs and tools I would have to change if we alter the format too much. So, from a very selfish standpoint, this is what would work best for me.

Nah, it's your project and you have all the power in the world to define the boundaries for others to work within. And if they can't then they can fork off :)

> To be clear, there are reasons for the very specific order that data is within each of the JSON files (in terms of what shows first, second, etc.). I do not want to change that order.

Bummer... Well this should be achievable with little work.

> Ideally, each new key would be entered just above the "xws" key which is the last key for each entry.

This is a bit of a PITA since, as the JSON specification says, "An object is an unordered set of name/value pairs", which means that the keys are not in any particular order and the order shouldn't matter. I'll have to see how many hoops I need to jump through though :)
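Although JSON itself does not define an order for object keys, some tools do preserve it. As a minimal sketch (not the jq-based tooling used for the actual work), Python's json module keeps keys in file order when loading and writes them in insertion order when dumping, so new keys can be placed just above "xws". The helper name insert_before is made up for this example, and it assumes the new keys are not already present in the entry.

```python
import json

def insert_before(entry, new_items, anchor="xws"):
    """Return a copy of `entry` with `new_items` placed just before `anchor`.

    Relies on dicts keeping insertion order (guaranteed since Python 3.7).
    If `anchor` is missing, the new items are appended at the end.
    Assumes none of the new keys already exist in `entry`.
    """
    result = {}
    for key, value in entry.items():
        if key == anchor:
            result.update(new_items)  # new keys land right above the anchor
        result[key] = value
    if anchor not in entry:
        result.update(new_items)
    return result

# The Balanced Measure entry from earlier in this thread.
entry = json.loads(
    '{"name": "balanced measure", "points": 31, "expansion": "Gloomhaven", '
    '"image": "character-ability-cards/gloomhaven/BR/gh-balanced-measure.png", '
    '"xws": "balancedmeasure"}'
)
merged = insert_before(entry, {"initiative": "77", "level": "X", "cardno": "012"})
print(json.dumps(merged, indent=2))
```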

> All things being equal, each additional item you want information for would be its own key.

I'm OK with this one. Flat structure is completely fine.

Finally, we always have the option of keeping the card data in a separate project, so that it would not interfere with wherever Worldhaven is being used.

any2cards commented 2 years ago

So I spent some time thinking about this, and looking at my various tools that make use of the repository. I went ahead and made some changes, to make things easier for this implementation. You can place your stuff anywhere, so if I am assuming correctly, it would be easiest to place it at the end of an entry, after "xws", that is fine. Just remember you will have to add a comma after "xws". In addition, I have made it so entries don't have to be flat. So if you would prefer something like:

"name": "balanced measure", "points": 31, "expansion": "Gloomhaven", "image": "character-ability-cards/gloomhaven/BR/gh-balanced-measure.png", "initiative": "77", "level": "X", "cardno": "012", "xws": "balancedmeasure", "values": { "initiative": "77", "level": "X", "cardno": "012" }

this would be fine. Note that I think all of these, including initiative, cardno, etc., should be in quotes (as character strings) rather than numbers, as I am aware of situations for both where the value cannot be a pure number.

In addition, I personally don't think "values" or "data" are a good key name; but for the life of me, I haven't been able to think of a better one at the moment.

Zachu commented 2 years ago

I don't mind the values being on the flat level with everything else and I also can't think of a better key there so let's go with flat structure then :)

> I went ahead and made some changes, to make things easier for this implementation. You can place your stuff anywhere, so if I am assuming correctly, it would be easiest to place it at the end of an entry, after "xws", that is fine.

Nice, thank you! Yeah, the keys currently go after xws, basically because I'm using jq to merge the OCR output into the existing entries. The new keys therefore just get appended to the old objects.
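For readers who prefer Python over jq, the same kind of merge can be sketched as follows. This is not the author's jq pipeline, only an illustration: entries are matched on their "image" value, the order of the existing entries is left untouched, new keys are appended after "xws", and values are cast to strings. The xws values in the sample are made up.

```python
def merge_by_image(entries, ocr_records):
    """Merge OCR records into the entries that share the same "image" value.

    Entry order is preserved and new keys are appended to each entry.
    Values are cast to strings; unread (None) values are skipped.
    """
    by_image = {record["image"]: record for record in ocr_records}
    merged = []
    for entry in entries:
        extra = by_image.get(entry.get("image"), {})
        combined = dict(entry)
        for key, value in extra.items():
            if key != "image" and value is not None:
                combined[key] = str(value)
        merged.append(combined)
    return merged

# Existing entries (xws values are hypothetical), deliberately not sorted by image.
entries = [
    {"image": "character-ability-cards/gloomhaven/TI/gh-tinkerers-tools.png", "xws": "tinkererstools"},
    {"image": "character-ability-cards/gloomhaven/EL/gh-ice-spikes.png", "xws": "icespikes"},
]
# OCR records as quoted earlier in this thread, in a different order.
ocr_records = [
    {"image": "character-ability-cards/gloomhaven/EL/gh-ice-spikes.png", "initiative": None, "level": "1", "number": 455},
    {"image": "character-ability-cards/gloomhaven/TI/gh-tinkerers-tools.png", "initiative": 26, "level": "3", "number": 47},
]

merged = merge_by_image(entries, ocr_records)
for entry in merged:
    print(entry)
```

Looking records up by image instead of sorting both lists keeps the original entry order intact, which addresses the ordering concern raised above.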

> Just remember you will have to add a comma after "xws"

Thanks :) I'm mainly using jq and I let it handle the correct rendering of the JSON. But better safe than sorry!

> Note that I think all of these, including initiative, cardno, etc., should be in quotes (as character strings) rather than numbers, as I am aware of situations for both where the value cannot be a pure number.

Having only played one scenario of Gloomhaven so far, I can't argue back at all 😅 With initiative I think that sounds plausible. For card numbers that sounds weird, but I can cast them into strings. I don't have any issue with that. But I guess Crimson or other add-ons could prefix their cards like that.

JaMeZ-B commented 2 years ago

First of all, thank you for the work you put into this already @Zachu!

> > Note that I think all of these, including initiative, cardno, etc., should be in quotes (as character strings) rather than numbers, as I am aware of situations for both where the value cannot be a pure number.

> Having only played one scenario of Gloomhaven so far, I can't argue back at all 😅 With initiative I think that sounds plausible. For card numbers that sounds weird, but I can cast them into strings. I don't have any issue with that. But I guess Crimson or other add-ons could prefix their cards like that.

A few remarks on "numbers vs strings":


Considering the amount of data this repo contains, one could consider defining a JSON schema to validate the files. This can become quite handy in finding faulty entries – for example in the cases, where the OCR is not right. I will fiddle around with this a bit in the coming days (there is some code I can recycle from a project I used JSON schema in) 😉
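To give an idea of what such validation could catch, here is a small standard-library stand-in. It is not JSON Schema and not the code mentioned above; it only checks the rule agreed on in this thread, namely that the new values are non-empty strings. The key names follow the Balanced Measure example, and the second sample entry is a made-up faulty one.

```python
FIELDS = ("initiative", "level", "cardno")

def find_problems(entries):
    """Return (index, key, reason) tuples for values that look wrong."""
    problems = []
    for index, entry in enumerate(entries):
        for key in FIELDS:
            value = entry.get(key)
            if key not in entry:
                problems.append((index, key, "missing"))
            elif not isinstance(value, str):
                problems.append((index, key, "not a string"))
            elif value.strip() in ("", "null"):
                problems.append((index, key, "empty or unread"))
    return problems

entries = [
    # Values from the Balanced Measure example in this thread.
    {"name": "balanced measure", "initiative": "77", "level": "X",
     "cardno": "012", "xws": "balancedmeasure"},
    # Hypothetical faulty entry, as an OCR merge might produce it.
    {"name": "hypothetical faulty entry", "initiative": 26, "level": "null",
     "xws": "hypotheticalfaultyentry"},
]

problems = find_problems(entries)
for index, key, reason in problems:
    print(f"entry {index}: {key} is {reason}")
```

A real JSON Schema would express the same rules declaratively and could be checked by any schema validator, which is the advantage of the approach proposed above.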

Zachu commented 2 years ago

I actually agree with all of your points about the strings. I'll go the string route with all of them. Thank you!

> Considering the amount of data this repo contains, one could consider defining a JSON schema to validate the files. This can become quite handy in finding faulty entries – for example in the cases, where the OCR is not right. I will fiddle around with this a bit in the coming days (there is some code I can recycle from a project I used JSON schema in) 😉

Yeah, defining a schema does sound like a good idea. In my scripts I'm not blindly taking arbitrary output from the OCR, so that shouldn't be an issue, but I might have bugs and whatnot in the scripts that the schema would reveal. And who knows what other ways of appending stuff there will be in the future.

Zachu commented 2 years ago

Starting to get the accuracy of the tesseract model and the overall tooling pretty much there. I still have the problem that the model can't tell 6 and 0 apart 😅 I'll try to fix that now.

any2cards commented 2 years ago

I have updated character-ability-cards.js to include the following Meta information: Level, Initiative, and Card #.

Zachu commented 2 years ago

Out of curiosity, what did you use as source for that data? Did you build on top of my OCRing PR or did you find another source of truth?

any2cards commented 2 years ago

Lol. To be honest, the answer is funny. I had another request from a good friend to add this same information. It came two days ago. He has helped me an enormous amount in the past, so I was willing to do whatever to help. Since I did not know how long your efforts would take, I simply wrote some code to add all the lines with a value of "-" (which for example the card backs still retain), and then I manually entered all of the data.

Now, your efforts are not a waste. Perhaps you can double check my work by generating your own file, and we can diff the two and see if my manual efforts are accurate.
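The cross-check suggested above could be sketched like this, assuming both files hold lists of entries that share the "image" key and use the same field names. The field names follow the Balanced Measure example, the function name is made up, and the mismatching OCR value is invented to show what a 6-versus-0 misread would look like.

```python
def diff_by_image(manual, ocr, fields=("initiative", "level", "cardno")):
    """Return (image, field, manual_value, ocr_value) where the two disagree.

    Entries missing from `ocr`, and fields the OCR left unread (None),
    are skipped rather than reported.
    """
    ocr_by_image = {entry["image"]: entry for entry in ocr}
    differences = []
    for entry in manual:
        other = ocr_by_image.get(entry["image"])
        if other is None:
            continue
        for field in fields:
            a, b = entry.get(field), other.get(field)
            if b is not None and str(a) != str(b):
                differences.append((entry["image"], field, a, b))
    return differences

IMAGE = "character-ability-cards/gloomhaven/BR/gh-balanced-measure.png"
manual = [{"image": IMAGE, "initiative": "77", "level": "X", "cardno": "012"}]
# Hypothetical OCR result with a misread digit in the card number.
ocr = [{"image": IMAGE, "initiative": "77", "level": "X", "cardno": "612"}]

differences = diff_by_image(manual, ocr)
for image, field, a, b in differences:
    print(f"{image}: {field} manual={a!r} ocr={b!r}")
```

Values are compared as strings, so a zero-padded "012" and an unpadded 12 would be reported as a difference; the padding would need normalizing first if the two sources differ in that respect.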

any2cards commented 2 years ago

Since this has been added, I am closing this issue.