Gibberlings3 / iesdp

Jekyllized version of IESDP with imported history
https://gibberlings3.github.io/iesdp/

Machine readable format #84

Open burner1024 opened 4 years ago

burner1024 commented 4 years ago

I'm thinking that it could be beneficial to have IESDP data in a machine-readable format. Currently there are multiple tools all parsing and reading IE files on their own: DLTCEP, NearInfinity, iesh, probably others. Inevitably, differences and inconsistencies arise. But if file formats were described in a structured way, we could have a single, truly definitive source of information, and pull updates from it (semi)automatically. (Which is my ulterior motive for BGforge MLS.)

As an example, take a look at the sfall docs: functions are listed in a yaml file, converted to markdown with some python scripting, and the markdown is published with jekyll, resulting in a nice site.

If you look at the opcode descriptions, they're already basically yaml, with html additions. They could probably be converted into true yaml semi-automatically.
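A first step for such a semi-automatic conversion could be to separate each opcode page's front matter (the `---`-delimited yaml block Jekyll reads) from its html body, so the former can go straight to a yaml parser and the latter can be reviewed by hand. A minimal Python sketch, assuming the opcode pages use standard Jekyll front matter; the sample keys in the usage note are illustrative only:

```python
from __future__ import annotations


def split_front_matter(text: str) -> tuple[str, str]:
    """Split a Jekyll-style page into (front matter, body).

    Assumes the page starts with a line containing only '---' and that the
    front matter ends at the next such line (Jekyll's convention).
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text  # no front matter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "".join(lines[1:i])  # raw yaml, ready for a yaml parser
            body = "".join(lines[i + 1:])  # html/liquid, needs manual review
            return front, body
    raise ValueError("unterminated front matter")
```

For example, `split_front_matter("---\nn: 0\n---\n<p>...</p>\n")` returns `("n: 0\n", "<p>...</p>\n")`.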

To be clear, this is not about how IESDP looks, just the internal data representation: binary file formats, script actions/triggers, effects.

Tagging @Argent77, @4Luke4, @AvengerTeamBG, @FredrikLindgren, @ALIENQuake.

lynxlynxlynx commented 4 years ago

Something like this might have been useful 15 years ago, but now that almost everything is already written, I don't see anyone gaining anything by rewriting all the parsers, writing generators and rewriting the way the data is stored.

On the IESDP side of things the last part may still happen though, since it could then convey the data better (eg. #6 #19 #26).

burner1024 commented 4 years ago

> almost everything is already written

MLS isn't, hence this issue. Obviously, I can't speak for devs of other tools, but MLS would benefit.

@lynxlynxlynx let me put it like this: if such a pull request came your way, would you be opposed to it?

lynxlynxlynx commented 4 years ago

Depends on what you'd stick in it. E.g. if you want to make the file formats machine readable, that's great and wouldn't need to affect deployment, since jekyll has the needed support: https://jekyllrb.com/docs/datafiles/

burner1024 commented 4 years ago

Do you want to define a format for, say, opcodes? Or should I do that myself?

lynxlynxlynx commented 4 years ago

I don't know what you mean, since the opcodes are already easily machine readable. Everything is parametrized and the description is plain html most of the time (a few liquid tags here and there).

burner1024 commented 4 years ago

Well, html with liquid parts is not exactly machine readable. Still, being parametrized is exactly why it's easier to start with them.

As an example, maybe something like this, in _data/opcodes.yml:

```yaml
- n: 0
  name: AC vs. Damage Type Modifier
  type: stat
  param1: AC Modifier
  param2: Type
  bg1: 1
  bg2: 1
  bgee: 1
  iwd1: 1
  iwd2: 0
  pst: 1
  doc: |
    Applies the modifier value specified by the 'AC Modifier' field to the category specified
    by the 'Type' field.

    Known values for 'Type' are:
      - 0   All
      - 1   Crushing
      - 2   Missile
      - 4   Piercing
      - 8   Slashing
      - 16  Base AC setting (sets the target's AC to the value specified by the 'AC Modifier' field;
            if the target's AC is already 'AC Modifier' or below, this effect will do nothing)

    Each modifier type to AC from this opcode is capped to the range [-20, 20]. Each AC type total is capped to the range [-32768, 32767].

  notes:
    - |
      IWD1 and PST use a slightly different version. The "Base AC" sets to **field - 1** instead.

      IWD2 uses different parameters altogether.
```

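To illustrate what a consumer such as MLS would gain, here is a minimal Python sketch working on one record shaped like the example above. It assumes the file has already been parsed (for instance with PyYAML's `yaml.safe_load`, which returns plain dicts and lists), so the record is written out directly as a dict and abridged; the helper names and the tooltip use are hypothetical:

```python
from __future__ import annotations

# The proposed opcodes.yml entry as a yaml loader would return it (abridged:
# the doc and notes fields are left out).
OPCODE_0 = {
    "n": 0,
    "name": "AC vs. Damage Type Modifier",
    "type": "stat",
    "param1": "AC Modifier",
    "param2": "Type",
    "bg1": 1, "bg2": 1, "bgee": 1, "iwd1": 1, "iwd2": 0, "pst": 1,
}

GAME_KEYS = ("bg1", "bg2", "bgee", "iwd1", "iwd2", "pst")


def supported_games(opcode: dict) -> list[str]:
    """Return the game keys flagged with 1; a missing key counts as absent."""
    return [game for game in GAME_KEYS if opcode.get(game, 0) == 1]


def completion_label(opcode: dict) -> str:
    """Build a one-line summary, e.g. for an editor tooltip (hypothetical use)."""
    return f"{opcode['n']}: {opcode['name']} ({opcode['param1']}, {opcode['param2']})"
```

Here `supported_games(OPCODE_0)` yields every listed game except `iwd2`, straight from the flags, with no html scraping involved.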
ALIENQuake commented 4 years ago

The "Known values" could also be a separate yaml key with values, right?

lynxlynxlynx commented 4 years ago

The notes are sometimes interspersed in the description, not always at the end. And there can be several, of different severities, so the proposed format is not good enough. I also see no way to avoid needing to clean up the descriptions on the user side of things, e.g. to get rid of broken links.

And I'm definitely opposed to cramming them all into one file. That's a simple step users can do if they need it.

burner1024 commented 4 years ago

> The "Known values" could also be a separate yaml key with values, right?

Right, but I'm not sure if it will be possible to parametrize them and keep the current html layout unchanged. And I don't know how I'd use this data yet. If anyone needs it, they are welcome to chime in.

> The notes are sometimes interspersed in the description, not always at the end. And there can be several, of different severities, so the proposed format is not good enough. I also see no way to avoid needing to clean up the descriptions on the user side of things, e.g. to get rid of broken links.

Severity is easy to deal with: adding a separate stanza (warning, important) should be enough. Being interspersed is harder. Not sure if there's a way to parametrize this properly. I'll search around; if not, I guess keeping the doc monolithic will have to do.

Could you clarify the bit about broken links? Which ones do you mean, why are they broken?

> And I'm definitely opposed to cramming them all into one file. That's a simple step users can do if they need it.

Jekyll can read data from dirs, something like _data/opcodes/0.yml. Admittedly, that starts to sound like simply moving/renaming the opcodes dir, but I'm trying to work out something that'll also work for actions and stuff with minimal changes later.
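With one file per entry, the single-file view mentioned above stays a simple user-side step. A minimal Python sketch, assuming the `_data/opcodes/<number>.yml` naming floated here and that each file has already been parsed into a dict (the parsing itself is left out):

```python
from __future__ import annotations

from pathlib import PurePosixPath


def merge_per_file_records(files: dict[str, dict]) -> list[dict]:
    """Combine records stored one per file (e.g. _data/opcodes/0.yml) into one list.

    `files` maps each path to its already-parsed record. The opcode number is
    taken from the file name and compared as an integer.
    """
    def number(path: str) -> int:
        return int(PurePosixPath(path).stem)  # "_data/opcodes/10.yml" -> 10

    return [files[path] for path in sorted(files, key=number)]
```

Sorting on the integer matters: compared as plain strings, `10.yml` would come before `2.yml`.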

lynxlynxlynx commented 4 years ago

Links: opcodes link to each other via anchors, so you'd end up with broken links unless you replicated the naming and therefore layout (all in one file).

Actions and triggers won't be much better; except for less data, the same problems apply.

burner1024 commented 4 years ago

Depending on how this works out, links might be kept and lead back to IESDP: [Screenshot from 2020-02-18 16-53-51]
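Keeping links alive outside IESDP could amount to rewriting same-page anchor links so they point back at the IESDP page the description came from. A minimal Python sketch, assuming links are written as `href="#..."` in the html; the anchor names and the page URL in the usage note are placeholders, not IESDP's actual layout:

```python
import re

# Same-page links such as href="#op12"; the anchor naming is an assumption.
_SAME_PAGE_LINK = re.compile(r'href="#([^"]+)"')


def absolutize_anchor_links(html: str, page_url: str) -> str:
    """Rewrite same-page anchor links so they point back at `page_url`.

    `page_url` should be the IESDP page the description was taken from.
    Links that already carry a path or host are left untouched.
    """
    return _SAME_PAGE_LINK.sub(lambda m: f'href="{page_url}#{m.group(1)}"', html)
```

For example, with `page_url="https://example.org/opcodes.htm"`, `<a href="#op12">` becomes `<a href="https://example.org/opcodes.htm#op12">`.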

Maybe it's better to start with something other than opcodes, indeed. Here, I took a shot at actions. Sample data is stored in the data files. I wanted to avoid duplicating info (action numbers in both filenames and the files themselves, etc.), but liquid is not flexible enough, so the overall system is almost the same as for opcodes, except it doesn't require setting a game to 0 to filter it out (the game can just be skipped). You can launch it locally and check out the BG1 and IWD2 action pages.

[Screenshot from 2020-02-19 14-02-20]

lynxlynxlynx commented 4 years ago

Looks good. From what I can tell, action descriptions only use colors (can be replaced) and links (markdownify takes care of that?) on top of what's shown above.

burner1024 commented 4 years ago

Loaded BG2 actions. There are minor differences in styling, mostly because markdownify wraps everything in paragraphs. I made adjustments, but didn't go out of my way to make it exactly the same before getting feedback. (One thing I did fix intentionally is the ugly code blocks.)

If all's good, the next step would be to load actions from the other files, run comm/md5 to find and delete the identical ones, then some diffs to find and combine those that only differ in wording, then move all the rest in as variants.
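The first, exact-duplicate pass of that plan can be sketched in Python with `hashlib` standing in for comm/md5 (a swapped-in equivalent, not the script actually used). It assumes the per-game descriptions of one action are available as strings keyed by game:

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_identical(descriptions: dict[str, str]) -> list[list[str]]:
    """Group sources (e.g. games) whose description text is byte-identical.

    Each text is hashed with MD5 and sources sharing a digest are collected
    together. Only exact duplicates are merged.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for source, text in descriptions.items():
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        by_digest[digest].append(source)
    return sorted(by_digest.values())
```

Texts that differ only in wording end up in separate groups, which is where the manual diff step comes in.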

lynxlynxlynx commented 4 years ago

Sounds good, but let's resolve #85 first, so it's clearer how some of the more interesting actions turn up (since you skipped a few).

burner1024 commented 4 years ago

I didn't skip anything on purpose, could you point out some?

lynxlynxlynx commented 4 years ago

Mostly for #85, but also 349, which doesn't have any colouring. Do something like find | wc -l in the dir and you'll see it doesn't match the max action id + 1.
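The count comparison can also be turned into a list of the ids that are missing, which makes skipped entries easier to spot. A minimal Python sketch, assuming the action ids have already been collected (for example from file names) into a set:

```python
from __future__ import annotations


def missing_ids(present: set[int]) -> list[int]:
    """List the ids in 0..max(present) that have no entry.

    A file count lower than max id + 1 means at least one id is listed here,
    either a skipped entry or a genuine gap.
    """
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i not in present]
```

Legitimate gaps still show up in the result, so it needs a manual look; it only narrows the search.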

burner1024 commented 4 years ago

These were missed due to a mistake in the script; now included. The count won't match anyway, since there are gaps (NID*). 349 is a special one: it has some formatting messed up, but I think it can be corrected later, after adding variants, along with other manual cleanups here and there. The table in 349 is just styled as code. I don't think it's worth adding a separate kludge just for it, considering that it looks fine, but if you think it's important to have the same background, I guess it can be done. Pushed updated data.

lynxlynxlynx commented 4 years ago

NIDSpecial1 and co are not a gap, just useless to the modder.

burner1024 commented 4 years ago

Well, they are counted as one "action" as far as Jekyll is concerned. Otherwise, they'd produce a full list of "not working" actions, so I thought I'd keep them combined. If everything looks good so far, please let me know and I'll proceed with variants.

lynxlynxlynx commented 4 years ago

It doesn't matter for the output, but for people like you that want it as data, it makes no sense to jumble them together.

burner1024 commented 4 years ago

I'm not sure what use they are, other than completeness. There's certainly no point in adding them to completion; everything not working will be skipped. So do you want to separate them?

lynxlynxlynx commented 4 years ago

Yes, just for consistency.

burner1024 commented 4 years ago

All right. I will do that at manual stage. Anything else?

lynxlynxlynx commented 4 years ago

Nothing comes to mind, except let's do plain BG2 first and iron out any problems before continuing. Also, another PR that touches the EE action list is open, and it would just cause conflicts if it weren't merged first.

burner1024 commented 4 years ago

You mean merge upstream? If just BG2 has manual updates applied and merged, that'll make it a little harder to search for differences later. Also, there won't be links to variants, since that data doesn't exist yet.

One more thing I'd like to point out: currently action aliases are added to the same file. How's that?

lynxlynxlynx commented 4 years ago

Ok, then wait a bit, since @4Luke4 is almost done.

Aliases I don't like that way, since I think eg. the RES variants are present in some games without the default version, so it would complicate the layout logic to iterate properly. Just keep it KISS and create a separate file.

burner1024 commented 4 years ago

I would like to avoid duplicating descriptions in data and displaying duplicates too, but not sure how to do that while allowing for variants. I'll think about that meanwhile.

Edit: ah, the current version doesn't have variant links either, so my second point about merging is moot.

lynxlynxlynx commented 4 years ago

Ideally the descriptions wouldn't be duplicates anyway, since they should explain the parameters used.

burner1024 commented 4 years ago

There are many Dialogue/Dialog synonyms, though.

lynxlynxlynx commented 4 years ago

Can just be a link there, e.g. "Synonymous with opcode blabla."

4Luke4 commented 4 years ago

> Ok, then wait a bit, since @4Luke4 is almost done.

I'm rewriting script triggers so that they can be referenced via links/anchors. That'll take a while...

lynxlynxlynx commented 4 years ago

Please finish the last little bits in the current PR first, so it can be merged and the action work can continue. Triggers will get automatic anchors when this work is extended too, so I don't know if that's the best use of your time.

burner1024 commented 4 years ago

I moved synonyms to separate files. Trouble is, there's no proper sorting by data filename, so some synonyms come before descriptions, some after. I added a sort by action name first, then number; that helps, but it's probably not perfect. [Screenshot from 2020-02-24 01-15-20]
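For reference, the two-level ordering described here (name first, then number) corresponds to the Python sort key below. On the site it is done in liquid, so this only illustrates the ordering; the record shape is hypothetical:

```python
from __future__ import annotations


def sort_entries(entries: list[dict]) -> list[dict]:
    """Order action entries (descriptions and synonyms alike) by name, then number.

    Hypothetical record shape: {"n": <int>, "name": <str>, ...}.
    The name comparison is case-sensitive; lowercasing it in the key would
    change that.
    """
    return sorted(entries, key=lambda e: (e["name"], e["n"]))
```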

lynxlynxlynx commented 4 years ago

Looks fine to me, even if some corner case gets missorted.

burner1024 commented 4 years ago

ok, so I'm on standby until the other pull gets finished.

lynxlynxlynx commented 4 years ago

It's done. :)

burner1024 commented 4 years ago

Loaded other actions, worked fine. There's more divergence than I expected. Combined literal dupes. Maybe a quick manual comb will cut some too, but I doubt it will be very fruitful.

So far the issues I see are mostly related to variants/anchors:

  1. Variants are searched by number, sometimes an action is quite different, I wonder if it's better to switch to name (but then there are non-unique names) or number+name (but that will likely miss some).
  2. Since synonyms-aliases have the same anchor, and their order is not guaranteed, sometimes they can link to themselves instead of the master action, although that should still land pretty close. (BTW I changed "Synonymous with .." to less definitive and less wrong "See ...")
  3. Actions show all different variants, not necessarily unique ones. Opcodes show only unique ones. I don't like either way, as the first one is cluttering, and the second one kind of implies that games not shown in the variant list don't have such an opcode at all. Maybe it would make sense to show links to all pages where a variant exists, but group the non-unique ones.
  4. Because of 1 and 2, links to variants for action aliases lead to the main action variants, even if corresponding variant exists for the alias itself.

But there's good news, too: now it builds 10 times longer :).

So at this point I'm looking to confirm that overall data import is good, then decide whether and what to do with issues I listed (maybe you'll notice some, too), do that, and then only manual brushup remains.

lynxlynxlynx commented 4 years ago

  1. We have the same problem with opcodes, but I just ignored it, since I don't think it's that important.
  2. Should still be within the same screen, so not that worrisome.
  3. I'm not sure I understand: isn't it just a data problem if an action is not marked as present?
  4. Also minor. I don't think we use the id anywhere though, so we could add an a/b/c suffix for the variants, which would also fix 2.

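One way the suffix idea could work is sketched below in Python. The record shape (a number plus an optional alias flag) is an assumption, and lettering the master entry first is just one possible choice, made so that links built from the master's anchor no longer depend on page order (item 2):

```python
from __future__ import annotations

from collections import defaultdict
from string import ascii_lowercase


def assign_anchors(entries: list[dict]) -> list[str]:
    """Return one unique anchor id per entry, in input order.

    Hypothetical record shape: {"n": <int>, "alias": <bool, optional>}.
    A number used by a single entry keeps its plain anchor ("12"); entries
    sharing a number get a/b/c suffixes, with non-alias (master) entries
    lettered first.
    """
    positions: dict[int, list[int]] = defaultdict(list)
    for index, entry in enumerate(entries):
        positions[entry["n"]].append(index)

    anchors = [""] * len(entries)
    for number, indices in positions.items():
        if len(indices) == 1:
            anchors[indices[0]] = str(number)
            continue
        if len(indices) > len(ascii_lowercase):
            raise ValueError(f"too many entries share number {number}")
        # sorted() is stable: masters (alias False) first, input order otherwise.
        ordered = sorted(indices, key=lambda i: entries[i].get("alias", False))
        for letter, i in zip(ascii_lowercase, ordered):
            anchors[i] = f"{number}{letter}"
    return anchors
```

For example, a page listing an alias of 5, then action 5, then action 6 gets the anchors `5b`, `5a`, `6`.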
burner1024 commented 4 years ago

3 - no problem in the data, here's what I mean:

Opcode: exists in all games, is marked as such, and is present on all pages, but the links only show "BG1" or "IWD2", leading one to believe that it doesn't exist in the other games.
[Screenshot from 2020-02-25 09-26-24]
[Screenshot from 2020-02-25 09-27-38]

Action (currently): exists in all games, is marked as such, is present on all pages, and shows links for all variants different from the current one (which is really the same variant on 4 other pages).
[Screenshot from 2020-02-25 09-32-19]
[Screenshot from 2020-02-25 09-32-28]

I'll try to find a better option.

burner1024 commented 4 years ago

Here's a variant that displays all games in which the action exists, groups the unique ones, and indicates which are the same as the current one. [Screenshot from 2020-02-25 15-30-34]
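The logic behind this layout, plus a "Not available in:" list, can be sketched in Python as follows. It is only an illustration: on the site this would be computed in liquid templates, and the game keys and texts in the usage note are made up:

```python
from __future__ import annotations


def variant_summary(descriptions: dict[str, str], current: str,
                    all_games: list[str]) -> dict[str, list]:
    """Summarize where an entry exists, for one game's page.

    `descriptions` maps each game that has the entry to its description text.
    Games with identical text are grouped; the group containing `current` is
    reported on its own, and games lacking the entry are listed as unavailable.
    """
    groups: dict[str, list[str]] = {}
    for game in all_games:  # iterate in display order, not dict order
        if game in descriptions:
            groups.setdefault(descriptions[game], []).append(game)
    same = groups.get(descriptions[current], []) if current in descriptions else []
    others = [games for games in groups.values() if current not in games]
    missing = [game for game in all_games if game not in descriptions]
    return {"same_as_current": same, "other_variants": others, "not_available": missing}
```

With `descriptions = {"bg1": "A", "bg2": "B", "bgee": "B", "iwd1": "A"}` and `current = "bg2"`, the BG2 page would show BGEE as identical, BG1 and IWD1 as one other variant, and the remaining games as unavailable.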

lynxlynxlynx commented 4 years ago

I wouldn't make that connection; would you find a different word than "variant" helpful? Though it does get muddier at high numbers, where the opcodes and actions can actually be missing.

Another option is to add another sister-like span with "Not available in: " and list the games without.

lynxlynxlynx commented 4 years ago

Ah, the opened tab didn't refresh. I see we think alike. Let's make it more explicit though, having three lists is confusing. You know which game's version you're looking at, since you opened that page and the titles reflect it. So I suggest we do the opcode way and only list the actual variants and then what I suggested earlier for the missing ones.

burner1024 commented 4 years ago

What 3 lists do you mean? If you jump through variants, current game may get confusing, that's why I added an extra clue.

> So I suggest we do the opcode way and only list the actual variants and then what I suggested earlier for the missing ones.

That has the disadvantage of simply not having links to some games. For example, look at the BG2 damage opcode: [Screenshot from 2020-02-25 21-27-27]

Suppose I want to know which one of these works in IWD1. That's some scrolling, clicking and searching, plus manually comparing against all 4 variants. The layout in the previous screenshot, by contrast, makes it clear that IWD1's version is different from BG2's but identical to BG1's, and provides a link to it.

Yet another take on what you propose could be to always display all games as a static list and just "disable" (grey out, no link, extra tooltip) those missing it. But that'd still miss the uniqueness info.

Anyway, while I believe this extra info would be helpful to display, and hope you reconsider, that's not really the point of this issue, and it can easily be changed later. So just let me know what we'll go with for now.

On other items: 1) Actually, I think number+name should be good enough. 4) Again, after reconsideration, since the only thing an alias does is redirect to master action, it's probably a good thing that variants point straight to it. So I'd probably keep it.

So if there isn't anything else, only variants and manual cleanup left.

lynxlynxlynx commented 4 years ago

I see. Go for something like https://github.com/Gibberlings3/iesdp/issues/84#issuecomment-590746563 then, though it's not clear why BG2 is gray in the first line. Makes it look like it's not there, so I suggest you just don't bold it.

  1. Won't that make anchors and by extension urls needlessly long and mangled?

burner1024 commented 4 years ago

I mean keeping anchors the same, just filtering variants by both name and number.
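Filtering on both fields, as described here, could look like this Python sketch; the record shape and the per-game entry lists are assumptions:

```python
from __future__ import annotations


def find_variants(entry: dict, other_games: dict[str, list[dict]]) -> dict[str, dict]:
    """For each other game, return the entry that matches on both number and name.

    Hypothetical record shape: {"n": <int>, "name": <str>, ...}. Matching on the
    number alone can pair unrelated actions that share an id; also requiring the
    name avoids that, at the cost of missing actions renamed between games.
    """
    key = (entry["n"], entry["name"])
    found: dict[str, dict] = {}
    for game, entries in other_games.items():
        for candidate in entries:
            if (candidate["n"], candidate["name"]) == key:
                found[game] = candidate
                break  # first match per game is enough
    return found
```

Anchors stay untouched; only the matching used to find variants changes.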

burner1024 commented 4 years ago

OK, here's a bad one: Jekyll doesn't support liquid in datafiles. I guess that means converting them into a collection instead, like opcodes.

(BTW, somewhere near the beginning I used Jekyll 4 to get the newer where_exp syntax, but it broke where instead, so I reverted to 3.x.)

burner1024 commented 4 years ago

How about an easier option: the liquify plugin. The only difference would be:

```diff
-        {{ action.desc | markdownify }}
+        {{ action.desc | liquify | markdownify }}
```

lynxlynxlynx commented 4 years ago

That would be fine, since we don't rely on github pages for the build, but this looks like it's only for liquid in frontmatter, not in the body.

burner1024 commented 4 years ago

Works fine, although it did require adding extra steps to note includes. Since game-specific urls vary ("iwd", "iwd1", "totl", etc), I externalized them for better de-duping. I think it's pretty much done. Please let me know what you think.

lynxlynxlynx commented 4 years ago

The diff looks fine. Ready for a PR?