exult / exult

Exult is a project to recreate Ultima 7 for modern operating systems, using the game's original plot, data, and graphics files.
http://exult.info
GNU General Public License v2.0

RFC: Usecode woes #330

Open marzojr opened 1 year ago

marzojr commented 1 year ago

As you may have seen in the commit logs, I have been analyzing the original usecode, and have found (and fixed) numerous issues in the way UCC generates usecode compared to what the originals expect.

One major issue I encountered points to a major difference between the way Exult and the originals handle usecode scripts (those run through (delayed_)?execute_usecode_array). Explaining this will require some background.

When Exult processes usecode scripts, they arrive as usecode values and Exult handles them as such. Each opcode is typically pushed as a byte and operands as words, but the script engine reads all of them as usecode values. When a repeat or repeat2 opcode is encountered, the distance jumped back is interpreted as a count of usecode values. The usecode generated by UCC matches these assumptions (except for some changes I made today that fix a couple of cases).

Based on my analysis of the original usecode (more info below), I believe that the originals handle this differently: I believe they use a byte array instead, and each usecode value is converted to either one byte (when it fits into one byte) or two bytes (otherwise). The conversion seems to be buggy, though, and only works if the value is positive (more info below). If an opcode expects a word, it reads two bytes from the array. The repeat and repeat2 distance is measured in bytes in this converted byte array.
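A minimal Python sketch of why the two ways of counting disagree. It assumes the one-or-two-byte size rule described further down in this post, which is an inference from observed behavior and not a confirmed fact. The helper names (`encoded_size`, `byte_distance`) and the sample script are hypothetical.

```python
from typing import Sequence


def encoded_size(value: int) -> int:
    """Bytes one usecode value would occupy under the assumed rule."""
    word = value & 0xFFFF
    if word >= 0x8000:
        word -= 0x10000  # reinterpret as a signed 16-bit word
    return 1 if word < 0x100 else 2


def byte_distance(values: Sequence[int], end: int, count: int) -> int:
    """Bytes spanned by the `count` values that end just before index `end`."""
    start = end - count
    if start < 0 or end > len(values):
        raise ValueError("span falls outside the script")
    return sum(encoded_size(v) for v in values[start:end])


# Hypothetical script: three small values, one that needs two bytes, one more.
script = [0x46, 0x10, 0x27, 0x200, 0x0B]
print(byte_distance(script, end=4, count=4))  # 4 values span 5 bytes
```

A jump of "4" therefore lands in different places depending on whether it counts values or bytes, as soon as any value in the span is 0x100 or larger.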

I am wondering how to go about fixing this. Any of the options below (except 1) will require modifying UCC to count the bytes for repeat/repeat2, especially if we want to use UCC-generated usecode in the original games (I have done so in the past, and I am thinking of recreating the necessary changes and committing them). This would also make mods compiled with older versions of UCC not work as intended in newer versions of Exult.

The options I can see are:

  1. Ignoring the issue. Hey, it is working now...
  2. Converting to byte arrays internally to mimic the way the originals worked. This would also require changing the savegame format, as we currently save the usecode values. This probably requires the most work.
  3. Keeping usecode values internally, but using a more sophisticated method for handling these issues. This will probably be easier to implement, but slower, as the byte sizes need to be accounted for in every opcode and every time repeat and repeat2 need to branch.

I am leaning towards option (2). Any ideas/suggestions?


As for the info I promised: first off, the easy one to spot. In both BG and SI, every call opcode to a function < 0x100 is followed by an extra 0 value. There are a couple of call opcodes in SS that are followed by a 0 value, but that seems to be in error. There are also a few instances of the repeat opcode in both BG and SI that have an off-by-one error on cases when

When toying with modified usecode, I found that:

What I believe they do to convert into bytes is:

  1. Read the (signed word) value of the usecode value
  2. If it is less than 0x100, then a single byte (the low byte) is emitted
  3. Otherwise, two bytes are emitted

Step 2 is the root of the wonkiness with negative numbers: since a negative value is less than 0x100, only its low byte is emitted.
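The three steps above can be sketched in Python as follows. This is only an illustration of the suspected rule, not code from Exult or the originals; the function names are hypothetical, and the low-byte-first order for two-byte values is taken from the ACTION.DAT observations later in this thread.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def encode_value(value: int) -> bytes:
    """Encode one usecode value the way the originals appear to."""
    word = to_signed16(value)
    low = word & 0xFF
    if word < 0x100:
        # True for 0..0xFF, but also for every negative value,
        # so negatives silently lose their high byte.
        return bytes([low])
    high = (word >> 8) & 0xFF
    return bytes([low, high])


def encode_script(values) -> bytes:
    """Concatenate the encodings of all values in a script."""
    return b"".join(encode_value(v) for v in values)


print(encode_value(0x27).hex())    # 27
print(encode_value(0x1234).hex())  # 3412
print(encode_value(-2).hex())      # fe
```

Note that -2 and 0xFE produce the same single byte, so the original value cannot be recovered from the byte array.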

DominusExult commented 1 year ago

breaking the savegame compatibility is a bold step which would necessitate an Exult v2.0 version :) But I'm torn.

@drcode1 @wench @Dragon-Baroque

marzojr commented 1 year ago

For what it's worth, the old savegames could easily be made to still work on new versions of Exult by converting the script when loading. It is just that new savegames would not work on older versions of Exult anymore.

DominusExult commented 1 year ago

> For what it's worth, the old savegames could easily be made to still work on new versions of Exult by converting the script when loading. It is just that new savegames would not work on older versions of Exult anymore.

hmm, that looks better

KnightCaptainU7 commented 1 year ago

Ages ago I added an si_usecode_script.uc header file that gave friendly names to the script bytes. My mod's code uses as-close-as-possible coding to the original usecode/agil. So I don't use script blocks, but UI_execute_usecode_array and UI_delayed_execute_usecode_array instead.

My notes at the time list "The Usecode Compiler (UCC.exe) does not recognize commands it should know, but apparently does not." I did not list which ones at the time. :/ http://exult.sourceforge.net/seventowers/code.php?TITLE_IMAGE=usecodetitle.png&DATAFILE=ucc_scripting.dat

    enum script_commands {
        continue_script = (byte)0x01,
        // (byte)0x02 only used once
        repeat_script = (byte)0x0B,
        wait_1_tick = (byte)0x21,
        no_halt = (byte)0x23,
        wait_ticks = (byte)0x27,
        wait_avatar_tiles = (byte)0x2B,
        finish_script = (byte)0x2C,
        remove_script = (byte)0x2D,
        step_n = (byte)0x30,
        step_ne = (byte)0x31,
        step_e = (byte)0x32,
        step_se = (byte)0x33,
        step_s = (byte)0x34,
        step_sw = (byte)0x35,
        step_w = (byte)0x36,
        step_nw = (byte)0x37,
        lower_z = (byte)0x38,
        raise_z = (byte)0x39,
        set_frame = (byte)0x46,
        hatch_egg = (byte)0x48,
        set_egg = (byte)0x49,
        next_frame_stop_at_max = (byte)0x4D,
        next_frame_wrap_to_zero = (byte)0x4E,
        prev_frame_stop_at_zero = (byte)0x4F,
        prev_frame_wrap_to_max = (byte)0x50,
        script_bark = (byte)0x52,
        step_forward = (byte)0x53,
        music_track = (byte)0x54,
        call_usecode = (byte)0x55,
        speech_track = (byte)0x56,  // Not used in SI, but listed in Marzo's documentation
        sound_effect = (byte)0x58,
        face_dir = (byte)0x59,
        npc_frame_stand = (byte)0x61,  // Do these all automatically face the current direction?
        npc_frame_walk1 = (byte)0x62,
        npc_frame_walk2 = (byte)0x63,
        npc_frame_use_it = (byte)0x64,
        npc_frame_swing1 = (byte)0x65,
        npc_frame_swing2 = (byte)0x66,
        npc_frame_swing3 = (byte)0x67,
        npc_frame_swing2h1 = (byte)0x68,  // Swing frame 23
        npc_frame_swing2h2 = (byte)0x69,  // Swing frame 24
        npc_frame_swing2h3 = (byte)0x6A,  // Swing frame 25
        npc_frame_sit = (byte)0x6B,
        npc_frame_lean = (byte)0x6C,
        npc_frame_kneel = (byte)0x6D,
        npc_frame_lie = (byte)0x6E,
        npc_frame_cast1 = (byte)0x6F,
        npc_frame_cast2 = (byte)0x70,
        script_damage = (byte)0x78,
        attack = (byte)0x7A,
        resurrect = (byte)0x81
    };

marzojr commented 1 year ago

I need to remember to update that document; I recently added nop2 (corresponds to (byte)0x02) and raw(integer) in order to fully replicate the buggy mess that is SI usecode scripts, as well as the ability to dump the contents of variables in a script block because BG usecode does that. If you look over, you can see that your list is now out-of-date and missing a few opcodes compared to script blocks.

For what it's worth, I know that there are a few opcodes that neither Exult implements nor UCC accepts (and which are not used in stock usecode); here are some notes I had from way back when I was looking at this:

I suppose I could implement 0x22, 0x51, and 0x57; but I never thought any of those were particularly useful.

KnightCaptainU7 commented 1 year ago

My notes are from 2017, so yes, outdated.

If Exult isn't using some of these, I can't recall a part of SI where a cutscene seems different from the Origin originals.

marzojr commented 1 year ago

So, I was poking around and I think I figured out the format of a file called ACTION.DAT from gamedat/saves. By playing with very low cycles in DOSBox, doing some hex editing, and playing with custom usecode, I figured out the following info:

Header:
    16-bit current time (in ticks)
     8-bit counter, not sure of what
    16-bit counter, number of scripts saved
    <sequence of scripts>

A single script has the following format:
    16-bit value, indicates the object the script operates on
    16-bit value, tick for next action in the script
     8-bit value, seems to be some sort of flags
        b0    Script executed 'remove' opcode
        b1    Seems to indicate that the object is a missile? Scripts with this set always have opcode 0x2E in them (more below)
        b2    Script executed 'finish' opcode
        b3    Script executed 'nohalt' opcode
        b4    No idea
        b5    No idea; seems to be related to the 8-bit counter in the header
        b6    No idea; seems to be related to the 8-bit counter in the header
        b7    No idea; seems to be related to the 8-bit counter in the header
     8-bit value, index of the next opcode to execute
     8-bit value, length of the script+1
    <script data>
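A rough Python sketch of a reader for the layout above. Several points are assumptions not stated in the notes: multi-byte fields are taken to be little-endian (as is usual for DOS-era x86 software), scripts are taken to be packed back-to-back, and the length byte is taken to count itself, so the script data is `length - 1` bytes long. All names (`SavedScript`, `parse_action_dat`) are hypothetical, and only the flag bits whose meaning is identified above get accessors.

```python
import struct
from dataclasses import dataclass

HEADER = struct.Struct("<HBH")           # time, unknown counter, script count
SCRIPT_HEADER = struct.Struct("<HHBBB")  # object, next tick, flags, next index, length


@dataclass
class SavedScript:
    object_id: int
    next_tick: int
    flags: int
    next_index: int
    data: bytes

    @property
    def removed(self) -> bool:   # b0: script executed 'remove'
        return bool(self.flags & 0x01)

    @property
    def finished(self) -> bool:  # b2: script executed 'finish'
        return bool(self.flags & 0x04)

    @property
    def no_halt(self) -> bool:   # b3: script executed 'nohalt'
        return bool(self.flags & 0x08)


def parse_action_dat(blob: bytes):
    """Return (time, counter, scripts) under the assumptions stated above."""
    time, counter, count = HEADER.unpack_from(blob, 0)
    pos = HEADER.size
    scripts = []
    for _ in range(count):
        obj, tick, flags, next_index, length = SCRIPT_HEADER.unpack_from(blob, pos)
        pos += SCRIPT_HEADER.size
        data_len = max(length - 1, 0)  # assumed: length byte counts itself
        data = blob[pos:pos + data_len]
        if len(data) != data_len:
            raise ValueError("truncated script data")
        pos += data_len
        scripts.append(SavedScript(obj, tick, flags, next_index, data))
    return time, counter, scripts
```

Usage would be along the lines of `parse_action_dat(open(path, "rb").read())`, where `path` points at a copy of ACTION.DAT.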

The script data is composed of bytes: an opcode, followed by any parameters. It seems to work like this: any opcode parameter that is (in usecode) >= 0x100 is stored as its low byte followed by its high byte; otherwise, just the low byte is stored. Negative values, in particular, have the upper byte discarded and hence are effectively one byte only.

Thanks to a gavel and a parrot, I found out that strings are saved directly in the script, surrounded by double-quotes. If the first double-quote is edited into something else, the string is displayed as normal -- unless it is edited into a _, in which case it is not shown. I found this because UI_clear_item_say modifies the leading double-quotes of all text yet to display to _. Editing out the ending double-quotes is just bad; the game seems to go through RAM until it finds a double-quote and displays everything in-between... or it crashes and burns, apparently due to stack corruption, if the next double-quotes are sufficiently far away.

Because the script position is a byte, scripts are limited in length to 255 elements (it seems to be unsigned). However, trying to create scripts with length > 128 (including the length byte) either causes stack corruption or gets the script discarded when loading ACTION.DAT (lengths closer to 128 seem to cause the script to be discarded, while lengths further from it corrupt the stack), so I guess they are limited to 127 bytes plus the length byte.

Which is kind of small when you factor in that strings are dumped into the script surrounded by double-quotes.
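To get a feel for how quickly the budget runs out, here is a small Python sketch that estimates the byte size of a script before it is handed to the game. It assumes the one-or-two-byte rule for numeric values, one byte per character for strings, and the 127-byte limit guessed at above. The helper names and the sample items are hypothetical.

```python
MAX_SCRIPT_BYTES = 127  # guessed limit, not counting the length byte


def item_size(item) -> int:
    """Estimated size in bytes of one script item (int or str)."""
    if isinstance(item, str):
        return len(item) + 2  # text plus the two surrounding double-quotes
    word = item & 0xFFFF
    if word >= 0x8000:
        word -= 0x10000  # reinterpret as a signed 16-bit word
    return 1 if word < 0x100 else 2


def script_size(items) -> int:
    """Estimated total size in bytes of a whole script."""
    return sum(item_size(i) for i in items)


def fits(items) -> bool:
    """True if the script stays within the guessed limit."""
    return script_size(items) <= MAX_SCRIPT_BYTES


print(script_size([0x52, "abc", 0x200]))  # 1 + 5 + 2 = 8
```

A single bark of a few dozen characters already uses a large share of the 127 bytes.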

Searching in RAM also shows that the script portion (including the length byte) is copied as-is from ACTION.DAT, so it is likely what the game uses when executing the scripts. The jump-back distance of the 'repeat' and 'repeat2' opcodes is measured in bytes of this representation.

Not sure how much of this I want to emulate in Exult, other than fixing the behavior of repeat/repeat2 opcodes.

Edit: Forgot about missiles. While trying out a few things, I noticed that there were scripts when missiles were on the screen, such as crossbow bolts. I tried a few missiles, and the script for missiles seems to go like this:

  1. Opcode 0x2E at the start. Without this opcode, the bit I mentioned in the flags is clear; I don't know if it does anything else.
  2. Either opcode 0x22 or nop1 (0x02), depending on the missile.
  3. Opcode 0x79 followed by a byte parameter. Changing this makes the projectile hang in the air (if it is the only one) or causes another projectile to move faster, so it is likely some sort of identifier of which projectile the opcode applies to. So this seems to update the position of missiles.
  4. repeat (0x0B) jumping back to the opcode 0x22/nop1, with a repeat count of 255.

After saving, editing the file to use the wrong missile ID, and repeatedly checking on it, 255 seems to mean an infinite number of repeats. I tested this with some custom scripts and confirmed it.
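A tiny Python sketch of how a repeat counter with this special case could behave. The only part backed by the observations above is that a count of 255 never runs out; treating a finite count as "decremented once per jump back" is an assumption, and `simulate_repeat` is a hypothetical name.

```python
INFINITE = 255  # observed: a repeat count of 255 never runs out


def simulate_repeat(repeat_count: int, max_visits: int) -> int:
    """Count jumps back taken over at most `max_visits` visits to a repeat."""
    if not 0 <= repeat_count <= 255:
        raise ValueError("repeat count must fit in one byte")
    jumps = 0
    remaining = repeat_count
    for _ in range(max_visits):
        if remaining == INFINITE:
            jumps += 1       # never decremented
        elif remaining > 0:
            remaining -= 1   # assumed: one decrement per jump back
            jumps += 1
        else:
            break            # counter exhausted, fall through
    return jumps


print(simulate_repeat(3, 10))    # 3
print(simulate_repeat(255, 10))  # 10
```

The `max_visits` cap exists only so that the infinite case terminates in the sketch.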