deckarep / laffer-casino-extractor

a Python3 CLI tool to extract Larry Casino game (sprites, audio, etc) from the RESOURCE.VOL file
MIT License
3 stars 1 forks source link

Scanning the entire volume doesn't yet work #2

Open deckarep opened 4 months ago

deckarep commented 4 months ago

I tried running the code against my local copy of: RESOURCE.VOL and the code as it stands seems to choke on the first image asset which looks to be a 640x480 room.

If you run the code as-is it points to the peter texture.bin file in the textures folder and this works fine.

I'm done for today but if you want to improve on things feel free and I'll merge away. I'll try to keep the code in a building state going forward.

/cc @Doomlazer

Doomlazer commented 4 months ago

I submitted a pull that extracts from vol/resources.vol. It's amazing how many images are already perfect, but plenty of work left to do.

It's only extracting the first cel of each texture at the moment.

Also, the /img and /pal folders are created automatically now if they don't exist.
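
A minimal sketch of that folder setup, assuming the output folders live under a single base directory. The function name and the `base_dir` parameter are made up for illustration and are not taken from the repo:

```python
import os

def ensure_output_dirs(base_dir, names=("img", "pal")):
    """Create each output folder under base_dir if it is missing.

    `ensure_output_dirs` and `base_dir` are hypothetical names; only the
    folder names img and pal come from the thread.
    """
    created = []
    for name in names:
        path = os.path.join(base_dir, name)
        # exist_ok=True makes this a no-op when the folder is already there
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling it a second time is harmless, which is the point of `exist_ok=True`.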

deckarep commented 4 months ago

Awesome, I went for a bike ride and came back to this present. Just merged it! Thanks for tackling some of the cleanup comments, and making progress!

Doomlazer commented 4 months ago

new pull request exports animation frames

deckarep commented 4 months ago

Bummer, did a full generation and tried to find other characters and they aren't showing up yet. I'll happily merge.

Doomlazer commented 4 months ago

You're right. I recall we should have around 900 images and that there is a bug where it sometimes reads past a new 'tex 0001' and an image gets skipped. This bug is likely responsible for some image corruption as well.

deckarep commented 4 months ago

Yes, when I (quickly) skimmed all the images I saw that some had bled into the top or out the bottom due to this bug. If it happens enough times, it would likely explain why we extract fewer than the ~900 or so we are expecting.

I sent you an invite btw.

I also pulled latest and it seems like there are some indentation issues...currently the whole file uses spaces and not tabs.

Doomlazer commented 4 months ago

Yeah, the editor it opened to resolve conflicts did that. The new pull request should fix that and includes a hack that prevents skipping textures until we can figure out why we over-read.

I limited it to 924 images because it hangs on 925 for some reason. I still don't see the other characters except Peter. I wonder where they could be?

Doomlazer commented 4 months ago

Thanks for the invite. That solves the conflict issues for me thankfully.

deckarep commented 4 months ago

One of the issues I'm seeing with the latest changes:

  1. Originally, when I was focusing on extracting the test peter texture in the test_textures folder, I got all of the Peter images extracted for that single series of sprites and I realized that at least some were differing sizes. When I created each image in the series at the correct size, the weird image offsetting issues went away.
  2. The latest code seems to ignore the fact that each sprite in a single series could be a different size, but it needs this because the weird offsetting issues have come back. For each sub-sprite in a texture there are a couple of bytes holding the dimensions of the sprite, and these need to be used when setting up the PIL image dimensions.
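
To illustrate the per-cel sizing point, here is a rough sketch. The byte layout is HYPOTHETICAL (a little-endian uint16 width, a uint16 height, then uncompressed pixels); the real textures are RLE-compressed and carry other header fields, so this only shows the idea of reading the size fresh for every cel instead of reusing the first one:

```python
import struct

def split_cels(data, count):
    """Split `count` cels out of `data`, honoring each cel's own size.

    HYPOTHETICAL layout for illustration only: every cel is a little-endian
    uint16 width, a uint16 height, then width*height uncompressed bytes.
    """
    cels = []
    pos = 0
    for _ in range(count):
        # Read this cel's own dimensions rather than reusing the previous ones
        width, height = struct.unpack_from("<HH", data, pos)
        pos += 4
        pixels = data[pos:pos + width * height]
        pos += width * height
        cels.append((width, height, pixels))
    return cels
```

Each `(width, height)` pair is what would be handed to PIL when the image for that cel is created.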

I'm kind of surprised that we're not seeing other characters show up. Could they coincidentally be the last remaining textures that we're ignoring?

One more observation: if you look carefully at the 640x480 room-background sprites, the generated image actually looks like it's composed of several images. The sprite boundaries bleeding into these need to be addressed.

Doomlazer commented 4 months ago

yeah I'm using the same width height for all sub cels. let me fix that

Doomlazer commented 4 months ago

Ok, now Peter looks better with the correct cel sizes, but sometimes I'm getting heights or widths way larger than 640*480 so I'm skipping those. Need to figure out how to handle those, but I'll probably call it a night.

deckarep commented 4 months ago

When we extract all textures from RESOURCE.VOL and we see the overlapping situation we are likely landing on the wrong width/height values therefore getting these invalid sizes. I think once we fix the boundary issues, where we are dealing with a truly single texture this problem might go away.
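
One way to use that observation during discovery is to treat the first implausible size as the point where parsing went out of sync. A small sketch, assuming 640x480 is the largest legitimate cel (the function name is made up for illustration):

```python
def first_desync(sizes, max_w=640, max_h=480):
    """Return the index of the first cel whose size is implausible, else None.

    Assumes no real cel is larger than the 640x480 room backgrounds
    mentioned in the thread; `first_desync` is a hypothetical helper.
    """
    for index, (width, height) in enumerate(sizes):
        if not (0 < width <= max_w and 0 < height <= max_h):
            return index
    return None
```

The cel just before the reported index is then the likely place where too many or too few bytes were consumed.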

deckarep commented 4 months ago

Btw: fixing the issue of each sub-sprite having possibly different dimensions did fix a lot of other assets too so things are looking better.

I have an idea: since the game backgrounds are overscanning into adjacent sprites and actually hiding some of the true assets...at least for now we can do a two pass scan.

The first scan builds a list of MAGIC (tex 0001) offsets.

Then the second pass will simply iterate the list and try to generate the series of sprites for each texture.

This should in theory identify ALL texture boundaries, even if some of them don't decode properly and still overscan, because each tex 0001 asset will at least be attempted once.
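
A minimal sketch of the two-pass idea, assuming the marker appears in the file as the literal ASCII bytes `tex 0001` (the thread names the pattern but not its exact byte form). The function names are made up for illustration:

```python
MAGIC = b"tex 0001"  # ASSUMED byte form of the marker named in the thread

def find_magic_offsets(data, magic=MAGIC):
    """First pass: return every offset at which `magic` occurs in `data`."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets

def texture_spans(data, magic=MAGIC):
    """Pair each marker offset with the next one (or end of data)."""
    offsets = find_magic_offsets(data, magic)
    ends = offsets[1:] + [len(data)]
    return list(zip(offsets, ends))
```

The second pass would then loop over `texture_spans(...)` and attempt to decode each `(start, end)` slice on its own, so an over-read in one texture cannot swallow the next marker.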

deckarep commented 4 months ago

Ok, I added a pre-scan step...this will truly find all tex 0001 patterns and builds a list. This is a little slow to start but if it's annoying enough we can cache this file since it never really changes.
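
If the pre-scan does get annoying, caching could look roughly like this. The JSON cache file and the `scan` callback are assumptions for illustration, not something in the repo:

```python
import json
import os

def load_or_scan_offsets(vol_path, cache_path, scan):
    """Return cached offsets if present, else scan the volume and cache them.

    `scan` is any function taking the volume bytes and returning a list of
    integer offsets. The cache is a plain JSON list (a hypothetical format).
    """
    if os.path.exists(cache_path):
        with open(cache_path, "r") as f:
            return json.load(f)
    with open(vol_path, "rb") as f:
        data = f.read()
    offsets = scan(data)
    with open(cache_path, "w") as f:
        json.dump(offsets, f)
    return offsets
```

This sketch never invalidates the cache, which only works because the volume file is not expected to change.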

After building the list of offsets, it now attempts to generate each sprite series. Some are still wrong, but this was able to get a few more through the gate. Still, no other characters are showing up, which is very strange.

Look at the sprites, Peter is 917 and we even see his character toes (when the user selects him and it's first person perspective of your character sitting in the jacuzzi playing). But I'm also seeing the other character toes near these numbers, so I would expect all characters now to be in this early 900 range...perhaps the code is not right and still managing to skip a lot of the data. It must be that...

Remember to get latest code otherwise more conflicts.

deckarep commented 4 months ago

I don't know what's going on: the fseries value is just incremented by 1 for each texture. But oddly, if you look at the generated files...they are not always consecutive...some simply aren't being generated. At first I figured it was because we're skipping those sprites which are greater than 640x480, but I'm not seeing those logs spit out around the numbers that are missing.

Look at this:

... sprite_901_0.png sprite_903_0.png sprite_905_0.png sprite_907_0.png sprite_909_0.png sprite_911_0.png sprite_913_0.png sprite_915_0.png sprite_917_0.png ...

Doomlazer commented 4 months ago

Sometimes NUM_IMAGES is 0 so it skips the texture. Maybe NUM_IMAGES of 0 has a special meaning?

deckarep commented 4 months ago

Ah, that's why. We probably need to better understand the meaning of the unknown values and look at the hex. At first glance, it seems that even with a zero there is palette and image data.

Doomlazer commented 4 months ago

Yep. I tried setting NUM_IMAGES to 1 if it was zero, but I think the width/height values change locations in that case as well.

Doomlazer commented 4 months ago

Added a simple function to export to the 'test_textures' folder so it's a bit easier to find and examine specific textures. Just uncomment the following in run() and replace the image number you want to export.

#extractBin(offTbl, 906)

I didn't add any error checking, because I figure we'll remove this function eventually.

deckarep commented 4 months ago

Awesome I'm all for adding whatever is useful for discovery since we still don't have everything extracting just yet.

I will be taking a break from this project for a few days since I have to deal with some overtime...but I'm eager to see if you uncover more of the story.

Doomlazer commented 4 months ago

It's already paid off, I just found the other character portraits in 904, 906, etc.

The first byte of the header is either 0x01, 0x0A or 0x11. The height/width are in different locations for 0x11 (haven't looked at 0x0A yet).
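
For discovery, a quick tally of header types across textures can show how common each variant is. A small sketch, assuming each texture's header bytes have already been sliced out (the function name is made up for illustration):

```python
from collections import Counter

KNOWN_TYPES = (0x01, 0x0A, 0x11)  # the three first-byte values reported above

def tally_header_types(headers):
    """Count textures by the first byte of their header.

    `headers` is an iterable of non-empty bytes objects. Anything outside
    the three known values is grouped under the key "unknown".
    """
    tally = Counter()
    for header in headers:
        kind = header[0]
        tally[kind if kind in KNOWN_TYPES else "unknown"] += 1
    return tally
```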

I think there should be animation frames for the other portraits, but I'll need to research the alternate formats closer tomorrow.

deckarep commented 4 months ago

HELLL YEAHHHHH! It was bothering me that the other characters were still not obviously in this .vol file but I'm so glad for your discovery.

Also please remember that the RLE commands that we have so far may not be exhaustive and still might need to find out what we're missing to get the rest of the sprites to render properly.

Curious since you found the characters do they render properly at this point?

Doomlazer commented 4 months ago

The new characters look perfect, but I only get one animation frame at the moment. I think there is more data to parse, but the width/height on subsequent frames isn't correct at the moment.

Yes we will need to come back to the backgrounds that still aren't decompressing, but I figure these skipped images would be easier to solve first.

deckarep commented 4 months ago

Ok, thanks for the pings about it! Please commit your changes when ready…I’ll circle back at some point to help.

Doomlazer commented 4 months ago

0x0A and 0x11 animation cels seem to have a consistent width/height. The cels now export mostly correctly, but there are still several issues with these character portraits. I've committed my latest changes to the repo, but I've got to call it a night for now.

deckarep commented 4 months ago

I was messing around with the latest code investigating these characters you found. For the larger character portraits I was trying to see how many cels there are for the first blonde girl, Drew. She has around 56 cels, and either the last cel has all the mouth variations or the mouth movements are in the following series.

This is a clue but I did not commit anything yet because I had to leave for the evening and my code isn't commit ready yet.

Doomlazer commented 4 months ago

Hmm, that's interesting. Does it seem like the mouth animations are packed in one big cel with the same width height as the other portrait cels?

I think there must also be some padding (maybe just a single byte) on these portraits between frames that we need to account for, because they seem to get thrown off a bit after about two cels.

I didn't get time after work to look any closer, but hopefully today or tomorrow I can do some more digging

deckarep commented 4 months ago

It does seem like the mouth animations could be in one cel; however, maybe it's just how we're dumping things out.

I'm starting to suspect that these big character animations may be set up more like a traditional SCI view. Basically a texture atlas, where the image is a large size and all the variations are in the same image with something describing their bounding box and offset. I could be wrong though.

deckarep commented 4 months ago

New finding: while investigating the large character sprites, 0x11 (17), I have confirmed that each one does indeed have its own width/height. The reason we're seeing invalid values is that our parsing is somehow off a little bit (not by much, but we're not landing at the correct final byte boundaries to begin the new image).

Additionally, because each sub-sprite has its own width/height, they are indeed slightly different sizes, by as much as 10 to 15 pixels. You can observe this by noticing that sometimes the characters will animate their arms around, and in this case the image usually needs to be a little bit wider.

This was the case with the original Peter sprite as well, to get all his sprites to show up correctly his width/height is sometimes different depending on if that sub-sprite needs extra width (or height) to show him moving his arms.

Now, why we're off a little bit...I still can't figure out.

Doomlazer commented 4 months ago

Here's an odd pattern (with the current code in the repo):

edit: the code formatting sucks in GitHub comments :(

[Screenshot: 2024-07-24 at 8:37:54 PM]

Now we know the w/h is there after every frame, so remove the code that skips it and then add this odd bit at the end of the NUM_IMAGES loop to account for the 0,2,0,2,0 pattern:

[Screenshot: 2024-07-24 at 8:39:07 PM]

The offsets are correct until the cel that starts at offset 183117 where our code is now starting 2 bytes late at 183119! This is also the first cel that the w/h changes.

I have no idea if this 0,2,0,2 pattern holds for sequential cels with matching w/h or how to know when the w/h changes (maybe store the previous and check if it's different?). Just an observation.

deckarep commented 4 months ago

Hmm...when I was reading up on different RLE algorithms, I saw that in some cases data can be padded for alignment, which I was also investigating. (I added a print_alignment function to help with discovery.)

One other thing: I noticed that we're using little-endian for the 16-bit unsigned short values. But to form the short, odd numbers sometimes get used, which is unusual. So I'm investigating big-endianness, which seems to fall on power-of-2 boundaries more often, which typically makes more sense. But I also recognize that a lot of stuff currently decodes with little-endian at this point.
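
A small sketch for comparing the two interpretations of the same two bytes side by side, using only the standard `struct` module (the helper name is made up for illustration):

```python
import struct

def read_u16_both(data, offset):
    """Return (little_endian, big_endian) readings of the uint16 at offset."""
    (little,) = struct.unpack_from("<H", data, offset)
    (big,) = struct.unpack_from(">H", data, offset)
    return little, big

# The bytes 80 02 read as 640 in little-endian (a room width seen in this
# thread) but as 32770 in big-endian.
print(read_u16_both(b"\x80\x02", 0))  # (640, 32770)
```

Dumping both values for every suspected width/height field makes it easier to see which byte order yields plausible sizes.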

I'm still focused on the large character portraits but don't have anything solid to commit yet.

That pattern you noticed is interesting...we can always try it and see how much further we get on decoding the characters.

We're really in the weeds now.

Doomlazer commented 4 months ago

So with some manual work I've found 906.bin contains 57 (sprite_906_0.png-sprite_906_56.png) images.

sprite_906_26.png is the face without a mouth and the mouths by themselves start at sprite_906_49.png.

The following code will keep the offsets correct for 906.bin, but I don't currently see how it's possible to determine programmatically.

[Screenshot: 2024-07-24 at 9:33:11 PM]

As you say there probably is some weird padding, but I'm too tired to explore further tonight.

deckarep commented 4 months ago

Awesome findings here..I know the pattern is elusive so far but just making sense of the data is a step in the right direction. I am hopeful we can figure it out.

Doomlazer commented 4 months ago

In the middle of the night I checked what values I was consuming to keep the cel offsets correct:

series 906
starting cel 0 at 782
consumed 2 byes: b'\x02\x00'
starting cel 1 at 26825
starting cel 2 at 52882
consumed 2 byes: b'\x02\x00'
starting cel 3 at 78935
starting cel 4 at 104973
consumed 2 byes: b'\x05\x00'
starting cel 5 at 131047
starting cel 6 at 157083
starting cel 7 at 183117
starting cel 8 at 209508
starting cel 9 at 235598
consumed 2 byes: b'\x03\x00'
starting cel 10 at 261770
starting cel 11 at 287825
starting cel 12 at 313907
consumed 2 byes: b'\x04\x00'
starting cel 13 at 339991
starting cel 14 at 366081
starting cel 15 at 394458
starting cel 16 at 421755
consumed 2 byes: b'\x02\x00'
starting cel 17 at 448003
starting cel 18 at 474080
consumed 2 byes: b'\x03\x00'
starting cel 19 at 500154
starting cel 20 at 526215
starting cel 21 at 552288
consumed 2 byes: b'\x02\x00'
starting cel 22 at 578360
starting cel 23 at 604403
consumed 2 byes: b'\x02\x00'
starting cel 24 at 630475
starting cel 25 at 656517
consumed 4 byes: b'\x00\x00\x01\x00'
starting cel 26 at 682557
consumed 2 byes: b'\x0f\x00'
starting cel 27 at 708607
starting cel 28 at 735586
starting cel 29 at 762504
starting cel 30 at 791344
starting cel 31 at 817971
starting cel 32 at 846811
starting cel 33 at 873729
starting cel 34 at 902569
starting cel 35 at 929196
starting cel 36 at 958036
starting cel 37 at 984954
starting cel 38 at 1013794
starting cel 39 at 1040421
starting cel 40 at 1069261
starting cel 41 at 1096179
consumed 2 byes: b'\x07\x00'
starting cel 42 at 1123160
starting cel 43 at 1149204
starting cel 44 at 1175259
starting cel 45 at 1201330
starting cel 46 at 1227399
starting cel 47 at 1253573
starting cel 48 at 1279622
consumed 4 byes: b'\x00\x00\x08\x00'
starting cel 49 at 1305668
starting cel 50 at 1307469
starting cel 51 at 1309239
starting cel 52 at 1311020
starting cel 53 at 1313039
starting cel 54 at 1315050
starting cel 55 at 1316920
starting cel 56 at 1318773

They look consistent enough that we should be able to check and account for 0xXX00 and 0x0000XX00, but I had trouble adding that logic to doRLE() at 2am. If you get a chance to give it a try, great. Otherwise, I'll look again tomorrow or possibly tonight if I'm not too tired.

Interestingly, the two times we need to consume four bytes instead of two appear right before the mouth-less portrait (26) and the mouth-only cels (49).

deckarep commented 4 months ago

Awesome work! If I can wrap my head around this I'll see if I can integrate it. Nice work as usual.

Even if we don't decode exactly like the game does, but we're still able to extract the sprites and they look good I'm happy!

deckarep commented 4 months ago

I confirmed 906 portrait looks good on my end. The series of images extract perfectly, after adding your code to the tail end of processTexture.

But it seems like these values need to get parsed at the beginning, and each represents a count of cels for a series of related sprites. The problem though is that the number is sometimes an unsigned 16-bit value and sometimes an unsigned 32-bit value. Either way, this number seems to just be a count of how many related sprites there are.

If we were to parse this all the right way...we'd have to detect when a short vs an int needs to get handled which sucks.

I tried it on 898 - Drew character and this EXACT pattern didn't hold. But the general pattern probably holds where it's just a count in a series of related animation sprites. I suppose each character can simply have different counts.

deckarep commented 4 months ago

I was trying to see if I can get another character outputting by manually figuring out the amount of bytes that we need to skip based on your last findings. I was able to get character 904 to show up 100% accurate now but the table of data is different.

See the branch: hacked-offset-table and if this proves helpful I can merge it into the main branch. All it really does is migrate the data of how much skipping we need to do out to an external file. Obviously we should go down the path of figuring out the parsing algorithm to generate it...but in the worst-case scenario at least we have a way to accurately generate the characters with manual work.

I also added some flags in this branch: --audio generates audio, --series allows you to limit the generation to one or more single series using a comma delimited list of integers.
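
For readers who have not looked at the branch, the two flags could be wired up with `argparse` roughly as follows. This is a sketch of the described behavior, not the branch's actual code; `build_parser` and `parse_series` are made-up names:

```python
import argparse

def parse_series(text):
    """Turn a comma-delimited string such as '904,906' into a list of ints."""
    return [int(part) for part in text.split(",") if part.strip()]

def build_parser():
    """Build a parser exposing the --audio and --series flags described above."""
    parser = argparse.ArgumentParser(
        description="Sketch of the extractor's command line (hypothetical)."
    )
    parser.add_argument("--audio", action="store_true",
                        help="also generate audio")
    parser.add_argument("--series", type=parse_series, default=[],
                        help="comma-delimited list of series numbers to generate")
    return parser
```

With this, `build_parser().parse_args(["--series", "904,906"])` yields `series == [904, 906]`, and an empty list can be taken to mean "all series".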

Doomlazer commented 4 months ago

I didn't notice it, but you're right. The 0xXX00 seem to indicate the cel count for each animation loop.
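
As a worked check of that reading, the consumed values from the series 906 log can be treated as per-loop cel counts and expanded into cel index ranges. One assumption here: cel 0 has no logged value in front of it, so the numbering starts at cel 1. The function name is made up for illustration:

```python
def loops_from_counts(counts, first_cel=0):
    """Expand per-loop cel counts into lists of consecutive cel indices."""
    loops = []
    start = first_cel
    for n in counts:
        loops.append(list(range(start, start + n)))
        start += n
    return loops

# Counts taken from the consumed bytes in the series 906 log, in order.
COUNTS_906 = [2, 2, 5, 3, 4, 2, 3, 2, 2, 1, 15, 7, 8]
loops = loops_from_counts(COUNTS_906, first_cel=1)
```

The counts sum to 56, which together with cel 0 gives the 57 images found in 906.bin. The tenth loop is the single cel 26 (the mouth-less face) and the last loop covers cels 49 through 56 (the mouths), matching the two places where four bytes were consumed.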

I like those command line flags, they will definitely need to make it into the main branch eventually. While I like the JSON offset table for now, I'm confident we can solve it programmatically.

Thinking about it now, part of the problem was I've been looking for the total cel count, 57 in 906.bin, but I bet it's just like SCI view formats - There is likely a loop count at the start of each view, then the start of each loop contains a cel count, which tells us everything we need to know.

Of course it would be nice to know why sometimes it's 16 bit and others 32 bit cel counts, but probably won't be too difficult to account for even if it's unclear why it's like that.

It's been a long day so I may or may not get around to working on this tonight, but I bet we have the character portraits solved by this weekend.

deckarep commented 4 months ago

Yeah, you are likely more familiar with the SCI view formats than I am, and I'm glad for that. By all means let's shoot for solving it programmatically! We certainly don't have to merge the JSON offset tables, though they might be helpful for continued discovery. If you feel like you can wrap your head around the view format and want to tackle it this weekend, I'm looking forward to it!

The loop count at the start, and cel count for each view loop does make sense. In terms of 16 vs 32 bit...maybe it's as simple as the claim: if the first word (16-bit value) is zero, then it must be a double word (32-bit int) value...because the cel count must always be > 0.
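
That rule can be sketched as below. One caveat, based on the bytes in the series 906 log: reading `b'\x00\x00\x08\x00'` as a single little-endian 32-bit integer gives 524288 rather than 8, so this sketch instead treats the four-byte form as a zero word followed by a 16-bit count. That interpretation is an assumption, and `read_cel_count` is a made-up name:

```python
import struct

def read_cel_count(data, pos):
    """Read a loop's cel count at `pos`; return (count, next_pos).

    Heuristic from the thread: a cel count is never zero, so a zero first
    word means the count lives in the following word (4 bytes consumed).
    """
    (word,) = struct.unpack_from("<H", data, pos)
    if word != 0:
        return word, pos + 2
    # ASSUMPTION: the wide form is a zero word followed by a uint16 count
    (count,) = struct.unpack_from("<H", data, pos + 2)
    return count, pos + 4
```

For example, `read_cel_count(b"\x0f\x00", 0)` returns `(15, 2)` and `read_cel_count(b"\x00\x00\x08\x00", 0)` returns `(8, 4)`, matching the two-byte and four-byte cases in the log.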

BTW, I just want to thank you for your time investigating all of this. I hope you also can find a way to leverage the sprites for some of the cool projects you work on. It would be cool to see that. I always feel good when I can spend some time hacking on SCI, Sierra, AGI stuff.

Doomlazer commented 4 months ago

Well, there does not seem to be a loop count :(

However I think we can do without one. I've created a new branch for testing called 'a-mess'. It can actually export quite a bit of the loops/cels for character portraits automatically.

That said, there are still a lot of issues. I'm skipping the small character portraits because I was getting a crash. Also, the large character portraits are skipping some loops for some reason. I'm too tired tonight to keep things straight in my head, but feel free to add to the mess - it will need a lot of clean up before merging into the main branch, but the results do seem promising.

Doomlazer commented 4 months ago

> BTW, I just want to thank you for your time investigating all of this. I hope you also can find a way to leverage the sprites for some of the cool projects you work on. It would be cool to see that. I always feel good when I can spend some time hacking on SCI, Sierra, AGI stuff.

It's fun. It's been a pleasure tackling this challenge with you. It's always better to work in collaboration when possible, IMO.

Even if I don't use the sprites it will be enough to know that we're likely the first to pull this off. That said, I've got my eye on those roulette sprites. Maybe I'll put a roulette mini-game in-between the trivia stages of that JS game I dumped the cyberlarry 2000 audio for.

deckarep commented 4 months ago

Yeah the roulette sprites look like they could be 3d renderings possibly along with the game backgrounds.

Around the late 90s 3D started digging its heels in and you can kind of see where the trend is going.

Of course my heart will always be in the realm of 2d and hand-painted backgrounds....ah the good ol days of gaming.

deckarep commented 4 months ago

I updated the hack-offset branch to include the consume offset tables because I was eager to see how much of the characters honor the pattern we have so far.

Only 2 characters don't fully decode: 898, 912 - could be error on my end but perhaps not. Maybe these tables can offer clues on your work.

Doomlazer commented 4 months ago

Have you also been able to extract the small character portraits? I'm fine with extracting from the JSON tables as I think the bigger challenge will be getting the backgrounds to decode. I know the backgrounds aren't a priority for you, but it would be nice to have the tool extract every image possible.

deckarep commented 4 months ago

I don't recall how the small portraits are used in the game. Do you remember how you see them? Maybe they were used for online play.

I'll investigate that.

I'm dreading the backgrounds...it's going to be tedious to figure out because the pixel data is a lot more dense than most everything else. But I'll give it a shot.

The json tables were just meant to quickly get the characters out and to see if the pattern holds. Two characters don't fully decode but I have no problem if you want to do it the right way.

I agree getting all assets to decode would be nice. I would like the backgrounds fully extracted.

Between the audio, chars and backgrounds and UI we could recreate the game for modern computers...the hardest part is just coding the game logic for poker, blackjack, craps, slots, etc but that's been done to death...it would be easy to reference other repos. :)

Doomlazer commented 4 months ago

Hmm. I'd need to fire up the Windows 98 VM to check where the small portraits are used. Maybe the animations are identical and only needed for a reduced resolution mode or something? Part of the problem is that we can't check how the game handles any of the online content anymore.

We've gotten so many pixel perfect results with the current RLE code that I'm wondering if the problem with the backgrounds is that we aren't starting at the correct offset for them. I'd hope we can find a way to line things up that works with the current RLE code for backgrounds.

As for recreating LC entirely, I think it's too ambitious at the moment. I like what I thought was the initial idea of your project where the games were user theme-able with art packs and use these extracted images as one of the default templates. It doesn't even need to have four opponents. I liked the simplified version where it was the player against one opponent.

I'm surprised to hear you think the hardest part is the card game logic, because I feel like the card game logic is fairly trivial compared to recreating Larry Casino itself - the most difficult aspect in a total LC remake being reversing the LC game scripts to achieve decent accuracy against the original game.

I believe it would be simpler for you to create a loose card game framework with an extensible character animation format, grouped into idle animations and win/loss reactions, that isn't dependent on the LC formats. It should be easy for users to structure animations so they can be parsed into loops and cels that the engine can use. Also, allowing the characters to optionally be voiced or text only would be good.

Anyway, plenty to think about and nothing has to be decided tonight.

deckarep commented 4 months ago

I was spit-balling about fully recreating Casino. If we had all the assets, it could be done (close enough anyway) but it's a large undertaking for sure.

The reason why I mentioned the game logic would be the hard part, at least for me, is because (in my mind) most of the game logic for Casino is the rules around the games, and a robust set of casino game rules can get very nuanced at times. Blackjack, poker and slots would be easy, but the others would take some research to create comprehensive gameplay. It just might be because I don't know those other games well enough.

If you look at the sci-scripts repo for the Hoyle games, many of the game logic scripts for the card games are several thousand lines of code just for a single game. Of course some of that is because of how SCI script is designed.

That doesn't even include online play and getting a server going with multiplayer sessions. It's a fun idea to think about but don't want to go down that path. I like the idea of Casino being playable on modern computers but it's lots of work.

I will try to stick to the goal of creating something like a simple single player poker game with swappable themes and character packs. Unlocking the Casino characters now gives me a 5th character pack because I started on 4 others that were easy to get.

I do want to do voices and will probably just attempt to start with a solution like ElevenLabs for the AI voices. It probably won't be perfect but good enough.

At some point I'll pivot back to that work and try to release something complete that could be easily extended.

Doomlazer commented 4 months ago

I added 3 small portraits to hack_offsets. They seem to be identical lower resolution versions of their larger counterparts, so you can just copy the larger arrays, but they only have around half the images. No mouth frames for speaking. Weird.

I would have done the others, but I'm still recovering from the work week and being lazy.

deckarep commented 4 months ago

Awesome, I was playing the game earlier on my Win98 VM and I just can't seem to find how they are used. In terms of the hacked_offset file, let me know what makes sense long term.

If you feel confident that it's worth programmatically implementing the logic, we can go down that path...but I'm also fine with merging what we have to main eventually and just leveraging this table, at least until one of us feels the need to go do it "the right way".

deckarep commented 4 months ago

Actually, these low quality ones match up with this screen for being able to toggle the different emotion states.

However, in the game, these portraits still look like the higher quality ones to me.

States are (from left to right):

[Screenshot: 2024-07-27 at 2:22:45 PM]