Brian151 / OpenShockwave

attempt at reverse-engineering and possibly re-implementing Macromedia Shockwave
Apache License 2.0

The mystery of strings #11

Closed MrBrax closed 7 years ago

MrBrax commented 7 years ago

Yet another "thread", but this applies to more than one file format. There's definitely something weird about strings, which causes trouble when I try to parse different games (most likely made with different Director versions):

Game 1

49 46 57 56 9F 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00
00 00 00 00 00 00 00 15 00 00 00 40 00 00 00 57 00 00 00 59 00 00 00 59 00 00 00 59 00 00 00 59 13 4D 61 67 6E 75 73 20 4C 75 6E 64 65 6E
20 2D 20 45 4C 44 00 29 4D 61 67 6E 75 73 20 4C 75 6E 64 65 6E 20 2D 20 45 4C 44 20 49 6E 74 65 72 61 6B 74 69 76 20 50 72 6F 64 75 6B 74
69 6F 6E 00 15 43 3A 5C 64 6F 6B 5C 4D 75 6C 6C 65 5C 4D 76 4E 6F 50 72 6F 74 00 00 00 00


Game 2

49 46 57 56 5F 00 00 00 00 00 00 14 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 0C 00 00 00 1F
00 00 00 2F 00 00 00 31 0B 4C 69 6E 75 73 20 2D 20 47 43 44 11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 0E 43 3A 5C 53 65 6E
70 65 74 74 5C 64 69 72 00 00 00 00


Game 1 is consistent and always has a null byte after strings, with the length preceding it all.

Game 2, however, doesn't seem to have any logic behind it as far as I've gotten: sometimes a string has a null byte after it, sometimes it doesn't, regardless of where it's used. Padding to even/odd text lengths doesn't match up either.

I'm still not very used to binary stuff, is this a common technique with a really simple function to it?

tomysshadow commented 7 years ago

It's not uncommon to add extra null bytes after a string if the string has an odd number as a length so that everything afterwards still lines up nicely in a hex editor. However, I don't think that's what is going on here. It seems to sometimes not be occurring with odd numbers, unless I just can't count. Are there any other samples?
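A quick way to test the odd-length padding idea is to walk the three strings from the Game 2 dump and record, for each one, its length byte and whether a null byte follows. This is only a sketch: `scan_strings` is a made-up helper name, and it assumes each string is a one-byte length followed by ASCII text, which is what the dumps look like.

```python
def scan_strings(blob: bytes):
    """Walk length-prefixed strings, noting whether a NUL byte follows each one."""
    pos, found = 0, []
    while pos < len(blob) and blob[pos] != 0:
        n = blob[pos]                                  # one-byte length prefix
        text = blob[pos + 1 : pos + 1 + n].decode("ascii")
        pos += 1 + n
        has_nul = pos < len(blob) and blob[pos] == 0   # optional terminator
        if has_nul:
            pos += 1
        found.append((text, n, has_nul))
    return found

# String area of the first Game 2 dump, copied from the hex above.
GAME2_STRINGS = bytes.fromhex(
    "0B 4C 69 6E 75 73 20 2D 20 47 43 44 "
    "11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 "
    "0E 43 3A 5C 53 65 6E 70 65 74 74 5C 64 69 72 00"
)

for text, length, has_nul in scan_strings(GAME2_STRINGS):
    parity = "odd" if length % 2 else "even"
    print(f"{text!r}: length {length} ({parity}), NUL after: {has_nul}")
```

In this sample the 11-byte string has no null after it, while the 17-byte and 14-byte strings both do, so length parity on its own does not predict the extra byte.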

Brian151 commented 7 years ago

Different versions make sense, but it seems trivial/pointless to change that. Going from ASCII/ANSI to Unicode (somewhat off-topic: also, some Lscr seem to use Unicode, seems to happen when JS syntax is selected) is one thing, but screwing-around with where/how lengths are encoded, and the presence or lack of a null terminator... IDK...

There has to be, no matter how random it seems, SOME method to this, or a collection of methods.

@tomysshadow beat me! dang...

tomysshadow commented 7 years ago

Also: are we certain there isn't supposed to be another string there and it's just empty?

Brian151 commented 7 years ago

This seems possible, which section is VWFI, again?

the end of these also is most interesting, they seem to always have a couple of 00's, for some reason...

tomysshadow commented 7 years ago

That's what I'm thinking: this looks like a section where there is publisher information, and maybe they didn't fill certain fields out, most often the four fields at the end. Can't remember off the top of my head what this section does or if that's even known.

Brian151 commented 7 years ago

I was going to say, it has that kind of stench to it (not meant in any bad way, ofc)

iirc, many of the sections are. The last one I even tried to understand was VWLB, and I couldn't figure-out a whole lot other than string encoding...

How much weed were these guys smoking?! This format is still full of inconsistencies, even if it does have a pattern. Its most universal pattern is to not strictly adhere to any one pattern more than once.

so, VW (whatever that means?) "File Info" ?

tomysshadow commented 7 years ago

According to Schockabsorber, anything beginning with VW has to do with the timeline. So I don't know what these strings are doing here... I guess VW is supposed to stand for "viewer"?

Brian151 commented 7 years ago

I don't think it's 100% timeline now; VWFI is an outlier to that. I popped open an example file, same deal. If it's timeline, it's referring to some related external file, a temp one, at the very least.

also, have some speculation: VW = "Video Window"? "Virtual Window"?

LB = "L[a]B[el]"
TL = "T[ime]L[ine]"
FI = "F[ile]I[nfo]" ?
SC = ???
TC = ???

MrBrax commented 7 years ago

Wow, everyone was awake!

VWFileInfo is what i can gather, yes. Created by, Modified by, File path.

I'll get some more examples.

tomysshadow commented 7 years ago

SC, according to Schockabsorber, is the Score and contains timing information

Brian151 commented 7 years ago

VW SC[ore] ALRIGHT!

we can identify their purposes, at least, and assign meaningful names

MrBrax commented 7 years ago

49 46 57 56 72 00 00 00 00 00 00 14 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00 13 00 00 00 26 00 00 00 36 00 00 00 38 00 00 00 38 00 00 00 38 00 00 00 38 11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 0E 54 3A 5C 44 69 73 74 72 5C 4D 65 64 69 61 00 00 00

49 46 57 56 5D 00 00 00 00 00 00 14 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 0A 00 00 00 1D 00 00 00 2D 00 00 00 2F 09 4C 69 6D 70 61 6E 20 2D 20 11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 0E 54 3A 5C 44 69 73 74 72 5C 4D 65 64 69 61 00 00 01 00

49 46 57 56 6E 00 00 00 00 00 00 14 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00 0F 00 00 00 22 00 00 00 32 00 00 00 34 00 00 00 34 00 00 00 34 00 00 00 34 0D 4C 69 6E 75 73 20 78 20 2D 20 47 43 44 00 11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 0E 43 3A 5C 53 65 6E 70 65 74 74 5C 64 69 72 00 00 00

These are all from Game 2.

> Also: are we certain there isn't supposed to be another string there and it's just empty?

Not from what i can tell, comparing the entries. There are only three text fields there, and it doesn't make sense with it not having a null byte in either case.

tomysshadow commented 7 years ago

Do we know where in Director these fields are actually set/can be viewed?

Brian151 commented 7 years ago

i agree, that'd be useful

MrBrax commented 7 years ago

[screenshot]

(the name is set in preferences->general, and when saving the file, those in ifvw get overwritten)

The reason why I'm doing this in the first place is because I just can't figure out where the movie dimensions are saved. I know they're not saved in the "640x480" format, but instead as their common rect thing ([screenshot]), same as in bitmaps. But back to strings!

Brian151 commented 7 years ago

there any place the .dir can be obtained? even if temporary... that's enough information to try some sniffing, but i'd need whole file...

IIRC, one of the timeline/stage sections is supposed to have it, though?

Speaking of which, one of the sections refers to which cast members and scripts are placed on/linked to the stage... @tomysshadow You remember which one that is? I know Schockabsorber mentions it

MrBrax commented 7 years ago

Should we form a Discord server or something where we can chat about this, and share examples/code maybe? It's kinda difficult here on GH.

Brian151 commented 7 years ago

Could... I have a chat server running, I think it's called AjaxChat, but I won't share that publicly...

and yes, it is kinda hard. I have a discord account... Discord is kinda resource-intensive, though

MrBrax commented 7 years ago

You can run it in the browser though, less overhead there.

Brian151 commented 7 years ago

I have hardly been able to do that, even. I did run it in browser...

There is maybe one way I could accommodate... but it really isn't an option always, and it's violating certain agreements I made with family...

On this computer, I can barely run fatfox anymore. It's gotten so bad gmail can no longer connect to hangouts

tomysshadow commented 7 years ago

Discord is pretty lightweight especially in comparison to Skype at least.

I'm looking right now at the spot in code that reads/writes VWFI... can't say that I get it though, it looks more complex on Windows than on Mac

MrBrax commented 7 years ago

@Brian151 well this is gonna get difficult if you can't run these things, is the situation that bad? :/

@tomysshadow oh, maybe it's because they used win/mac yeah? differing encoding methods?

Brian151 commented 7 years ago

@tomysshadow Skype's on my s*** list... I really need to get back on and try and find other means of talking with my contacts there.

@MrBrax I did so on my own computer, which has more RAM, for sure (no idea of CPU)... we could give it a shot, but it's very likely to neuter my ability to do much else.

tomysshadow commented 7 years ago

I don't see where it's actually writing the strings on Mac, just the numbers at the beginning. It's always using big-endian though. The first number in the chunk is four bytes long and seems to be a flag where one of the bits specifies whether to use shorts or longs for the rest of the chunk.
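To make the byte-order point concrete, here is a small sketch using Python's `struct` module on the first 12 bytes of the Game 2 dump. The helper names `read_motorola_int32` and `read_intel_int32` are hypothetical stand-ins, not the real Director routines. One observation from the dump itself: the chunk tag and size look byte-swapped (little-endian), while the first field of the body reads sensibly as big-endian.

```python
import struct

def read_motorola_int32(buf: bytes, pos: int = 0) -> int:
    """Read a big-endian ("Motorola") signed 32-bit integer."""
    return struct.unpack_from(">i", buf, pos)[0]

def read_intel_int32(buf: bytes, pos: int = 0) -> int:
    """Read a little-endian ("Intel") signed 32-bit integer."""
    return struct.unpack_from("<i", buf, pos)[0]

# First 12 bytes of the first Game 2 dump: tag, chunk size, first body field.
chunk_start = bytes.fromhex("49 46 57 56 5F 00 00 00 00 00 00 14")

tag = chunk_start[:4][::-1].decode("ascii")       # "IFWV" reversed
size = read_intel_int32(chunk_start, 4)           # 5F 00 00 00 read little-endian
first_field = read_motorola_int32(chunk_start, 8) # 00 00 00 14 read big-endian

print(tag, size, first_field)
```

Read the other way round, the size would come out as 0x5F000000, which is far larger than the chunk, so mixing the two byte orders seems to be what the file actually does here.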

Brian151 commented 7 years ago

perhaps that's why the code on Windows is more complex... IDK anything about the Mac's endianness, but Windows is generally locked into little-endian cuz Intel processors, at least from my understanding.

tomysshadow commented 7 years ago

The Mac version is easier to work with, it has symbols... Yeah, the actual names of the functions are imStreamReadMotorollaINT32 or imStreamReadIntelINT16 etc., but I'm still not sure what it uses to write strings.

Brian151 commented 7 years ago

alright then strings are... fun... I prefer high-level programming with them a lot more than reading/writing them from binary forms or at low levels... There's just too many ways to do it....

tomysshadow commented 7 years ago

Well, I believe IML32 has a function to do that specific task which makes it easier to see where that's happening.

Brian151 commented 7 years ago

alright...

so, I've got this stub to add to projector notes, thoughts?

SHOCKWAVE FORMAT (timeline???):
All sections here seem to pertain to the timeline
Their FourCC IDs all begin with "VW" (meaning yet unknown)

Section VWLB {
    // "L[a]B[el]"
    // frame labels
}

Section VWTL {
    // "T[ime]L[ine]"
    // movie timeline
}

Section VWFI {
    // "F[ile]I[nfo]"
    // some kind of file-related information
}

Section VWSC {
    // "SC[ore]"
    // timing information
}

Section VWTC {
    // ???
    // ???
}

@MrBrax you've gone silent, suddenly...

MrBrax commented 7 years ago

I don't know how much you guys have searched around, but I've found quite a few attempts here and there, like this one: https://github.com/radishengine/drowsy/blob/master/mac/filetypes/open_MV93.js

I'm skimming around trying to find something that makes sense, because i did see something about the even/odd padding but can't remember where it was.

@Brian151 well it's not really a chatroom

VWCF contains width/height/default palette/bgcolour

Brian151 commented 7 years ago

I really have had hard times finding many attempts... Most were by accident

I've searched for it quite a few times. I've also searched and searched for SWF/XFL/FLA docu, basically nothing... (and what little there is is split all across stackoverflow, github, and various blogs; some of it only still exists because of the web archive) Even for formats like PNG or RIFF it's hard to find a good piece of docu. Oh, and don't read ECMA's specs for JSON!

So, could you create a discord server?

Do you have accounts elsewhere?

MrBrax commented 7 years ago

https://discord.gg/KqgzrY

I have accounts on tons of places, it's just finding a good application.

Brian151 commented 7 years ago

gotta change PW, already forgot it... gmail lag-fest, here i come... T-T

moikeygraham commented 7 years ago

what about IRC as a means to chat? i've been following progress (good progress so far!) and feel i could be of use helping out.

tried to reverse the DCR format a while back for a game (Habbo) and made some progress (i was only ever interested in the Lingo script bytecode at the time).

MrBrax commented 7 years ago

irc is decent, but old and doesn't really support collaboration features

moikeygraham commented 7 years ago

@MrBrax would recommend slack, but that's a pretty heavy app to use too.

Brian151 commented 7 years ago

Back on the intended topic (I may go through and clear some of the other stuff later). Here's another string oddity: daisy-chained strings.

Such is the format of VWLB, which I have cracked, finally... well, mostly

Section VWLB {
    // "L[a]B[el]"
    // frame labels
    Uint16 [big] labelsCount
    Uint16Array(labelsCount * 2) labelEntries {
        Uint16 [big] frameNumber // ?
        Uint16 [big] labelTextOffset
    }
    Uint32 [big] labelTextLength
    String(labelTextLength) labelsText // without offsets, this would be impossible to read
}

I read the labels table in reverse, actually... here's my study, which I may or may not publish

42 4C 57 56 78 00 00 00 
00 0C // labels count
00 03 // huh
00 00 // offset of "ready"
00 05 // huh ?
00 05 // offset of "title"
00 06 // huh ?
00 0A // offset of "start"
00 08 // huh ?
00 0F // offset of "menu"
00 0B // huh ?
00 13 // offset of "pickSpybot"
00 0E // huh ?
00 1D // offset of "map"
00 11 // huh ?
00 20 // offset of "entry"
00 15 // huh ?
00 25 // offset of "play"
00 1A // huh ?
00 29 // offset of "credits"
00 21 // huh ? 
00 30 // offset of "tutorial"
00 2F // huh ?
00 38 // offset of "kill1"
00 33 // huh?
00 3D //offset of "kill2"
00 00 00 42 //final data length

//data (ASCII)
readytitlestartm
enupickSpybotmap
entryplaycredits
tutorialkill1kil
l2
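
The layout above can be sketched as a small parser, using the study bytes as sample input (everything after the 8-byte chunk header). Treat this as a reading of the notes rather than a confirmed spec: `parse_vwlb` is a made-up name, and calling the first value of each pair a frame number is still the guess marked "?" above. Each label's end is taken from the next entry's offset, or from the total text length for the last one.

```python
import struct

def parse_vwlb(body: bytes):
    """Split a VWLB body into (frame, label) pairs using its offset table."""
    (count,) = struct.unpack_from(">H", body, 0)
    # Each entry is two big-endian Uint16s: (frame number?, text offset).
    pairs = [struct.unpack_from(">HH", body, 2 + 4 * i) for i in range(count)]
    text_len_at = 2 + 4 * count
    (text_len,) = struct.unpack_from(">I", body, text_len_at)
    text = body[text_len_at + 4 : text_len_at + 4 + text_len]
    # A label runs from its own offset to the next label's offset.
    ends = [off for _, off in pairs[1:]] + [text_len]
    return [(frame, text[off:end].decode("ascii"))
            for (frame, off), end in zip(pairs, ends)]

VWLB_BODY = bytes.fromhex(
    "000C "
    "0003 0000 0005 0005 0006 000A 0008 000F "
    "000B 0013 000E 001D 0011 0020 0015 0025 "
    "001A 0029 0021 0030 002F 0038 0033 003D "
    "00000042"
) + b"readytitlestartmenupickSpybotmapentryplaycreditstutorialkill1kill2"

for frame, label in parse_vwlb(VWLB_BODY):
    print(frame, label)
```

The body comes to 0x78 bytes, which matches the chunk length in the header, so the table and the text together seem to account for the whole section.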

I need to eat lunch like, now...

MrBrax commented 7 years ago

yeah it's the same as with casts which i'll write more about tomorrow, lots of formats use it, it seems

Brian151 commented 7 years ago

CASt has been observed to do this, also?! which field(s)?

Brian151 commented 7 years ago

unless anyone would like to contest, I think I'm going to close this...

MrBrax commented 7 years ago

It was before I realized it used the offsets table yes
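
For the record, here is a minimal sketch of how the offsets table seems to resolve the original VWFI question, checked against the first Game 2 dump. It assumes the following reading of the bytes, which fits the dumps in this thread but is not confirmed anywhere: the first big-endian Uint32 of the body points at the table, the table starts with a Uint16 entry count, then that many Uint32 offsets plus one final Uint32 total length, then the data. `parse_vwfi_body` and `decode_text` are hypothetical names.

```python
import struct

def parse_vwfi_body(body: bytes):
    """Slice a VWFI body into raw entries using its offsets table."""
    (table_start,) = struct.unpack_from(">I", body, 0)
    (count,) = struct.unpack_from(">H", body, table_start)
    # count offsets followed by the total data length.
    offsets = struct.unpack_from(f">{count + 1}I", body, table_start + 2)
    data_start = table_start + 2 + 4 * (count + 1)
    return [body[data_start + a : data_start + b]
            for a, b in zip(offsets, offsets[1:])]

def decode_text(entry: bytes) -> str:
    """Decode a length-prefixed string; bytes past the length are ignored."""
    if not entry:
        return ""
    return entry[1 : 1 + entry[0]].decode("ascii")

# Body of the first Game 2 dump (everything after the 8-byte chunk header).
GAME2_BODY = bytes.fromhex(
    "00 00 00 14 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 "
    "00 05 "
    "00 00 00 00 00 00 00 00 00 00 00 0C 00 00 00 1F 00 00 00 2F 00 00 00 31 "
    "0B 4C 69 6E 75 73 20 2D 20 47 43 44 "
    "11 4C 69 6E 75 73 20 46 65 6C 64 74 20 2D 20 47 43 44 00 "
    "0E 43 3A 5C 53 65 6E 70 65 74 74 5C 64 69 72 00 "
    "00 00 00"
)

entries = parse_vwfi_body(GAME2_BODY)
texts = [decode_text(e) for e in entries[1:4]]  # the three text fields
print(texts)
```

Read this way, the null byte that sometimes follows a string sits inside the entry's span but outside the string's own length, so a parser that trusts the table never has to guess whether a terminator is present.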

On 23 July 2017 07:35:23 CEST, Brian notifications@github.com wrote:

> unless anyone would like to contest, I think I'm going to close this...
