afritz1 / OpenTESArena

Open-source re-implementation of The Elder Scrolls: Arena.
MIT License

GLOBAL.BSA Layout #15

Closed afritz1 closed 8 years ago

afritz1 commented 8 years ago

WinArena is able to print out (among several other things) a list of all the files contained within GLOBAL.BSA, shown with their byte offset from the beginning as well as their size in bytes.

I removed the line numbers here for brevity, but the last entry (ZOMBIE6.CFA) is at index 2440, so there are 2441 items contained in GLOBAL.BSA using a variety of formats:

--- Offset ---- Size ---- Filename -----
0x000002        0x02E1    01AXE.CFA
0x0002E3        0x14D7    01BARATT.CFA
0x0017BA        0x2AE0    01BARWLK.CFA
0x00429A        0x04AD    01BAXE.CFA
...
0xFF1273        0x2BA4    ZOMBIE3.CFA
0xFF3E17        0x3266    ZOMBIE4.CFA
0xFF707D        0x37CD    ZOMBIE5.CFA
0xFFA84A        0x0D6C    ZOMBIE6.CFA

Perhaps the virtual file system can use this information to load chunks of GLOBAL.BSA into the program. I can also put the actual list into a parse-able format at some point.
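As a rough illustration of what a parse-able form of that listing could look like, here is a minimal C++ sketch that reads lines in the "offset size filename" layout shown above. The names `BsaEntry`, `parseListingLine`, `parseListing` and `isContiguous` are hypothetical and are not part of OpenTESArena or WinArena. The contiguity check reflects only what the excerpt shows (each offset equals the previous offset plus the previous size); whether that holds for every entry in the archive is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one line of the WinArena listing.
struct BsaEntry {
    std::uint32_t offset; // byte offset from the start of GLOBAL.BSA
    std::uint32_t size;   // size in bytes
    std::string name;     // e.g. "01AXE.CFA"
};

// Parses a line such as "0x000002 0x02E1 01AXE.CFA".
// Returns false for the header, the "..." line, or anything malformed.
bool parseListingLine(const std::string &line, BsaEntry &out) {
    std::istringstream iss(line);
    std::string offsetStr, sizeStr, name;
    if (!(iss >> offsetStr >> sizeStr >> name)) return false;
    try {
        std::size_t used = 0;
        // Base 16 accepts the optional "0x" prefix.
        const unsigned long offset = std::stoul(offsetStr, &used, 16);
        if (used != offsetStr.size()) return false;
        const unsigned long size = std::stoul(sizeStr, &used, 16);
        if (used != sizeStr.size()) return false;
        out = BsaEntry{static_cast<std::uint32_t>(offset),
                       static_cast<std::uint32_t>(size), name};
        return true;
    } catch (const std::exception &) {
        return false; // not a hexadecimal number
    }
}

// Parses a whole listing, silently skipping lines that are not entries.
std::vector<BsaEntry> parseListing(const std::string &text) {
    std::vector<BsaEntry> entries;
    std::istringstream iss(text);
    std::string line;
    BsaEntry entry;
    while (std::getline(iss, line)) {
        if (parseListingLine(line, entry)) entries.push_back(entry);
    }
    return entries;
}

// True if every entry starts exactly where the previous one ended.
bool isContiguous(const std::vector<BsaEntry> &entries) {
    for (std::size_t i = 1; i < entries.size(); ++i) {
        if (entries[i].offset != entries[i - 1].offset + entries[i - 1].size)
            return false;
    }
    return true;
}

// Small self-check using the first two entries quoted in this issue.
void checkExcerpt() {
    const std::vector<BsaEntry> entries = parseListing(
        "0x000002 0x02E1 01AXE.CFA\n0x0002E3 0x14D7 01BARATT.CFA\n");
    assert(entries.size() == 2);
    assert(isContiguous(entries));
}
```

With entries in this form, a loader only needs the offset and size of a name to know which byte range of GLOBAL.BSA to read.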

kcat commented 8 years ago

Yep, I have the VFS parsing GLOBAL.BSA and can pull individual entries from it, as if they were files on disk. :) That's where the XMI files are for the MIDI music, so it'll be part of the pull request for music once I get it working (which is close).

Ragora commented 8 years ago

You might just be able to mmap individual sections of the data files as well, at least for any parsing or conversion operations that would produce a result memory buffer from an input buffer anyway. It depends on how much of the original data is readily usable through modern APIs, and whether or not it's worth it to bind virtual address space to this data (even if it's temporary, as an alternative to reading the entire buffer in and doing whatever to produce a result). It would be pretty circumstantial.

afritz1 commented 8 years ago

Great. I'll be committing some small changes in a while, such as changing all <SDL2/SDL.h> includes to just "SDL.h" as that's what the CMake website suggests here (in the last paragraph).

mmap isn't compatible with Windows if I remember correctly.

Ragora commented 8 years ago

Windows has an equivalent to mmap: https://github.com/DraconicEnt/KGE/blob/develop/components/support/platform/windows/file.cpp#L19

That code doesn't quite work (I haven't gotten the chance to fix it, but that should generally be right), but Windows does support the act of memory mapping files: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366556(v=vs.85).aspx

In contrast: https://github.com/DraconicEnt/KGE/blob/develop/components/support/platform/unix/file.cpp#L19 (which does work)

I don't know if there are any restrictions on mapping multiple locations in a file, but I don't imagine there are any problems with mapping locations read-only (as we don't want to write to the BSAs anyway).

I mention the is-it-worth-it because in plenty of cases it probably isn't much different from just reading the entire buffer into RAM and performing some operations on it, or even just reading portions at a time. It's just a thing to think about on a case-by-case basis.

kcat commented 8 years ago

I'm not too keen on mmap'ing it (through mmap or the Windows equivalent), personally. Although the individual files in the default BSA aren't very big, once mods can use their own BSAs, those can potentially contain pretty large files which would be wasteful to map into memory (e.g. FLACs, high quality video files).

Besides, the standard istream interface doesn't have a method to access the underlying memory map, so it'll still need to be read in through the read method anyway. I have a wrapper istream implementation that uses a normal ifstream on the real file, but constrains it to a portion of that file with a corrected offset (so 0 is the beginning of the given resource, and it won't read past the end of the resource).
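To make the idea of a constrained istream concrete, here is a minimal C++ sketch of one way it could be done. This is not kcat's actual implementation: `SubFileBuf` and `SubFileStream` are hypothetical names, and seeking within the resource (the `seekoff`/`seekpos` overrides that would give the "0 is the beginning of the resource" behaviour) is left out for brevity. The sketch only shows sequential reads that stop at the end of the resource.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical stream buffer exposing only [start, start + size) of a file.
class SubFileBuf : public std::streambuf {
public:
    SubFileBuf(const std::string &path, std::streamoff start, std::streamsize size)
        : file_(path, std::ios::binary), remaining_(size) {
        assert(start >= 0 && size >= 0);
        file_.seekg(start); // position the real file at the resource's first byte
    }

    bool isOpen() const { return file_.is_open(); }

protected:
    // Called by the stream when the get area is empty: refill it from the
    // file, but never hand out more bytes than the resource still has left.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (remaining_ <= 0 || !file_) return traits_type::eof();
        const std::streamsize want = std::min<std::streamsize>(
            static_cast<std::streamsize>(sizeof(buffer_)), remaining_);
        file_.read(buffer_, want);
        const std::streamsize got = file_.gcount();
        if (got <= 0) return traits_type::eof();
        remaining_ -= got;
        setg(buffer_, buffer_, buffer_ + got);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::ifstream file_;
    std::streamsize remaining_; // bytes of the resource not yet buffered
    char buffer_[4096];
};

// Hypothetical istream that owns a SubFileBuf.
class SubFileStream : public std::istream {
public:
    SubFileStream(const std::string &path, std::streamoff start, std::streamsize size)
        : std::istream(nullptr), buf_(path, start, size) {
        rdbuf(&buf_); // attaching the buffer also clears the initial error state
        if (!buf_.isOpen()) setstate(std::ios::failbit);
    }

private:
    SubFileBuf buf_;
};
```

Given an offset and size from the listing at the top of this issue, something like `SubFileStream in("GLOBAL.BSA", 0x000002, 0x02E1);` would then behave like a stream over 01AXE.CFA alone, and any code written against `std::istream` could read it without knowing about the archive.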

Ragora commented 8 years ago

I wasn't talking about mapping the entire BSA. Once we know where everything is we can mmap individual portions. The initial reading for this data is probably just reading header data. At that point we can decide what we want to mmap on a per-file basis. Stuff like audio files we'd probably have taking up memory anyway, at least initially until decoded, unless it's a streamable format. As I mentioned in my initial post, it really depends on the specific circumstances whether or not this is worth it versus just reading the data normally.

For memory mapping, you'd have to have a straight file handle either way, so I don't really see the istream issue being in the way (obtain a read handle on the BSA and mmap whatever is designated by our location data).

You'd just have to write a wrapper around Linux/Windows memory mapping if it turned out to be useful, akin to what I was doing in my links above.
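For reference, a per-resource read-only mapping on the POSIX side might look roughly like the sketch below. It is not the code from the KGE links above; `MappedRegion` is a hypothetical name. One restriction worth knowing is that `mmap` requires the file offset to be a multiple of the page size, so the sketch maps from the page boundary at or below the requested offset and skips the extra leading bytes. On Windows the counterparts are `CreateFileMapping` and `MapViewOfFile`, where the view offset must be a multiple of the system allocation granularity; that branch is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of [offset, offset + size) in a file (POSIX only).
class MappedRegion {
public:
    MappedRegion(const char *path, off_t offset, std::size_t size) {
        assert(offset >= 0 && size > 0);
        const int fd = open(path, O_RDONLY);
        if (fd < 0) return;

        // Refuse ranges that run past the end of the file; touching pages
        // beyond the end of a mapped file raises SIGBUS.
        struct stat st;
        if (fstat(fd, &st) != 0 || offset > st.st_size ||
            static_cast<off_t>(size) > st.st_size - offset) {
            close(fd);
            return;
        }

        // Round the offset down to a page boundary and remember the slack.
        const off_t page = static_cast<off_t>(sysconf(_SC_PAGESIZE));
        const off_t aligned = offset - (offset % page);
        slack_ = static_cast<std::size_t>(offset - aligned);
        length_ = size + slack_;

        void *p = mmap(nullptr, length_, PROT_READ, MAP_PRIVATE, fd, aligned);
        close(fd); // the mapping stays valid after the descriptor is closed
        if (p != MAP_FAILED) {
            base_ = static_cast<unsigned char *>(p);
            size_ = size;
        }
    }

    ~MappedRegion() {
        if (base_ != nullptr) munmap(base_, length_);
    }

    MappedRegion(const MappedRegion &) = delete;
    MappedRegion &operator=(const MappedRegion &) = delete;

    bool valid() const { return base_ != nullptr; }
    // First byte of the requested range, or nullptr if the mapping failed.
    const unsigned char *data() const { return base_ ? base_ + slack_ : nullptr; }
    std::size_t size() const { return size_; }

private:
    unsigned char *base_ = nullptr; // start of the page-aligned mapping
    std::size_t slack_ = 0;         // bytes between the page start and offset
    std::size_t length_ = 0;        // total mapped length, including slack
    std::size_t size_ = 0;          // length of the requested range
};
```

Several such read-only regions of the same file can exist at once, each covering one resource. Whether that beats simply reading the bytes into a buffer is the case-by-case question raised earlier in this thread.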