TexTools / xivModdingFramework

GNU General Public License v3.0
9 stars, 17 forks

Consider abandoning sqpack in a new TTMP version #56

Closed: anna-is-cute closed this issue 4 months ago

anna-is-cute commented 1 year ago

Hi! I'm your friendly neighbourhood developer who works on tools that can use TTMPs (namely Heliosphere)!

I am curious as to what led to the adoption of sqpack internally for modpacks and whether it can be removed in a new TTMP version. In terms of pros and cons, I can only think of cons, so I'd love to learn more about why it's in use.

However, as far as I understand, even for modifying the game files, TexTools can take tex files, models, etc. and directly modify the game files without going through an intermediary sqpack first. This makes it appear to me that there is actually no reason to continue storing modpacks in sqpack! Again, if I'm wrong, please let me know.

However, for anyone who makes tools working with modpacks, sqpack is an absolute curse for them. There's no single point of truth, no spec, and every implementation of the format will be slightly different and cause incompatibilities. This creates a high barrier to entry for any tools wanting to work on modpacks, as they must first have/create a sqpack decoder! In addition, TexTools does several things wrong when writing sqpacks (which I can't blame whoever implemented - it's a tricky and annoying format), which leads to even a "correct" reader having to make itself work differently to accommodate.

For users, sqpacks generated by TexTools have poor compression (DEFLATE on 16 KB discrete chunks - also bad (de)compression time, which leads to poorer UX) and exact duplicates of files (making the size problem worse). They're also completely opaque; the user can't see (or edit) what's actually in the modpack without TexTools.
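The following small Python sketch makes the chunking point concrete. It is an illustration, not TexTools code, and compares DEFLATE on independent fixed-size chunks with DEFLATE on the whole file. The 16,000-byte chunk size and the synthetic payload are assumptions, and real sqpack blocks also carry headers and padding that are not modelled here. Because each chunk starts with an empty history, redundancy that crosses chunk boundaries cannot be exploited.

```python
import os
import zlib

# Assumption: 16,000-byte chunks stand in for sqpack's "16 KB discrete chunks";
# block headers and padding are ignored.
CHUNK_SIZE = 16_000


def chunked_deflate_size(data: bytes, level: int = 9) -> int:
    """Total compressed size when each chunk is compressed on its own."""
    return sum(
        len(zlib.compress(data[i:i + CHUNK_SIZE], level))
        for i in range(0, len(data), CHUNK_SIZE)
    )


def whole_deflate_size(data: bytes, level: int = 9) -> int:
    """Compressed size when the whole file is one DEFLATE stream."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    # Synthetic payload: a 20,000-byte random block repeated 10 times. Each
    # 16,000-byte chunk looks random on its own, but the repeats fall inside
    # DEFLATE's 32 KB window when the file is compressed as a whole.
    payload = os.urandom(20_000) * 10
    print("chunked:", chunked_deflate_size(payload))
    print("whole:  ", whole_deflate_size(payload))
```

With this deliberately extreme payload, the chunked total stays near the raw 200,000 bytes, while the single stream lands not far above the 20,000 bytes of unique data. Real game files are less extreme, so read this as the direction of the effect, not its size.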

It would be nice if TTMP3 could be a normal ZIP file or like tar.zst or anything standard. Take a look at something like what Penumbra uses (PMP). It just has the files in folders and has a JSON file pointing to them and which game path they affect. This is much more approachable for developers and users. It should also be easier for TexTools, since it can completely drop the use of sqpack, have fewer bugs with modpacks, and get better compression. It could even deduplicate files (although TexTools could be doing that with sqpack anyway) and get a better compression ratio. If TTMP3 used something like a ZStandard-compressed tar file, it could achieve great compression with really fast decompression, improving UX!
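As a rough illustration of the "ZIP plus JSON manifest" layout described above, the Python sketch below stores files under their game paths and writes a manifest that maps each game path to its archive entry. The manifest schema (a `manifest.json` with a `files` object) and the function names are hypothetical. They are not the actual PMP or TTMP schema.

```python
import io
import json
import zipfile


def build_modpack(files: dict[str, bytes]) -> bytes:
    """Pack {game_path: data} into a ZIP with a JSON manifest.

    Hypothetical layout (NOT the real PMP or TTMP schema):
      manifest.json     -> {"files": {"<game path>": "<archive path>"}}
      files/<game path> -> raw file bytes
    """
    buf = io.BytesIO()
    manifest = {"files": {}}
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for game_path, data in files.items():
            archive_path = f"files/{game_path}"
            zf.writestr(archive_path, data)
            manifest["files"][game_path] = archive_path
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()


def read_modpack(blob: bytes) -> dict[str, bytes]:
    """Inverse of build_modpack: resolve each game path through the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return {gp: zf.read(ap) for gp, ap in manifest["files"].items()}
```

The result is an ordinary ZIP, so any archive tool can list or extract it without a format-specific decoder. This is the approachability argument being made here.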

To someone like me who didn't make TexTools, I can't see any reason that it uses sqpack, considering that alternatives like Penumbra don't need it at all. However, there could be things I don't know, so I'd love to have a conversation if that's the case. Either way, working to remove sqpack from modpacks could help everyone in the modding scene.

Squall-Leonhart commented 1 year ago

Short Answer: No.

Long Answer: When running the import functions for the mod resources themselves, the tool creates the files in the format that the game expects and requires; this is a processor-intensive task.

When this imported data is turned into a modpack, the converted format is preserved within the TTMP file, so that on the next import only the bare minimum of data needs to change and only the bare minimum of processing is required. This improves import times because no data is recreated: it is imported in its ready-to-go state into a new TTMP file, and the index is pointed at it.

For users, sqpacks generated by TexTools have poor compression (DEFLATE on 16 KB discrete chunks - also bad (de)compression time.

It is still faster to open a 2GB advanced modpack on Textools, than it is on a 2GB wizard enabled Nexus mod.

which leads to poorer UX) and exact duplicates of files (making the size problem worse).

Expected when you understand data fragmentation, also already mitigated by the Reclaim Unused Space feature.

They're also completely opaque; the user can't see (or edit) what's actually in the modpack without TexTools.

An issue no different to BSA files, I see no compelling argument here.

It would be nice if TTMP3 could be a normal ZIP file or like tar.zst or anything standard. Take a look at something like what Penumbra uses (PMP). It just has the files in folders and has a JSON file pointing to them and which game path they affect. This is much more approachable for developers and users. It should also be easier for TexTools, since it can completely drop the use of sqpack, have fewer bugs with modpacks, and get better compression. It could even deduplicate files (although TexTools could be doing that with sqpack anyway) and get a better compression ratio. If TTMP3 used something like a ZStandard-compressed tar file, it could achieve great compression with really fast decompression, improving UX!

A comparison to what Penumbra does, or how it does it, is fundamentally irrelevant: Penumbra is not managing the mod data in a way that attempts to be transparent to the game's native resource-loading mechanisms. It patches its own way in and supersedes the game's native mechanism, thus it can do what it likes.

In the process of doing what it likes, it also renders the game less stable at a fundamental level. This is not up for argument; anyone in the field of developing code injection tools understands this is a fact. Penumbra is required to load Penumbra mods at all times, not just during the installation and management of this data.

Textools isn't going to become a game plugin, sorry.

anna-is-cute commented 1 year ago

Hey, thanks for your thorough answer!

Basically, what I get from your answer is that you have seen the import times increase when the work to encode the files into sqpack for the game isn't precomputed. It can't be converting DDS to Square's jank texture format, since that's just a difference in header. I'm someone who has written tools to unpack and repack TTMPs, so I have some knowledge in this area! The repacker I've written processes files pretty quickly, and faster than I remember TexTools operating, though I haven't used it in ages, so I can't really say anymore.

What I can say is that I think these fears are mostly unfounded when you optimise the conversion. Extracting every file from a ~1.6 GB TTMP and recreating the same TTMP (deduplicated), for example, takes me about 1.5 minutes, so about 45 seconds for each separate operation. Is an install of a 1.6 GB modpack taking 45 seconds so outrageous to the point of using sqpack, which makes interoperability a hassle for absolutely everyone?

If 45 seconds isn't considered a deal-breaker, perhaps some refactoring of the logic in TexTools to stop operating on inefficient, duplicated in-memory byte arrays and instead start working on streams would help optimise the process! I'd be happy to investigate that in future.

Expected when you understand data fragmentation, also already mitigated by the Reclaim Unused Space feature.

I'm not really sure what you mean by this, though I can tell you that I have encountered a total of zero TTMPs (outside of very tiny mods that touch one or two files) that are deduplicated, and the vast majority of them see huge file size reductions when they are. If you have a feature that does this, perhaps enable it by default!

An issue no different to BSA files, I see no compelling argument here.

Anyone who wants to grab a specific file has to open TexTools. Any tool that wants to work on a mod needs to handle broken sqpack first. A quick patch for a tex file or a model has to go through TT. It would just be a nice thing to have. Not to mention getting to run a compression algorithm on a whole file instead of just tiny chunks!

In the process of doing what it likes, it also renders the game less stable at a fundamental level

C'est la vie in modding. I don't really want to go into this, as it's off topic, though I find it amusing the amount of corruption and crashes seen by users who don't start over before every single client patch. Pot calling the kettle black and all that.

Textools isn't going to become a game plugin, sorry.

Ultimately, this is not what I was asking, and I apologise if it seemed that way. I just know myself and several people that would appreciate a more open modpack format that was easier to approach, and I still see that as a benefit for everyone in the community. I don't think anyone wants TexTools to be a plugin.

Squall-Leonhart commented 1 year ago

What I can say is that I think these fears are mostly unfounded when you optimise the conversion. Extracting every file from a ~1.6 GB TTMP and recreating the same TTMP (deduplicated), for example, takes me about 1.5 minutes, so about 45 seconds for each separate operation. Is an install of a 1.6 GB modpack taking 45 seconds so outrageous to the point of using sqpack, which makes interoperability a hassle for absolutely everyone?

The number of files plays a huge part with the bigger packs, as does import efficiency. Let's say a user creates a new model and textures for every single arm/hand item in the game. This could be as many as 500 arm items changed, and each one has its own specular, normal and colourset. Without the TTMPD being in the sqpack format, the importer will read the data referencing the changed files and then write them into the new sqpack in a serial fashion, one by one. Basically, you're doing the equivalent of add-to-zip, where the zip writer is fed a list of files to read in and does it one by one, performing a new IO request for the next content to add. In its current form, it can do 4 GB totalling 3800 files in that 30-45 seconds, depending on the PC in use; using a JSON to reference separate component data would increase this exponentially along the IO capabilities of the system.

This also has the potential for hitting OOM and heavy pageout because of the Windows file system cache, not to mention that both compression and decompression memory requirements do not routinely match final content size.

The issue taken with mentioning PMP here is that it doesn't care about this, because it's not writing out a data file that the game natively reads directly; when it's time for the game to show these things, it just gets told to load the corresponding resources from disk using content substitution.

If 45 seconds isn't considered a deal-breaker, perhaps some refactoring of the logic in TexTools to stop operating on inefficient, duplicated in-memory byte arrays and instead start working on streams would help optimise the process! I'd be happy to investigate that in future.

I thought it was as of 2.3.2.5, though I might be confusing the dat writer with the index constructor; there are always gains to be had somewhere, though.

I'm not really sure what you mean by this, though I can tell you that I have encountered a total of zero TTMPs (outside of very tiny mods that touch one or two files) that are deduplicated, and the vast majority of them see huge file size reductions when they are. If you have a feature that does this, perhaps enable it by default!

By deduplication, I assume you mean the removal of vanilla items that were never needed within the package, such as an unmodified mtrl file included in a package that should only be writing texture data? These are purely on the author of the pack; some authors, such as Koneko, go out of their way to omit the vanilla jank and only include the modified files.

If not, and you actually mean dropping data that is referenced by the item but copied to the sqpack when modifications are performed, this goes into dangerous territory. The data is retained because it might be needed at any point that SE decides to change how they load the file resources. There is no guarantee a package trimmed of "extra" data is going to work down the track. We already see this with game ROMs and disk images when using patch files; heck, a bunch of Pokémon players had to redump their games some time ago because their trimmed dumps were not compatible with DSi boot.

C'est la vie in modding. I don't really want to go into this, as it's off topic, though I find it amusing the amount of corruption and crashes seen by users who don't start over before every single client patch. Pot calling the kettle black and all that.

Usually a user-error scenario. Everything tends to work fine across updates apart from the UI changing stuff. Once in a blue moon there's a more invasive modder who does raw imports and deletes something that was referenced by another mod they haven't removed, and they don't notice it is broken until the game updates and goes through the patch migration routine.

In other cases, and I'm calling out both Sophie and Kylie here, the author persists in using old versions of TTMP files as the basis for their advanced pack, and those files were made with a buggy version of legacy TexTools. In this case, for some reason, TexTools will break the game when performing the migration steps unless the user has run Check for Problems and successfully corrected the wrong file sizes from these packs before updating (Tight n Firm may or may not still be affected by this too).

anna-is-cute commented 1 year ago

without the TTMPD being in the sqpack format, the importer will read the data referencing the changed files and then write them into the new sqpack in a serial fashion, one by one

I'll address this at the end.

using a JSON to reference separate component data would increase this exponentially along the IO capabilities of the system.

This is what MPL already does. You can just store the path of the file in the ZIP along with the data already stored in the MPL.

This also has the potential for hitting OOM and heavy pageout because of the Windows file system cache, not to mention that both compression and decompression memory requirements do not routinely match final content size.

The issue taken with mentioning PMP here is that it doesn't care about this, because it's not writing out a data file that the game natively reads directly; when it's time for the game to show these things, it just gets told to load the corresponding resources from disk using content substitution.

If you stream files out, you're not gonna hit OOM. Especially not if you use something like zstd. This is not how compression algorithms work. Creators of these algorithms understand writing efficient software and design them with memory limitations in mind. You can extract a 4GB zstd-compressed file on an embedded system with tiny amounts of RAM.
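Here is a minimal sketch of the streaming point. zstd is not in Python's standard library in most versions, so this uses a ZIP with DEFLATE as a stand-in, but the principle is the same. `ZipFile.open` decompresses incrementally and `shutil.copyfileobj` copies at most `buffer_size` bytes at a time, so memory use stays bounded no matter how large a member is. The function name and the 1 MiB default are assumptions for illustration.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_streaming(archive, dest: Path, buffer_size: int = 1 << 20) -> int:
    """Extract all members of `archive` (a path or binary file object) into `dest`.

    Each member is decompressed incrementally and copied in pieces of at most
    `buffer_size` bytes, so a whole file is never held in memory at once.
    Returns the total number of uncompressed bytes written.
    """
    dest_root = dest.resolve()
    total = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = (dest_root / info.filename).resolve()
            # Refuse entries such as "../../x" that would escape the destination.
            if not target.is_relative_to(dest_root):
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, buffer_size)
            total += info.file_size
    return total


if __name__ == "__main__":
    # Demo: build a small in-memory archive and extract it into a temp dir.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("chara/example/big.tex", b"\x00" * 8_000_000)
    buf.seek(0)
    with tempfile.TemporaryDirectory() as tmp:
        print(extract_streaming(buf, Path(tmp), buffer_size=64 * 1024))
```

The same loop works for the creation side: reading each source file through a fixed-size buffer keeps peak memory independent of modpack size.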

I thought it was as of 2.3.2.5, though I might be confusing the dat writer with the index constructor; there are always gains to be had somewhere, though.

Not the last time I looked at the sqpack code, but it could have been!

By deduplication, I assume you mean the removal of vanilla items that were never needed within the package, such as an unmodified mtrl file included in a package that should only be writing texture data? These are purely on the author of the pack; some authors, such as Koneko, go out of their way to omit the vanilla jank and only include the modified files.

If not, and you actually mean dropping data that is referenced by the item but copied to the sqpack when modifications are performed, this goes into dangerous territory. The data is retained because it might be needed at any point that SE decides to change how they load the file resources. There is no guarantee a package trimmed of "extra" data is going to work down the track. We already see this with game ROMs and disk images when using patch files; heck, a bunch of Pokémon players had to redump their games some time ago because their trimmed dumps were not compatible with DSi boot.

No, this is not what I mean. TTMPs contain the same file multiple times in the sqpack. If someone makes three options that all contain the same file, it appears three times in the sqpack. The MPL references the same file at three different offsets. You could simply include those files one time instead, then copy it to multiple places in the game if necessary. This is a huge space savings. I know because I've made tools that do just that. Multiple entries in the MPL referencing the same offset.
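The deduplication described above can be sketched in a few lines of Python. Hash each payload, store the bytes only the first time a hash is seen, and let every later index entry point at the existing offset. The entry fields (`option`, `path`, `offset`, `size`) are hypothetical stand-ins for the MPL's fields. SHA-256 is one reasonable choice of content hash, not necessarily what any existing tool uses.

```python
import hashlib


def pack_deduplicated(entries):
    """Pack (option, game_path, data) tuples into one blob, storing each
    distinct payload only once.

    Returns (blob, index). Every index entry records the offset and size of
    its payload inside the blob, and identical payloads share one offset.
    """
    blob = bytearray()
    stored = {}  # content hash -> (offset, size) of the first stored copy
    index = []
    for option, game_path, data in entries:
        digest = hashlib.sha256(data).digest()
        if digest not in stored:
            stored[digest] = (len(blob), len(data))
            blob += data
        offset, size = stored[digest]
        index.append({"option": option, "path": game_path,
                      "offset": offset, "size": size})
    return bytes(blob), index
```

If three options all carry the same texture, the blob holds one copy and the three index entries share offset 0. This is the "multiple entries referencing the same offset" layout described above.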


I am asking that TexTools use a standard archive format to contain the files it needs to operate.

It seems like your main concerns are speed and memory usage, and I promise you that those concerns are unfounded.

The worst part of creating a sqpack, in terms of speed, is the DEFLATE compression (I've profiled the process), and you can massively speed that up by multithreading it, since sqpack operates on discrete 16 KB chunks. It honestly does not take that long to do this, and it gets even faster if you use DEFLATE level 6 instead of 9 (very comparable compression, way faster). Again, I have created a threaded, streaming sqpack encoder, and it's fast.
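Here is a minimal Python sketch of the threaded approach. Each chunk is compressed with no shared history, so chunks can be handed to a thread pool and collected back in order. The chunk size, level 6, and the function names are assumptions for illustration. `wbits=-15` asks zlib for a raw DEFLATE stream with no zlib header. CPython's zlib module generally releases the GIL while deflating, so threads can overlap here. A process pool would be the fallback if that did not hold on a given runtime.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from functools import partial

CHUNK_SIZE = 16_000  # assumption: stand-in for the sqpack chunk size


def deflate_raw(chunk: bytes, level: int) -> bytes:
    """Compress one chunk as a raw DEFLATE stream (wbits=-15: no zlib header)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(chunk) + compressor.flush()


def compress_chunks_parallel(data: bytes, level: int = 6, workers: int = 4) -> list[bytes]:
    """Split `data` into fixed-size chunks and compress them concurrently.

    Chunks are independent, so order only matters when writing them out;
    ThreadPoolExecutor.map yields results in input order.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(deflate_raw, level=level), chunks))
```

Each output chunk can be decompressed on its own with `zlib.decompress(chunk, -15)`. The same independence allows parallel decompression on the reading side.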

I've already discussed how you can use streaming to constrain memory on the modpack extraction (and creation) side, so that should be taken care of, too.

I'm not suggesting you do anything because of Penumbra, I just suggested something that it does (use a zip file with json manifest and normal files), which TexTools could do, because it's more efficient and more approachable.

I'm simply asking on behalf of the community: help make it easier for people to make cool things.

helpful-fox-senko-san commented 4 months ago

Since this was mentioned again -- TexTools did make improvements to compression/decompression speed in the branch that is being developed for Dawntrail, which makes this proposal more viable from a performance standpoint: https://github.com/TexTools/xivModdingFramework/commit/3d48024897e2954837573afdd519079d9552e038 https://github.com/TexTools/xivModdingFramework/commit/2be455e8a5f6726784646f77f6d823027d0e2328

And related: https://github.com/TexTools/xivModdingFramework/pull/66 will implement de-duplication in TTMP files.

anna-is-cute commented 4 months ago

That sounds like good progress! Do all these changes set the stage for distributing mod packs as archives of loose files instead of a big sqpack hunk?

Lunaretic commented 4 months ago

It is unlikely the core TTMP format is going to change, as that would involve additional work across the community due to breaking changes, which cannot be reliably assumed to 'just be done' in a timely fashion (or possibly at all). Furthermore, it would slow down TexTools modpack import appreciably for large modpacks, even with parallelization of the deflate tasks. In short, it's asking to introduce breaking changes for very minimal reasons, when all tooling would still have to retain back-compatibility with the existing format given the corpus of existing TTMP2s in the wild.

TTMP2 will remain as-is, with the addition of file de-duplication (which is 100% backwards compatible). There is a non-zero possibility that TT gets .PMP export, but at this time I see no compelling reason to alter the TTMP file format in the requested manner.

Exter-N commented 4 months ago

TT getting .PMP export is irrelevant, as Penumbra already has TTMP2 import.

.PMP import, though (and also, Raw File Import that works well with models), would make the life of more than one user easier.

Lunaretic commented 4 months ago

PMP Import and full Smart-File Import (including all types of Raw files both compressed and uncompressed) is already in progress/mostly complete and will be out with Dawntrail.

NotNite commented 4 months ago

TT getting .PMP export is irrelevant, as Penumbra already has TTMP2 import.

I think it's worth pointing out here that a newer TTMP format solves two problems: authoring and parsing. This solves the authoring problem, given that the PMP format is leagues easier to write, but by saying this you're excluding anyone who actually wants to parse TTMPs.

To my knowledge, the only parsing code that exists for TTMPs outside of XMF is in Penumbra and ttmp-rs. To quote the original message: "There's no single point of truth, no spec, and every implementation of the format will be slightly different and cause incompatibilities."

In short, it's asking to introduce breaking changes for very minimal reasons, when all tooling would still have to retain back-compatibility with the existing format given the corpus of existing TTMP2s in the wild.

I find describing this issue as "very minimal" to be underselling it. Many developers want to interact with the TTMP format, myself included, and that's why this post has reactions from plenty of people. I understand your concerns for backwards compatibility, but I question if this could be solved by allowing users to export as TTMP2 for whatever tools don't support a theoretical TTMP3.

There is a non-zero possibility that TT gets .PMP export

If this theoretically happens, assuming TT has PMP import, it would likely be best to offer it as the default format for export, given it is significantly easier to work with.

Lunaretic commented 4 months ago

PMP as default export is more likely than altering the existing TTMP format.

PMP already handles the requests you're making, and is functionally identical to the format you're requesting, without introducing breaking changes to existing tools.

If/when PMP export proves sufficiently stable and meets TT's requirements for the advanced mod import screen (namely, pagination or mod groups, and preview images), it is reasonable to switch to it as the default export.

anna-is-cute commented 4 months ago

For anyone getting linked to this issue, this was further discussed in Discord, and since TexTools will be getting PMP import and export, that renders the idea of TTMP3 unnecessary. I only suggested this because I did not know there would be upcoming PMP support.