TotalWar-Modding / tw_pack_lib

1 stars 0 forks source link

Saving a PackFile from memory corrupts it. #3

Open Frodo45127 opened 6 years ago

Frodo45127 commented 6 years ago

From doing some testing in RPFM with the patch to use the lib, I found some problems while saving. When hitting save, and then reopening the PackFile, the resulting PackFile somehow loses his last PackedFile, and the rest are unopenable. Checking it with a Hex Editor shows that:

After checking the "save" functions, I also have some questions:

Trolldemorted commented 6 years ago

From doing some testing in RPFM with the patch to use the lib, I found some problems while saving. When hitting save, and then reopening the PackFile, the resulting PackFile somehow loses his last PackedFile, and the rest are unopenable.

Nice catch! I forgot that pfh5 index entries without the big header bit set (aka every warhammer2 pack, I assume) have an extra 0 in the index when calculating the index size, thus the index size was incorrect, so the last item didn't fit in it. Fixed it in 0.1.16, can you test if the problem persists?

Why is creating an empty PackFile blocked?

I assumed creating empty packs doesn't make sense since they don't contain anything, but if you'd like to allow that I can of course create an empty index.

If I understand it correctly, when saving, for each PackedFile, the lib reads the data from the source PackFile, then write it in the destination PackFile. How is that supposed to work if you are just updating a PackFile's data (a normal "Save")? For what I see, it tries to read data from a file that has been partially modified, which can be the cause of some of the corruption problems from above.

The caller has to ensure all packed files are memory-backed if the source file is to be overwritten. My rpfm pr enforces this by first ensuring all packed files are memory-backed before truncating the file with File::create:

    pub fn save(&mut self, pack_file_path: &PathBuf) -> Result<()> {

        // ensure every packed file has the correct timestamp and path and is loaded into memory before we (potentially) overwrite the file
        let mut packed_files = Vec::with_capacity(self.packed_files.len());
        for packed_file_view in &mut self.packed_files {
            packed_file_view.packed_file.get_data()?; 
            packed_file_view.packed_file.timestamp = Some(self.timestamp);
            packed_file_view.packed_file.path = packed_file_view.path.join("\\");
            packed_files.push(&packed_file_view.packed_file)
        }

        // We try to create the file
        let mut file = File::create(pack_file_path)?;
        tw_pack_lib::build_pack_from_memory(&packed_files, &mut file, self.pfh_version, self.bitmask, self.pfh_file_type, self.timestamp).map_err(From::from)
    }
Frodo45127 commented 6 years ago

Fixed it in 0.1.16, can you test if the problem persists?

Just tested, and it seems to be fixed.

I assumed creating empty packs doesn't make sense since they don't contain anything, but if you'd like to allow that I can of course create an empty index.

It's useful. For example, someone may want to delete everything in their PackFile and then save it to use later, or someone may want to upload an empty PackFile to the workshop to reserve a name (though with the last "every packfile in content" policy, this has lost his sense). It has his uses. Not many, but it has them.

The caller has to ensure all packed files are memory-backed if the source file is to be overwritten. My rpfm pr enforces this by first ensuring all packed files are memory-backed before truncating the file with File::create:

And why is that in the pr and not in the lib? That's how the lib should behave by default, is not exclusive to RPFM. Otherwise, someone trying to use the lib to edit PackFiles can easely corrupt them if it doesn't know it has to load everything to memory before saving.

Also, I'll like to add that the timestamp still gets deleted and the PackFile index gets wiped out. Those don't corrupt the PackFiles perse (well, in some situations the latter one can), but can interfere with features that uses them.

Trolldemorted commented 6 years ago

It's useful. For example, someone may want to delete everything in their PackFile and then save it to use later, or someone may want to upload an empty PackFile to the workshop to reserve a name (though with the last "every packfile in content" policy, this has lost his sense). It has his uses. Not many, but it has them.

Fair enough, let's support it

And why is that in the pr and not in the lib? That's how the lib should behave by default, is not exclusive to RPFM. Otherwise, someone trying to use the lib to edit PackFiles can easely corrupt them if it doesn't know it has to load everything to memory before saving.

Would adding a disable_lazy_loading: bool param to parse_pack be ok?

Also, I'll like to add that the timestamp still gets deleted and the PackFile index gets wiped out. Those don't corrupt the PackFiles perse (well, in some situations the latter one can), but can interfere with features that uses them.

Will look into the timestamp issue, but I won't get it done today. pf index is on the todo, but I'd rather fix all bugs first

Trolldemorted commented 6 years ago

tw_pack_lib 0.1.18 is out, parse_pack now has a load_lazy param, the timestamp for pfh4 headers is fixed and you should be able to create empty pack files. I hope I manage to investigate the performance issues with dependency packs on the weekend.

Frodo45127 commented 6 years ago

Would adding a disable_lazy_loading: bool param to parse_pack be ok?

Mmmm.... and what part of "Saving corrupted my PackFile" does that fix? The problem here is not that the lazyloading is enabled or not. The problem is that saving with lazyloading over the open PackFile can corrupt it if you don't implement a way to get the data before saving it. So, if you always have to implement it, why is not implemented by default in the lib? I mean, that bool it's useful, but it doesn't fix the problem.

Will look into the timestamp issue, but I won't get it done today. pf index is on the todo, but I'd rather fix all bugs first

Sure, np. Btw, if you want to test packfiles with the stuff in the PackFile Index, in RPFM open a PackFile, select it in the treeview and hit Ctrl+M. It'll open the "dependency manager" where you can add/remove PackFiles from the PackFile index. Don't remember if I already told you this... Too many channels to talk is making hard to keep track.

the timestamp for pfh4 headers

Mmmm why for PFH4 only if all PFH4 and PFH5 Packfiles have that in the header (though arena seems to have it set as 0)? That part is common to all PFH4 and 5 PackFiles.

Trolldemorted commented 6 years ago

Mmmm.... and what part of "Saving corrupted my PackFile" does that fix? The problem here is not that the lazyloading is enabled or not. The problem is that saving with lazyloading over the open PackFile can corrupt it if you don't implement a way to get the data before saving it. So, if you always have to implement it, why is not implemented by default in the lib? I mean, that bool it's useful, but it doesn't fix the problem.

With a bool you can disable lazyloading if you don't want it (for the currently opened packfile) and enable it for everything else - I have updated my PR to disable lazy-loading for the "real" opened pack file and kept it on everywhere else.

Timestamps should work everywhere now, and I think they were only buggy for pfh4 (?)