X-Hax / sa_tools

Sonic Adventure Toolset
http://info.sonicretro.org/SA_Tools
110 stars 24 forks source link

Use streams instead of byte arrays #11

Open MainMemory opened 6 years ago

MainMemory commented 6 years ago

This will require rewriting literally everything that reads/writes binary files, but it should cut down on memory usage and be much nicer to look at.

MainMemory commented 5 years ago

Well according to @Sewer56 using streams is bad, so I guess we won't do this.

Sewer56 commented 5 years ago

Well, in most cases for parsing files I would actually advise to use streams as they read to much easier to understand lead to cleaner code.

What I do advise against however is combining FileStream and BinaryReader. This is because it leads to terrible inefficiency, especially for reads of small values due to considerable CPU overhead.

The reason for this is in the .NET base class library, every single layer under the final stream has its own error checking: The BinaryReader, FileStream, Buffer.BlockCopy(), Buffer.BlockCopyInternal() ... it nests, not nicely. For reading a single integer in this configuration from a stream, last I checked you get around 21 branches/if statements for a single read (many of which are duplicates), not good.

The solution to this is to roll your own BinaryReader with support for buffering the data in the underlying streams to avoid all of these unnecessary checks. For example, my BinaryReader operating over a FileStream supports generics and is 6.7-19.7 times as fast depending on the workload. Most of that performance advantage is simply buffering.

Random relevant article on the web: https://jacksondunstan.com/articles/3568

MainMemory commented 5 years ago

I mean, I'd need to make my own BinaryReader/BinaryWriter anyway, because of big endian stuff.

Sewer56 commented 5 years ago

I mean, I'd need to make my own BinaryReader/BinaryWriter anyway, because of big endian stuff.

Mine supports Big Endian already. Only catch with mine is it has no overloads returning value, only out parameters.

On my end it's deliberate as it prevents a potential copy from a method's stack frame to the caller's stack frame. With out parameter the returned value is guaranteed written directly to stack frame of caller.

Never made a BinaryWriter though /shrug. Maybe I should but for now I just use a List<T> and the Struct / StructArray utility classes (specifically GetBytes()) from my library.


Note: In case of my implementation, if reading structs in Big Endian, implement IEndianReversible in struct.

/// <summary>
/// An individual file entry in the <see cref="MotionPackage"/> archive.
/// </summary>
public unsafe struct AnimationEntry : IEndianReversible
{
    public int FileNamePtr;
    public int FileDataPtr;
    public int PropertyTuplePtr;

    public void SwapEndian()
    {
        Endian.Reverse(ref FileNamePtr);
        Endian.Reverse(ref FileDataPtr);
        Endian.Reverse(ref PropertyTuplePtr);
    }
}

Reading example: https://github.com/Sewer56/ShadowMTPSharp/blob/9587e197fa3c9b04bd3413b3506ac4190703e936/MTPLib/MotionPackage.cs#L110

Otherwise read primitives in Big Endian (reading structs is faster and takes less code though).