Closed Maruhl closed 1 year ago
I don't understand what you are trying to do. Could you give me an example using the existing API, and I can help you simplify it.
I have a file in a special format that contains several blocks of compressed data. The format also contains various metadata.
This metadata says what kind of compression is used, where the data can be found, how big it is, and how big it is when decompressed.
So I have a `FileStream`, and I take these compressed data chunks out as streams as well (via a stream wrapper). These I then want to simply decode using `LZ4Codec.Decode(stream)`.
`LZ4Stream.Decode(..)` does not work here, because no LZ4 header etc. exists.
I still don't understand how the thing you are describing works.
But I can make some guesses: `LZ4Stream` won't help; `LZ4Codec.Decode(...)` will (`Codec`, not `Stream`).

Thanks for the hints.
At the moment I am doing this:
As I mentioned before, I have no idea about compression algorithms.
Is there a special reason why only arrays can be used with `LZ4Codec` and no stream is supported?
> `var decompressed = new byte[decompressedSize];`

Depends on size, but most of the time `ArrayPool<byte>` would be better.

> `LZ4Codec.Decode(source.ReadAllToArray(), decompressed.AsSpan());`

No idea what `source.ReadAllToArray()` does; I assume it is reading one chunk? Not really *all*, right? At the same time, I guess `ReadAllToArray()` allocates an array on every call, so you immediately put pressure on the GC (and slow things down).

> `using MemoryStream decompressedMemoryStream = new(decompressed);`
> `decompressedMemoryStream.CopyTo(destination);`

This looks weird. Why create a memory stream when you could just write to `destination`?

```csharp
destination.Write(decompressed, 0, decompressedSize);
```
> Is there a special reason why only arrays can be used with LZ4Codec and no stream is supported?

Because it is a simple thing to do. Reading and writing arrays from and to streams is trivial. `Convert.ToBase64String(...)` does not support streams either, and focuses on the thing it does best: converting `byte[]` to `string`, nothing more. Nobody asks "why can't I just `Convert.ToBase64(Stream, Stream)`" though.
`ReadAllToArray()` exists only so that I can use `LZ4Codec.Decode`.
`source.ReadAllToArray()` already uses `ArrayPool`.
The decompression code I initially cobbled together just to get the old code from the .NET Framework 4.5 days to compile. At least the tests are green. Before, only `byte[]` was used.
My problem is that allocating memory is costly, at least in my case, since all I have is a stream and it also ends up in a `FileStream`. If I call this stuff 100k times, the performance is not so good; `ArrayPool` will help here, but it is also not the whole solution. Since I don't know in advance how big the data is, the rented memory might not be big enough either. And that is not to mention multithreading.
OK, thank you. Then it stays like this for now.
The `ReadAllToArray` implementation is bad, actually very bad, and prone to corruption.
You rent an array (`Rent`), load data into it (`Read`), return a slice of it (`AsSpan().Slice(...)`), and then immediately return the array to the pool (`Return`) so it can be rented by another thread (or not even that: the very next `Rent` could hand it out). This means two threads would assume they exclusively own the array, while in fact they would both read/write from/to the same memory.
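The unsafe pattern being described could be sketched roughly like this. This is a hypothetical reconstruction: the actual `ReadAllToArray` source was never posted in this thread, so the name, signature, and shape are guesses.

```csharp
using System;
using System.Buffers;
using System.IO;

static class StreamExtensionsSketch
{
    // HYPOTHETICAL reconstruction of the broken pattern described above,
    // not the actual ReadAllToArray from the project.
    public static Span<byte> ReadAllToArrayBroken(this Stream source, int length)
    {
        var buffer = ArrayPool<byte>.Shared.Rent(length);
        var read = source.Read(buffer, 0, length);
        ArrayPool<byte>.Shared.Return(buffer); // buffer goes back to the pool here...
        return buffer.AsSpan().Slice(0, read); // ...but the caller still reads this memory,
                                               // which the next Rent (on any thread) may
                                               // hand out and overwrite concurrently
    }
}
```

The slice keeps pointing into the pooled array after `Return`, so whoever rents that array next is silently sharing memory with the caller.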
> My problem is that allocating memory is costly

The whole thing can be implemented with `ArrayPool` (although not the way you did it), so with "zero" memory pressure (not really zero, but "amortized zero" over time).

> Then it stays like this for now.

Oh, it should not. The current implementation of `ReadAllToArray` is plain wrong and dangerous (i.e. it can start saving garbage into the destination file).
Yes, you are right.
I fixed that after posting this image; afterwards `Slice().ToArray()` is invoked.
I believe `ArrayPool.Shared` is thread-safe, but the resulting allocated area of course is not.
But thanks for the hint.
*But if I could pass a stream right away, I wouldn't have to allocate anything. :P

> The whole thing can be implemented with ArrayPool (although not the way you did it), so with "zero" memory pressure (not really zero, but "amortized zero" over time).

You mean that I rent a minimum size = compressed length + uncompressed size, to have one array that initially contains the compressed data and later, at the end of the array, the uncompressed data?
> Afterwards Slice().ToArray() is invoked.

This is an allocation, unnecessary if you care about performance.

> I believe ArrayPool.Shared is thread-safe,

Yes. It is.

> but the resulting allocated area of course is not.

Exactly.

> *But if I could pass a stream right away, I wouldn't have to allocate anything. :P

There is some truth to it. If someone else writes all your code, you don't need to write any.
Not sure how `ChunkReaderStreamWrapper` really works, but assuming it is a pretend-stream over a fragment of a real stream, I would say the implementation is trivial:
```csharp
void CopyLz4Block(Stream source, Stream target, int chunkLength, int decompressedSize)
{
    var sourceBuffer = ArrayPool<byte>.Shared.Rent(chunkLength);
    var targetBuffer = ArrayPool<byte>.Shared.Rent(decompressedSize);
    try
    {
        // Stream.Read may return fewer bytes than requested, so loop until the chunk is complete
        var read = 0;
        while (read < chunkLength)
        {
            var bytes = source.Read(sourceBuffer, read, chunkLength - read);
            if (bytes == 0) throw new EndOfStreamException();
            read += bytes;
        }
        // rented arrays may be larger than requested, so pass explicit offsets/lengths
        LZ4Codec.Decode(sourceBuffer, 0, chunkLength, targetBuffer, 0, decompressedSize);
        target.Write(targetBuffer, 0, decompressedSize);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(sourceBuffer);
        ArrayPool<byte>.Shared.Return(targetBuffer);
    }
}
```
No allocations, and the logic is just 3 steps: `Read`, `Decode`, `Write`.
> You mean that I rent a minimum size = compressed length + uncompressed size, to have one array that initially contains the compressed data and later, at the end of the array, the uncompressed data?

Renting one array with space for both compressed and uncompressed data is actually not a bad idea, but it is not what I meant.
By "not the way you did it" I just meant that you were using `ArrayPool` wrong, by renting an array and returning it before you finished processing it.
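For completeness, the single-buffer variant asked about above might look something like this. This is only a sketch, assuming the offset/length `LZ4Codec.Decode(source, sourceOffset, sourceLength, target, targetOffset, targetLength)` overload, and glossing over short reads:

```csharp
void CopyLz4BlockOneBuffer(Stream source, Stream target, int chunkLength, int decompressedSize)
{
    // one rented array: compressed chunk at the front, decompressed data right behind it
    var buffer = ArrayPool<byte>.Shared.Rent(chunkLength + decompressedSize);
    try
    {
        source.Read(buffer, 0, chunkLength); // sketch only: assumes the chunk arrives in one read
        // decode from the front of the buffer into the disjoint region behind it
        LZ4Codec.Decode(buffer, 0, chunkLength, buffer, chunkLength, decompressedSize);
        target.Write(buffer, chunkLength, decompressedSize);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```

One `Rent` instead of two, at the cost of a larger single rental; the two regions must stay disjoint for the decode to be safe.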
> I fixed that after posting this image; afterwards Slice().ToArray() is invoked.

Sticking `.ToArray()` on it works, but it negates all the benefit of the pool: `.ToArray()` is a memory allocation, so you rent first only to immediately allocate anyway, making the renting completely useless.
> If someone else writes all your code, you don't need to write any.

🎉

> Not sure how ChunkReaderStreamWrapper really works, but assuming it is a pretend-stream over a fragment of a real stream...

Yes, it supports only access to the real chunk data of the stream; it knows the position and the size of the compressed data.

> By "not the way you did it" I just meant that you were using ArrayPool wrong, by renting an array and returning it before finishing processing.

> Sticking .ToArray() on it works, but it negates all the benefit of the pool: .ToArray() is a memory allocation, so you rent only to immediately allocate anyway, making renting completely useless.

I am right there with you.
Your solution is good. "I can't see the forest for the trees" (I mean, I have too many code issues in the solution to think clearly, so I can't see the simplest solutions).
Is there a chance to add a method to `LZ4Codec` that takes streams?
I don't want to allocate extra memory when the source and destination are always streams.