fsprojects / pulsar-client-dotnet

Apache Pulsar native client for .NET (C#/F#/VB)
MIT License
301 stars 47 forks source link

Possible bug decompressing Zstd with no size in the compressed payload. #257

Closed Nikolajls closed 6 months ago

Nikolajls commented 6 months ago

Hi !

Been a user of the library for some time now as well as other teams inside the company I work at use it as well.

We've discovered something strange where a topic can be read through by other Pulsar Clients, (Python, Java) but using this library with version 3.3 it fails

I've debugged through the codebase of the F# Client library connected to the topic and the issue we face is that the messages is compressed with Zstd where we get this exception.: "Decompressed content size is not specified"

With the current codebase for 3.3 it looks like this: The call to unwrap in ZstdNet is done using the default true value for bufferSizePrecheck so that library try to check the decompressed content size as part of the payload.

2024-03-27 15 24 39

Zstd library Unwrap method

However if I modify the F# code to use false and prevent the check from happening it works and we can read the message: 2024-03-27 15 26 19

According to what I've read on Zstd here 2024-03-27 14 50 33 It seems like the decompressed size is not required in the payload, so by always defaulting to true it fails to work on Zstd compressed payloads where the size is not present.

Would it be possible to add a way to configure if we want to always do this check so, it can be disabled by the user if they deem it neccesary?

I've no experience with F#, actually only ever read it by reading and learning this library - i've tried to look if I could make the changes myself, but it is not something that I'm entirely sure is feasible by me.

Lanayx commented 6 months ago

Hi, thanks for the thorough investigation. Adding this value to configuration will be not an easy task, since there is no decompressing configuration for Consumer at all. So I think you can just send a PR with an optional flag set to false. One thing that I'd like to ask - can you please add your case to the unit tests, so your change will stay covered.

Nikolajls commented 6 months ago

@Lanayx
Thank you, definitely went down a rabbithole investigating this :)

So create a pull request where it defaults to false instead (meaning all will be with false) - just to get the idea that you're suggesting?

And then do a unit test for it as well (which I will try to see how to do :) ) I im a bit unsure what branch to start of from, are you primarily working from develop ? :)

Lanayx commented 6 months ago

You can do it in proper way - write the failing test in one commit, create PR, see it fails, and then write the fix in another. PR should go to the develop branch.

As for unit test, just add one to the existing list of tests, or create another list, if the code appears very different from others:

[<Tests>]
let zstdSpecificTests =
    testList "zstdSpecific" [
        test "Fake test" {
            Expect.equal "" 1 2
        }
    ]
RobertIndie commented 6 months ago

We have already got the uncompressedSize from the message metadata: https://github.com/apache/pulsar/blob/631b13ad23d7e48c6e82d38f97c23d129062cb7c/pulsar-common/src/main/proto/PulsarApi.proto#L120 and allocated the target byte array here: https://github.com/fsprojects/pulsar-client-dotnet/blob/8df5b0e5dfbd6a7c19c6efd7b4038dccd0079bda/src/Pulsar.Client/Internal/Compression.fs#L126

I think we can just set the bufferSizePrecheck to false instead of exposing this configuration to the Consumer API. We did the same in the C++ client and the Java client.

Nikolajls commented 6 months ago

That makes good sense @RobertIndie and aligns with what @Lanayx said.

Just setting it to false is easy for me. Right now im struggling with trying to make a unit test for it because it requires getting something compressed that does not have the size in the payload, and thus that fails on the precheck. I've not found a way to compress something that does that (very very limited knowledge of compression)

All tests still pass when changing the precheck parameter to false (the current Zstd test) so not sure if it is needed ?

For more debugging i tried making a base64 string of one of the messages that fails and doing a test prior to any changes: Which fails with the exact same reason - so test proved 2024-03-28 08 33 10

When switching to false for precheck the test passes which is great. 2024-03-28 08 38 08

Lanayx commented 6 months ago

Yes, test is still needed, so if someone removes false flag in future, test will break and prevent doing that.

Nikolajls commented 6 months ago

I totally agree and am a firm believer in tests. However currently I cannot use the test case that i added since when decompressed and decoded contains data from my organisation that I don't think is allowed to go into a public repo.

After Easter holidays I will ask the team producing the events if they can publish a single event with no data besides the schema compressed with Zstd - hopefully that one will have the same error and can then possible be added as a testcase.

I had hoped a new version could be prepared with this false flag prior to that since it is breaking some applications and then have the test case added afterwards that will error out if someone removes the flag.

But if required then we'll have to wait till I've an redacted event that I can use as testcase.

Nikolajls commented 6 months ago

@Lanayx I've put up a PR for it now.

I managed to generate a compressed dataset using Zstd-Codec for Typescript (which is the library that the producer of events in our case is using) A test case has been added and the PR is here : https://github.com/fsprojects/pulsar-client-dotnet/pull/258

Lanayx commented 6 months ago

Merged and published 3.3.1

Nikolajls commented 6 months ago

@Lanayx

I can confirm after a update to 3.3.1 it works and Im able to read all messages in the topic in question. Thank you a lot for all the help and promptness in merging and pushing

I think I owe a beer possible ;) We can close this issue now