Open mukunku opened 4 months ago
I was able to tidy up the PR. However, there is a bug that occurs when running dotnet test, which is breaking the PR checks. I was able to track it down to the following error, although I have no clue why it's happening:
The active test run was aborted. Reason: Test host process crashed : Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.MemoryExtensions.AsSpan[[System.Int32, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](Int32[], Int32)
at Parquet.File.PackedColumn.AllocateOrGetDictionaryIndexes(Int32)
at Parquet.File.DataColumnReader.ReadColumn(System.Span`1<Byte>, Parquet.Meta.Encoding, Int64, Int32, Parquet.File.PackedColumn)
at Parquet.File.DataColumnReader+<ReadDataPageV1Async>d__15.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Parquet.File.DataColumnReader+<ReadDataPageV1Async>d__15, Parquet, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]](<ReadDataPageV1Async>d__15 ByRef)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[Parquet.File.DataColumnReader+<ReadDataPageV1Async>d__15, Parquet, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]](<ReadDataPageV1Async>d__15 ByRef)
at Parquet.File.DataColumnReader.ReadDataPageV1Async(Parquet.Meta.PageHeader, Parquet.File.PackedColumn)
at Parquet.File.DataColumnReader+<ReadAsync>d__10.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Parquet.File.DataColumnReader+<ReadAsync>d__10, Parquet, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]](<ReadAsync>d__10 ByRef)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Start[[Parquet.File.DataColumnReader+<ReadAsync>d__10, Parquet, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]](<ReadAsync>d__10 ByRef)
at Parquet.File.DataColumnReader.ReadAsync(System.Threading.CancellationToken)
at Parquet.ParquetRowGroupReader.ReadColumnAsync(Parquet.Schema.DataField, System.Threading.CancellationToken)
at Parquet.Test.ParquetReaderOnTestFilesTest+<DecryptFile_UTF8_AesGcmV1_192bit>d__2.MoveNext()
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Parquet.Test.ParquetReaderOnTestFilesTest+<DecryptFile_UTF8_AesGcmV1_192bit>d__2, Parquet.Test, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]].ExecutionContextCallback(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Parquet.Test.ParquetReaderOnTestFilesTest+<DecryptFile_UTF8_AesGcmV1_192bit>d__2, Parquet.Test, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]].MoveNext(System.Threading.Thread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Parquet.Test.ParquetReaderOnTestFilesTest+<DecryptFile_UTF8_AesGcmV1_192bit>d__2, Parquet.Test, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d380b3dee6d01926]].MoveNext()
at Xunit.Sdk.AsyncTestSyncContext+<>c__DisplayClass7_0.<Post>b__1(System.Object)
at Xunit.Sdk.MaxConcurrencySyncContext.RunOnSyncContext(System.Threading.SendOrPostCallback, System.Object)
at Xunit.Sdk.MaxConcurrencySyncContext+<>c__DisplayClass11_0.<WorkerThreadProc>b__0(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at Xunit.Sdk.ExecutionContextHelper.Run(System.Object, System.Action`1<System.Object>)
at Xunit.Sdk.MaxConcurrencySyncContext.WorkerThreadProc()
at Xunit.Sdk.XunitWorkerThread+<>c.<QueueUserWorkItem>b__5_0(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
The test running when the crash occurred:
Parquet.Test.ParquetReaderOnTestFilesTest.DecryptFile_UTF8_AesGcmV1_192bit
This test may, or may not be the source of the crash.
Okay, some findings.
If any test runs after my new file decryption test in the same xunit collection, it crashes the CLR. I moved my test to its own test collection and disabled parallelization, which essentially means xunit will run my test in isolation. See: https://github.com/aloneguid/parquet-dotnet/pull/480/commits/9e0bbbe06c2db11158627644162f20a49ddec1df
With this change my test sometimes passes, but it still randomly fails with similar memory-management errors, so it's flaky at the moment. This is just a band-aid to get the PR green. I'm sure I'm doing something stupid somewhere that's causing this issue, but I haven't been able to find it so far.
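For anyone unfamiliar with the isolation trick, here is a minimal sketch of how a test can be placed in its own non-parallel xunit collection. The collection name, class names, and test body are hypothetical placeholders and are not the ones used in the commit linked above.

```csharp
using System.Threading.Tasks;
using Xunit;

// Hypothetical collection name. DisableParallelization = true tells xunit
// not to run this collection in parallel with any other collection.
[CollectionDefinition("FileDecryption", DisableParallelization = true)]
public class FileDecryptionCollection { }

// Hypothetical test class; every test in it belongs to the collection above.
[Collection("FileDecryption")]
public class FileDecryptionTest
{
    [Fact]
    public async Task DecryptFile_RunsInIsolation()
    {
        // Placeholder body; the real test would open and read the encrypted file.
        await Task.CompletedTask;
    }
}
```

Note that this only changes scheduling. If the crash comes from memory being corrupted inside the reader, isolating the test hides the symptom from other tests rather than fixing the cause.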
Summary
I made significant progress on getting Footer Decryption working with parquet files (#191).
I'm opening this work-in-progress pull request with hopes that some other folks can help get this across the finish line.
AES_GCM_V1
Thanks to a test file @pzatschl shared with me, I was able to implement the Aes Gcm V1 encryption algorithm.
link to code
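For reviewers who have not read the Parquet modular encryption spec, the sketch below shows how a single GCM-encrypted module can be decrypted, assuming the layout of a 4-byte little-endian length, a 12-byte nonce, the ciphertext, and a 16-byte tag. GcmModuleSketch and its Decrypt method are hypothetical names for illustration, not the code in this PR, and the AAD is assumed to be built by the caller.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not part of Parquet.Net.
public static class GcmModuleSketch
{
    private const int LengthSize = 4;  // little-endian length prefix
    private const int NonceSize = 12;  // GCM nonce
    private const int TagSize = 16;    // GCM authentication tag

    // Decrypts one module laid out as: length | nonce | ciphertext | tag.
    // The key may be 16, 24, or 32 bytes; the AAD is supplied by the caller.
    public static byte[] Decrypt(ReadOnlySpan<byte> module, byte[] key, ReadOnlySpan<byte> aad)
    {
        if (module.Length < LengthSize + NonceSize + TagSize)
            throw new ArgumentException("Module is too short.", nameof(module));

        // The prefix counts everything after itself: nonce + ciphertext + tag.
        int bodyLength = BinaryPrimitives.ReadInt32LittleEndian(module);
        if (bodyLength < NonceSize + TagSize || bodyLength > module.Length - LengthSize)
            throw new ArgumentException("Length prefix does not match the buffer.", nameof(module));

        ReadOnlySpan<byte> body = module.Slice(LengthSize, bodyLength);
        ReadOnlySpan<byte> nonce = body.Slice(0, NonceSize);
        ReadOnlySpan<byte> ciphertext = body.Slice(NonceSize, bodyLength - NonceSize - TagSize);
        ReadOnlySpan<byte> tag = body.Slice(bodyLength - TagSize, TagSize);

        byte[] plaintext = new byte[ciphertext.Length];
        using var aes = new AesGcm(key, TagSize);
        // Throws if the tag does not authenticate the ciphertext and AAD.
        aes.Decrypt(nonce, ciphertext, tag, plaintext, aad);
        return plaintext;
    }
}
```

Validating the length prefix against the buffer before slicing keeps a malformed or mis-parsed module from producing out-of-range reads, which seems worth ruling out given the AccessViolationException reported above.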
AES_GCM_CTR_V1
I also implemented the Aes Gcm Ctr V1 encryption algorithm; however, I don't have any test files to confirm it's working 🙃
link to code
How to test
Check out the unit test I added, which tests the sample file I mentioned above: link to code
However, even though I can decrypt the test file successfully, the data itself doesn't seem to be valid, so I had to add this try-catch as a temporary workaround: link to code. We should remove this once we have a proper test file. (Unfortunately, I don't have any other test files.)