Xor-el / HashLib4Pascal

Hashing for Modern Object Pascal
MIT License
220 stars 80 forks source link

Misuse of PasMP Library in Parallel Loops #29

Open BeRo1985 opened 2 hours ago

BeRo1985 commented 2 hours ago

Issue

I've noticed that this project misuses the PasMP library (which I am the author of) in parallel loop handling, which could lead to incorrect behavior in multithreaded operations. Specifically, the parallel loop wrappers are only processing the starting index (AFromIndex) without iterating through the full range (AFromIndex to AToIndex). Here are the corrections:

Corrections

  1. In TBlake2BP.PasMPParallelComputationWrapper:

    Current:

    procedure TBlake2BP.PasMPParallelComputationWrapper(const AJob: PPasMPJob;
      const AThreadIndex: LongInt; const ADataContainer: Pointer;
      const AFromIndex, AToIndex: TPasMPNativeInt);
    begin
      ParallelComputation(AFromIndex, ADataContainer);
    end;

    Corrected:

    procedure TBlake2BP.PasMPParallelComputationWrapper(const AJob: PPasMPJob;
      const AThreadIndex: LongInt; const ADataContainer: Pointer;
      const AFromIndex, AToIndex: TPasMPNativeInt);
    var Index: TPasMPNativeInt;
    begin
      for Index := AFromIndex to AToIndex do
        ParallelComputation(Index, ADataContainer);
    end;
  2. In TPBKDF_ScryptNotBuildInAdapter.PasMPSMixWrapper:

    Current:

    class procedure TPBKDF_ScryptNotBuildInAdapter.PasMPSMixWrapper
      (const AJob: PPasMPJob; const AThreadIndex: LongInt;
      const ADataContainer: Pointer; const AFromIndex, AToIndex: TPasMPNativeInt);
    begin
      SMix(AFromIndex, ADataContainer);
    end;

    Corrected:

    class procedure TPBKDF_ScryptNotBuildInAdapter.PasMPSMixWrapper
      (const AJob: PPasMPJob; const AThreadIndex: LongInt;
      const ADataContainer: Pointer; const AFromIndex, AToIndex: TPasMPNativeInt);
    var Index: TPasMPNativeInt;
    begin
      for Index := AFromIndex to AToIndex do
        SMix(Index, ADataContainer);
    end;
  3. In TPBKDF_Argon2NotBuildInAdapter.PasMPFillMemoryBlocksWrapper:

    Current:

    procedure TPBKDF_Argon2NotBuildInAdapter.PasMPFillMemoryBlocksWrapper
      (const AJob: PPasMPJob; const AThreadIndex: LongInt;
      const ADataContainer: Pointer; const AFromIndex, AToIndex: TPasMPNativeInt);
    begin
      PDataContainer(ADataContainer)^.Position.FLane := AFromIndex;
      FillMemoryBlocks(AFromIndex, ADataContainer);
    end;

    Corrected:

    procedure TPBKDF_Argon2NotBuildInAdapter.PasMPFillMemoryBlocksWrapper
      (const AJob: PPasMPJob; const AThreadIndex: LongInt;
      const ADataContainer: Pointer; const AFromIndex, AToIndex: TPasMPNativeInt);
    var Index: TPasMPNativeInt;
    begin
      for Index := AFromIndex to AToIndex do
      begin
        PDataContainer(ADataContainer)^.Position.FLane := Index;
        FillMemoryBlocks(Index, ADataContainer);
      end;  
    end;

Summary

These changes ensure that all indices within the given range are processed correctly, maintaining the intended parallelization behavior.

Xor-el commented 1 hour ago

@BeRo1985 thanks for proposing these changes. can you please create a PR containing these changes so the commit history contains your contributions?

Xor-el commented 1 hour ago

@BeRo1985 please take a look at here

I used GlobalPasMP.ParallelFor to handle the parallel processing and this worked as expected back then. has something changed recently?

BeRo1985 commented 50 minutes ago

Nothing in PasMP has changed regarding this behavior; it has worked this way from the beginning. If the code worked for you previously, that would be due to luck đŸ™‚ , influenced by factors such as your specific hardware configuration, CPU load, and other runtime conditions. However, this usage does not guarantee correct parallel behavior across different environments, and it could easily lead to issues on other systems or under different conditions.

And you only have to call TPasMP.CreateGlobalInstance; once, for example somewhere within the unit initialization ... end. block, or somewhere where it is only executed once. Otherwise, only unnecessary CPU cycles would be wasted.

Xor-el commented 34 minutes ago

Thanks @BeRo1985 for the explanation. Will look into it.