Dooders / Pyology

A metaphorical model of a biological cell
MIT License

Implement Log Hashing for Simulation Consistency Verification #19

Open csmangum opened 3 hours ago

csmangum commented 3 hours ago

Issue: Implement Log Hashing for Simulation Consistency Verification

Description:

We need a robust mechanism to verify the consistency of simulation results between runs. By generating a hash of the simulation logs (excluding time-related information), we can quickly detect any changes in the simulation output. This will help identify unintended alterations caused by code changes, environment differences, or data corruption.
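A minimal sketch of the core idea: normalize the logs by stripping time-related tokens, then take a deterministic hash of the result. The timestamp pattern and helper names below are assumptions for illustration; they would need to match the simulation's actual log format.

```python
import hashlib
import re

# Hypothetical timestamp pattern; adjust to the simulation's actual log format.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")

def normalize_log(text: str) -> str:
    """Strip time-related information so identical runs normalize identically."""
    return TIMESTAMP_RE.sub("<TIME>", text)

def hash_log(text: str) -> str:
    """Deterministic SHA-256 hash of the normalized log."""
    return hashlib.sha256(normalize_log(text).encode("utf-8")).hexdigest()
```

SHA-256 is deterministic across runs and platforms, so two runs that differ only in timestamps will produce identical hashes, while any substantive change to the log content yields a different hash.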

Current Situation:

Objective:

Purpose:

Acceptance Criteria:

  1. Log Processing:

    • Implement a method to strip time-related information from the simulation logs.
    • Ensure that the processed logs retain all essential data for accurate hashing.
  2. Deterministic Hashing:

    • The hashing function must produce the same hash for identical simulation outputs.
    • Any change in the simulation output (excluding time info) should result in a different hash.
  3. Internal Database for Hash Storage:

    • Implement a secure and efficient internal database to store hashes and associated metadata.
    • The database should support quick retrieval and comparison of hashes.
  4. Automated Hash Comparison:

    • The system should automatically generate and compare hashes after each simulation run.
    • If the new hash differs from the most recent stored hash, the system should flag the change.
  5. User Notification:

    • Provide clear notifications or logs indicating whether the simulation output has changed based on hash comparison.
    • Include details to help users investigate differences when a change is detected.
  6. Performance Impact:

    • The hashing and comparison process should not significantly impact the simulation's performance.
  7. Security and Integrity:

    • Ensure that the hashing process does not expose sensitive information.
    • Protect the internal database from unauthorized access or corruption.
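To sketch criteria 3 and 4, the internal store could be as simple as a local SQLite table keyed by insertion order, with the comparison done at record time. SQLite, the schema, and the `HashStore` class below are assumptions for illustration; the issue only specifies "an internal database".

```python
import sqlite3
import time

# Hypothetical schema: one row per simulation run, with hash and metadata.
SCHEMA = """
CREATE TABLE IF NOT EXISTS run_hashes (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    run_at   REAL NOT NULL,
    log_hash TEXT NOT NULL,
    params   TEXT
)
"""

class HashStore:
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def latest_hash(self):
        """Return the most recently stored hash, or None if the store is empty."""
        row = self.conn.execute(
            "SELECT log_hash FROM run_hashes ORDER BY id DESC LIMIT 1"
        ).fetchone()
        return row[0] if row else None

    def record(self, log_hash: str, params: str = "") -> bool:
        """Store a new hash; return True if it differs from the latest stored hash."""
        changed = self.latest_hash() not in (None, log_hash)
        self.conn.execute(
            "INSERT INTO run_hashes (run_at, log_hash, params) VALUES (?, ?, ?)",
            (time.time(), log_hash, params),
        )
        self.conn.commit()
        return changed
```

The boolean returned by `record` is the "flag the change" signal from criterion 4; a caller could route it into whatever notification mechanism criterion 5 ends up using.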

Required Tests:

  1. Consistency Test:

    • Objective: Verify that identical simulation runs produce the same hash.
    • Procedure:
      • Run the simulation with a fixed set of parameters.
      • Remove time-related information and generate a hash of the logs.
      • Repeat the simulation under the same conditions.
      • Compare the newly generated hash with the previous one.
    • Expected Result: The hashes should match, confirming consistent simulation output.
  2. Change Detection Test:

    • Objective: Ensure that any change in simulation output results in a different hash.
    • Procedure:
      • Modify a simulation parameter or code to alter the output.
      • Run the simulation and generate a new hash.
      • Compare this hash with the hash from the unchanged simulation.
    • Expected Result: The hashes should differ, indicating a change in the simulation output.
  3. Time Information Exclusion Test:

    • Objective: Confirm that time-related information is effectively removed before hashing.
    • Procedure:
      • Run the simulation, generate logs, and record the hash.
      • Wait for a period or adjust the system clock.
      • Run the simulation again under the same conditions.
      • Generate and compare the new hash with the previous one.
    • Expected Result: Hashes should match despite differences in timestamps, proving time information is excluded.
  4. Hash Storage and Retrieval Test:

    • Objective: Validate that hashes are correctly stored and can be retrieved from the internal database.
    • Procedure:
      • After a simulation run, check that the hash and metadata are stored.
      • Attempt to retrieve the stored hash.
    • Expected Result: The stored hash matches the one generated, and metadata is accurate.
  5. Automated Comparison and Notification Test:

    • Objective: Ensure the system automatically compares hashes and notifies users of differences.
    • Procedure:
      • Run simulations with and without changes in output.
      • Observe whether the system flags changes and notifies appropriately.
    • Expected Result:
      • For identical outputs, the system confirms consistency.
      • For different outputs, the system alerts the user to the change.
  6. Performance Impact Test:

    • Objective: Confirm that hashing does not degrade simulation performance.
    • Procedure:
      • Measure the simulation execution time with and without the hashing process.
    • Expected Result: The addition of hashing should have negligible impact on performance.
  7. Data Integrity Test:

    • Objective: Ensure the internal database maintains the integrity of stored hashes.
    • Procedure:
      • Simulate multiple runs, storing hashes each time.
      • Verify that all stored hashes remain unchanged and retrievable.
    • Expected Result: Hashes and metadata are securely stored without corruption.
  8. Security Test:

    • Objective: Ensure that the hashing process and database do not expose sensitive information.
    • Procedure:
      • Review the hashing method to confirm no sensitive data is included in the hash.
      • Test database access controls and encryption.
    • Expected Result: Sensitive information is protected, and access to stored hashes is secure.
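Tests 1-3 above could be sketched roughly as below. `fake_simulation` is a hypothetical deterministic stand-in for the real simulation (which is not defined in this issue), and the normalization mirrors the timestamp-stripping idea from the description.

```python
import datetime
import hashlib
import re

TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")

def hash_log(text: str) -> str:
    """Hash the log after stripping timestamps."""
    return hashlib.sha256(TIMESTAMP_RE.sub("<TIME>", text).encode("utf-8")).hexdigest()

def fake_simulation(seed: int) -> str:
    """Stand-in for the real simulation: emits timestamped log lines."""
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return "\n".join(f"{now} step {i}: state={seed + i}" for i in range(3))

def test_consistency():
    # Tests 1 and 3: identical parameters give identical hashes,
    # even though each run carries its own timestamps.
    assert hash_log(fake_simulation(seed=1)) == hash_log(fake_simulation(seed=1))

def test_change_detection():
    # Test 2: a changed parameter alters the output and hence the hash.
    assert hash_log(fake_simulation(seed=1)) != hash_log(fake_simulation(seed=2))
```

The real tests would swap `fake_simulation` for an actual simulation run and, for the time-exclusion test, insert a delay or clock adjustment between runs.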

Implementation Notes:

csmangum commented 3 hours ago

We could even build an embedding model that takes the digitized logs and turns them into input data for the autoencoder. That way, some kind of semantic context can be stored for comparing and grouping simulation results.