douglasrubims / docuforge

A powerful CLI for automatic software project documentation generation, compatible with multiple languages and technologies. Simplify and keep your documentation up-to-date with ease.
MIT License

Refactor metadata storage to use hashes instead of dates #9

Closed douglasrubims closed 1 month ago

douglasrubims commented 1 month ago

Refactor Metadata Storage to Use Hashes Instead of Dates

Current Behavior:

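At present, Docuforge stores metadata about each documented file in a metadata.json file. Each entry records two timestamps: lastDocumented (when the file's documentation was last generated) and lastModified (the file's last modification time).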

Example of the current format:

{
  "src/infra/jobs/bull/jobs/backup.ts": {
    "lastDocumented": 1727136400462,
    "lastModified": 1727135896603.5308
  }
}
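A timestamp-based check over this format can be sketched in TypeScript to show why it is fragile. The names TimestampMetadata and needsRegenerationByTime are hypothetical. The rule "regenerate when the file's current modification time is later than lastDocumented" is an assumption about how the two fields are compared, not Docuforge's actual code.

```typescript
// Hypothetical shape of one entry in the current metadata.json.
interface TimestampMetadata {
  lastDocumented: number; // epoch milliseconds when documentation was last generated
  lastModified: number;   // file modification time (epoch ms) recorded by Docuforge
}

type TimestampMetadataFile = Record<string, TimestampMetadata>;

// Assumed staleness rule: any modification time later than the last
// documentation run counts as a change, whether or not the content differs.
function needsRegenerationByTime(
  entry: TimestampMetadata | undefined,
  currentMtimeMs: number,
): boolean {
  if (entry === undefined) return true; // never documented
  return currentMtimeMs > entry.lastDocumented;
}
```

Because the rule only sees times, any save after lastDocumented returns true, even one that leaves the file byte-for-byte identical.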

Problem:

Using timestamps (lastModified and lastDocumented) produces false positives: trivial actions, such as saving a file in an editor without changing it (e.g., pressing Cmd+S in VS Code), update the file's modification time and therefore lastModified. Docuforge then regenerates documentation for a file whose content has not changed, which wastes time on larger projects.
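The following Node.js sketch reproduces the problem on a temporary file. It rewrites identical content and sets a later modification time, as an editor save might, then compares the modification time and a SHA-256 content hash before and after. The helper names are hypothetical, and fs.utimesSync pins the timestamps only to keep the demo deterministic across filesystems.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// SHA-256 of the file's raw bytes.
function contentHash(filePath: string): string {
  return createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// Simulates an editor save that writes back identical content.
function simulateNoOpSave(filePath: string, saveTime: Date): void {
  const content = fs.readFileSync(filePath);
  fs.writeFileSync(filePath, content);
  fs.utimesSync(filePath, saveTime, saveTime); // set atime and mtime explicitly
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "docuforge-demo-"));
const file = path.join(dir, "backup.ts");
fs.writeFileSync(file, "export const backup = () => {};\n");
fs.utimesSync(file, new Date(2024, 0, 1), new Date(2024, 0, 1));

const mtimeBefore = fs.statSync(file).mtimeMs;
const hashBefore = contentHash(file);

simulateNoOpSave(file, new Date(2024, 0, 2));

const mtimeAfter = fs.statSync(file).mtimeMs;
const hashAfter = contentHash(file);
fs.rmSync(dir, { recursive: true, force: true });

console.log("mtime changed:", mtimeAfter !== mtimeBefore); // true
console.log("hash changed:", hashAfter !== hashBefore);    // false
```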

Proposed Solution:

To avoid unnecessary documentation generation and make the metadata system more accurate, we propose replacing both lastModified and lastDocumented timestamps with hashes of the file content. By comparing these hashes, we can determine if the content has truly changed, reducing false positives for documentation regeneration.
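A minimal sketch of the content-hash comparison, assuming Node's built-in node:crypto module. The names hashContent, hashFile, and contentChanged are hypothetical; the algorithm parameter accepts SHA-256 or MD5, the two options mentioned in the implementation steps.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

type HashAlgorithm = "sha256" | "md5";

// Hex digest of the given content (strings are hashed as UTF-8).
function hashContent(content: string | Buffer, algorithm: HashAlgorithm = "sha256"): string {
  return createHash(algorithm).update(content).digest("hex");
}

// Hex digest of a file's raw bytes on disk.
function hashFile(filePath: string, algorithm: HashAlgorithm = "sha256"): string {
  return hashContent(readFileSync(filePath), algorithm);
}

// True when the content no longer matches the hash stored at documentation time.
function contentChanged(
  documentedHash: string,
  content: string | Buffer,
  algorithm: HashAlgorithm = "sha256",
): boolean {
  return hashContent(content, algorithm) !== documentedHash;
}
```

Since change detection needs no collision resistance against attackers, MD5 is adequate here; its hex digests are 32 characters long, the same length as the values in the example format below, while SHA-256 digests are 64. With either algorithm, any byte-level edit, including whitespace or line-ending changes, still produces a new hash.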

New Metadata Format:

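The new metadata.json structure will store two hashes per file: documentedHash (the content hash at the time documentation was last generated) and currentHash (the hash of the file's current content).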

Example of the updated format:

{
  "src/infra/jobs/bull/jobs/backup.ts": {
    "documentedHash": "5d41402abc4b2a76b9719d911017c592",
    "currentHash": "7d793037a0760186574b0282f2f435e7"
  }
}
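One possible reading of the two fields, sketched in TypeScript: currentHash is refreshed whenever files are scanned, documentedHash is copied from currentHash after documentation is generated, and a file is stale whenever the two differ. Under that reading, the example entry above, whose hashes differ, would be due for regeneration. HashMetadata, recordScan, recordDocumented, and isStale are hypothetical names.

```typescript
// Hypothetical TypeScript shape of one entry in the proposed metadata.json.
interface HashMetadata {
  documentedHash: string; // content hash when documentation was last generated
  currentHash: string;    // content hash from the most recent scan
}

type HashMetadataFile = Record<string, HashMetadata>;

// Records the hash from a fresh scan of a file.
function recordScan(meta: HashMetadataFile, filePath: string, hash: string): void {
  const entry = meta[filePath];
  if (entry) {
    entry.currentHash = hash;
  } else {
    // A file that was never documented starts with an empty documentedHash, so it is stale.
    meta[filePath] = { documentedHash: "", currentHash: hash };
  }
}

// After documentation is generated, documentedHash catches up with currentHash.
function recordDocumented(meta: HashMetadataFile, filePath: string): void {
  const entry = meta[filePath];
  if (entry) entry.documentedHash = entry.currentHash;
}

function isStale(entry: HashMetadata): boolean {
  return entry.documentedHash !== entry.currentHash;
}
```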

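Benefits:

  - Documentation is regenerated only when a file's content actually changes; saving an unchanged file no longer triggers regeneration.
  - Fewer unnecessary generation runs, which matters most for large or frequently edited projects.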

Implementation Steps:

  1. Replace the use of lastModified with currentHash in the metadata.json file.
  2. Replace the use of lastDocumented with documentedHash in the metadata.json file.
  3. Implement a hash generation function (e.g., using SHA-256 or MD5) to compute the content-based hashes for each file.
  4. Modify the comparison logic to use the hashes (documentedHash vs currentHash) instead of timestamps to decide when to regenerate documentation.
  5. Ensure backward compatibility for older metadata formats, if necessary.
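Steps 4 and 5 might fit together as in this sketch, which converts a legacy metadata.json into the hash format and then selects the files to document. The migration policy is an assumption: a legacy entry whose file has not been modified since lastDocumented is treated as up to date (documentedHash set to the current hash), any other legacy or unrecognized entry is marked stale with an empty documentedHash, and entries for files that no longer exist are dropped. readState is a hypothetical callback standing in for reading a file's content and modification time.

```typescript
import { createHash } from "node:crypto";

interface LegacyEntry { lastDocumented: number; lastModified: number } // timestamps, epoch ms
interface HashEntry { documentedHash: string; currentHash: string }   // content hashes
interface FileState { content: string; mtimeMs: number }              // file as seen right now

function isLegacyEntry(value: unknown): value is LegacyEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.lastDocumented === "number" && typeof v.lastModified === "number";
}

function isHashEntry(value: unknown): value is HashEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.documentedHash === "string" && typeof v.currentHash === "string";
}

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Step 5: converts a mix of legacy and hash entries into hash entries, refreshing currentHash.
function migrateMetadata(
  raw: Record<string, unknown>,
  readState: (filePath: string) => FileState | undefined,
): Record<string, HashEntry> {
  const result: Record<string, HashEntry> = {};
  for (const [filePath, value] of Object.entries(raw)) {
    const state = readState(filePath);
    if (state === undefined) continue; // file no longer exists: drop its entry
    const currentHash = sha256(state.content);
    if (isHashEntry(value)) {
      result[filePath] = { documentedHash: value.documentedHash, currentHash };
    } else if (isLegacyEntry(value) && state.mtimeMs <= value.lastDocumented) {
      // Assumed: untouched since the last run, so the documented content is the current content.
      result[filePath] = { documentedHash: currentHash, currentHash };
    } else {
      // Modified after the last run, or unrecognized: force one regeneration.
      result[filePath] = { documentedHash: "", currentHash };
    }
  }
  return result;
}

// Step 4: a file needs documentation when its content differs from what was documented.
function filesToDocument(meta: Record<string, HashEntry>): string[] {
  return Object.entries(meta)
    .filter(([, entry]) => entry.documentedHash !== entry.currentHash)
    .map(([filePath]) => filePath);
}
```

Passing readState in keeps the migration free of filesystem access so it can be unit-tested; a CLI would likely supply a function built on fs.readFileSync and fs.statSync. Trusting mtime once during migration is a trade-off: it avoids regenerating every file on upgrade, at the cost of relying on timestamps one last time.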

This change will enhance the precision and efficiency of Docuforge's documentation generation process, especially for users working with large or frequently edited codebases.