codescalersinternships / home

home repo for internships

Concurrent File Duplicate Finder #156

Open xmonader opened 4 days ago

xmonader commented 4 days ago

Create a command-line tool that concurrently searches for duplicate files in a given directory tree using file hashes (e.g., MD5).

Requirements:

  1. File Scanning:

    • Recursively walk through a specified directory tree
    • Calculate MD5 hash for each file
    • Handle large files efficiently (read in chunks)
  2. Concurrency:

    • Use goroutines to process multiple files simultaneously
    • Handle SIGTERM and SIGINT so the tool shuts down gracefully
    • Cancel all running goroutines cleanly when the app exits
  3. Duplicate Detection:

    • Store file hashes and paths in a concurrent-safe data structure
    • Identify files with matching hashes as duplicates
  4. Output:

    • Display groups of duplicate files
    • Also provide JSON output
    • Show file paths and sizes
    • Optionally, provide a summary (total space that could be saved)
  5. User Interface (CLI):

    • Accept directory path as a command-line argument
    • Provide flags for optional features (e.g., minimum file size to consider)
  6. Testing:

    • Write unit tests for core functions (e.g., hash calculation, duplicate detection)
    • Implement integration tests with a sample directory structure
  7. Documentation:

    • Provide clear comments explaining concurrency patterns used
    • Include a README with usage instructions and performance considerations

Optional Enhancements:

Acceptance Criteria: