MTG / mtg-jamendo-dataset

Metadata, scripts and baselines for the MTG-Jamendo dataset
Apache License 2.0

fix: Memory error during checksum validation #47

Open annabeth97c opened 8 months ago

annabeth97c commented 8 months ago

Issue:

This pull request fixes a MemoryError raised while computing the SHA256 checksum used to validate each downloaded tar archive.

Changes:

Modified the compute_sha256 function to read files in chunks rather than loading them entirely into memory.

Verified the fix: the script now runs to completion without crashing.

Included screenshots showing the error before the fix and the successful execution after it. Added system memory usage information (output of free -h) to give context for a system on which the issue should be reproducible.
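For readers who want to see the idea, here is a minimal sketch of chunked hashing. It is not the exact diff from this pull request: only the function name compute_sha256 comes from the description above, while the signature, the chunk_size parameter, and its 1 MiB default are assumptions made for illustration.

```python
import hashlib


def compute_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of the file at `path`.

    The file is read `chunk_size` bytes at a time, so peak memory use
    stays near `chunk_size` regardless of how large the file is.
    `chunk_size` is an assumed parameter, not taken from the repository.
    """
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()
```

Because the hash object is updated incrementally, the digest is identical to hashing the whole file at once, and the chunk size affects only memory use and speed, not the result.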

Screenshots:

Screenshot 1: Error encountered when processing a download

Screenshot 2024-03-23 at 3 25 58 PM

Screenshot 2: Successful execution after implementing chunked reading

Screenshot 2024-03-23 at 3 39 29 PM

System Memory Usage

Screenshot 2024-03-23 at 4 12 17 PM