Closed aleksey-hoffman closed 4 years ago
Are you sure that your SSD consistently reads at 500MB/s? It can also happen that a process (like an antivirus) is interfering with disk I/O performance. Try measuring how long it takes to read that file with Node.js without running any hash functions.
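A minimal sketch of such a read-only measurement, using only Node's built-in fs module. The function name measureReadTime and the '../file' path are placeholders, not something from the thread's code; the stream options are left at their defaults.

```javascript
const fs = require('fs');

// Streams the whole file and discards the data, so the elapsed time
// reflects disk + stream overhead only (no hashing involved).
function measureReadTime(path) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    let bytes = 0;
    fs.createReadStream(path)
      .on('data', chunk => { bytes += chunk.length; })
      .on('error', reject)
      .on('end', () => {
        const ms = Number(process.hrtime.bigint() - start) / 1e6;
        // Throughput in MB/s, where 1 MB = 1024 * 1024 bytes.
        const mbPerSec = bytes / (1024 * 1024) / (ms / 1000);
        resolve({ bytes, ms, mbPerSec });
      });
  });
}

// Usage (placeholder path):
// measureReadTime('../file').then(console.log);
```

If the time reported here is already close to the total time measured with hashing enabled, the read stream is probably the limiting factor rather than the hash function.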
I just made a test on my computer with a 4GB file. (NVMe SSD + Node.js v12.16.1 + Windows 10.0.18362 + i7-7700K CPU)
3540ms using hash-wasm with xxhash64 algorithm. (~1157 MB/s)
7842ms using Nodejs crypto module with md5 algorithm. (~522.44 MB/s)
My source code:
const { createXXHash64 } = require('hash-wasm');
const fs = require('fs');
const crypto = require('crypto'); // only needed for Method 1

async function getFileHash (path) {
  // Method 1: Nodejs crypto module md5
  // return new Promise((resolve, reject) => {
  //   const hash = crypto.createHash('md5');
  //   fs.createReadStream(path)
  //     .on('data', data => hash.update(data))
  //     .on('error', reject)
  //     .on('end', () => resolve(hash.digest('hex')))
  // })

  // Method 2: hash-wasm xxhash64
  const xxhash64 = await createXXHash64()
  return new Promise((resolve, reject) => {
    xxhash64.init()
    fs.createReadStream(path)
      .on('data', data => xxhash64.update(data))
      .on('error', reject) // reject on read errors instead of hanging
      .on('end', () => resolve(xxhash64.digest('hex')))
  })
}

async function run() {
  // Times the whole operation: hash init + file read + hashing
  console.time('TIME | hash')
  console.log(await getFileHash('../file'));
  console.timeEnd('TIME | hash')
}

run();
@Daninet I've done some more testing on Ubuntu and on Windows 10 with "Windows Defender real-time protection" turned off, and it seems like I/O interference from the system was rarely playing a role. Mostly, the speed difference between hashing functions was noticeable when hashing big files located on an SSD. I didn't see any difference between crypto md5 and hash-wasm xxhash64 for small image files and files located on an HDD, though. It seems like the read stream speed might be the limiting factor here.
What I still don't understand is why the results are so similar for small files and for files located on an HDD. If totalTime (ms) = streamRead (ms) + hashing (ms), why was I consistently getting similar times for both hashing functions if one algorithm is faster than the other and streamRead is supposed to take the same amount of time, since it's the same file on the same HDD? Perhaps for some files the hashing speed is the same for both algorithms, or it's just limited by the readStream somehow.
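One way to look at the hashing term on its own is to hash data that is already in memory, so no disk read is involved. A rough sketch using Node's built-in crypto module; measureHashTime is a made-up helper name, and the 64 KB chunk size is an assumption chosen to match the default chunk size of fs.createReadStream.

```javascript
const crypto = require('crypto');

// Hashes an in-memory buffer chunk by chunk, so the elapsed time
// contains hashing only (no disk I/O, no stream overhead).
// Note: if totalBytes is not a multiple of chunkSize, slightly more
// than totalBytes gets hashed.
function measureHashTime(algorithm, totalBytes, chunkSize = 64 * 1024) {
  const chunk = Buffer.alloc(chunkSize, 0xab);
  const hash = crypto.createHash(algorithm);
  const start = process.hrtime.bigint();
  for (let done = 0; done < totalBytes; done += chunkSize) {
    hash.update(chunk);
  }
  const digest = hash.digest('hex');
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { digest, ms };
}

// Usage: time md5 over 300 MB of in-memory data.
// console.log(measureHashTime('md5', 300 * 1024 * 1024).ms, 'ms');
```

Comparing this number with the time of a plain read of the same amount of data gives a hint about which of the two parts dominates.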
I'm gonna close the issue since I'm not sure where to go from here.
The hashing speed should be a constant, regardless of the file contents.
On modern computers, it can be considered that the CPU works in parallel with the disk I/O. So the time can be approximated like this: totalTime = hashInitTime (1-5ms) + max(streamReadTime, hashCalculationTime)
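The approximation above can be sketched as a small function. The hash throughputs below are the rounded figures measured earlier in this thread (~522 MB/s for crypto md5, ~1157 MB/s for hash-wasm xxhash64); the 200 MB/s read speed and the 5ms init time are hypothetical values picked only for illustration.

```javascript
// Rough model: reading and hashing overlap, so the slower one dominates.
function estimateTotalTimeMs({ fileSizeMB, readMBps, hashMBps, hashInitMs = 5 }) {
  const streamReadMs = (fileSizeMB / readMBps) * 1000;
  const hashCalcMs = (fileSizeMB / hashMBps) * 1000;
  return hashInitMs + Math.max(streamReadMs, hashCalcMs);
}

// 300 MB file, hypothetical effective read speed of 200 MB/s:
// the read takes 1500ms, which is longer than either hash needs.
const md5Ms = estimateTotalTimeMs({ fileSizeMB: 300, readMBps: 200, hashMBps: 522 });
const xxhashMs = estimateTotalTimeMs({ fileSizeMB: 300, readMBps: 200, hashMBps: 1157 });
console.log(md5Ms);    // 1505
console.log(xxhashMs); // 1505
```

Under this model both algorithms finish at the same time whenever the read is slower than the slower hash, which would explain identical timings on an HDD. The faster algorithm only shows up once the drive can deliver data faster than md5 can consume it.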
Hello, thank you for creating this module. I'm using it in an Electron app.
I'm trying to figure out why hash-wasm is not performing as expected; it's quite slow at hashing files. I hashed a 2.5GB .zip file and got unexpected results: hash-wasm with the xxhash64 algorithm performed almost exactly the same as the Nodejs crypto module with the md5 algorithm:
25s using hash-wasm with xxhash64 algorithm.
26s using Nodejs crypto module with md5 algorithm.
And then I also hashed a 300MB zip archive located on an SSD (500MB/s reads) just to make sure the drive is not the bottleneck:
1500ms using hash-wasm with xxhash64 algorithm.
1500ms using hash-wasm with md5 algorithm.
1200ms using Nodejs crypto module with md5 algorithm.
I also hashed a 5MB image and got similar results (located on an SSD as well):
90ms using hash-wasm with xxhash64 algorithm.
85ms using Nodejs crypto module with md5 algorithm.
Do you know what might be causing the problem? Drive speed? Node's streams? I don't get why the results are so similar. Isn't hash-wasm xxhash64 supposed to be like 5 times faster, especially with big files?
I tried changing the highWaterMark option to read data in 8MB chunks and maximize drive usage, thinking that the file stream might be the bottleneck here, but it didn't help in this situation; if anything, the time went up from 25s to 27s :( (I tried changing this option since it helped in another unrelated case, when I used readStream().pipe(writeStream))
Electron is not the problem here since I'm seeing the same results when I run the code from a terminal (node v13.5.0).
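For reference, the highWaterMark change described above would look roughly like this. It is a sketch, not the exact code used for the measurements; createLargeChunkStream is a made-up helper name.

```javascript
const fs = require('fs');

// 8 MB chunks instead of the 64 KB default used by fs.createReadStream.
const EIGHT_MB = 8 * 1024 * 1024;

// Returns a read stream that emits 'data' chunks of up to highWaterMark bytes.
function createLargeChunkStream(path, highWaterMark = EIGHT_MB) {
  return fs.createReadStream(path, { highWaterMark });
}

// Usage (placeholder path):
// createLargeChunkStream('../file')
//   .on('data', chunk => console.log(chunk.length))
//   .on('end', () => console.log('done'));
```

Larger chunks mean fewer 'data' events and fewer update() calls, but they do not make the drive itself any faster, so no gain is expected when the disk is already the limit.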