ETA on File Hashes - Githubissues

ghost commented 12 years ago

Would be nice to have some form of ETA / Progress Bar during hashing of new shares.

conicalflask commented 12 years ago

I can't see this happening any time soon. Listing all the files in a directory takes nearly as long as the hashing itself for big shares and would require a two-stage initial build of the filelist.

It does show you how big the share is and how many files there are, updating as it builds, which provides feedback that it's still working.

I'm not going to consider fixing this until there's absolutely nothing left for me to do as it's not a clearly good thing to do.

ghost commented 12 years ago

Running a separate thread in parallel which does 'look ahead' to try to determine the number of files should give the user some good feedback for larger shares. For smaller shares it probably won't matter. Will take this and see what I can do.
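The 'look ahead' idea above could be sketched roughly as follows. This is not the code from the issue15_FileHashETA branch: the class name LookAheadCounter and its methods are hypothetical, and it assumes the hasher can report how many files it has hashed so far.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

// Hypothetical helper (not the fs2 implementation): counts regular files
// under a root directory so a hasher can estimate its progress.
class LookAheadCounter implements Runnable {
    private final Path root;
    private final AtomicLong filesSeen = new AtomicLong();
    private volatile boolean finished = false;

    LookAheadCounter(Path root) {
        this.root = root;
    }

    @Override
    public void run() {
        // Files.walk is lazy, so the count grows while the walk proceeds.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> filesSeen.incrementAndGet());
        } catch (IOException | UncheckedIOException e) {
            // Counting is best-effort: keep whatever was counted so far.
        } finally {
            finished = true;
        }
    }

    long getFilesSeen() {
        return filesSeen.get();
    }

    boolean isFinished() {
        return finished;
    }

    // Fraction of counted files already hashed, clamped to 1.0.
    // Returns -1 while no files have been counted yet (progress unknown).
    double progress(long filesHashed) {
        long total = filesSeen.get();
        if (total == 0) {
            return -1;
        }
        return Math.min(1.0, (double) filesHashed / total);
    }
}
```

It would be started with new Thread(counter).start() alongside the build. Until isFinished() returns true the fraction is only a lower bound on the real total, so a progress bar based on it can move backwards while the count is still growing.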

ghost commented 12 years ago

Changes made on branch issue15_FileHashETA. Will push to GitHub and pull request once I have a net connection.

conicalflask commented 12 years ago

I've had a look through your code (not run it yet though) and there are a couple of issues:

1) FileCounter is a Runnable, not a Thread. When you create it and execute run(), it will run to completion rather than executing in the background simultaneously with the build.

2) When a share is being refreshed (rather than built) you already have a pretty good idea of how big it is and how many files there are. There's no need to run a FileCounter on anything but the initial build of the share. (You can detect this from the status of the share: BUILDING is the first time, REFRESHING is every other time.)
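The first point is standard Java behaviour and can be shown with a small sketch. The class RunVersusStart and the thread name "file-counter" are made up for illustration and are not taken from the fs2 code.

```java
// Illustrative only: contrasts calling run() directly with starting a Thread.
class RunVersusStart {

    // Calling run() directly executes the task on the caller's thread
    // and blocks until it has finished.
    static String runDirectly() {
        String[] executedOn = new String[1];
        Runnable task = () -> executedOn[0] = Thread.currentThread().getName();
        task.run();
        return executedOn[0];
    }

    // Wrapping the Runnable in a Thread and calling start() executes it
    // on a new thread, concurrently with the caller.
    static String runInBackground() throws InterruptedException {
        String[] executedOn = new String[1];
        Runnable task = () -> executedOn[0] = Thread.currentThread().getName();
        Thread worker = new Thread(task, "file-counter");
        worker.start();
        // join() is used here only so the result can be read safely;
        // a real counter would be left running during the build.
        worker.join();
        return executedOn[0];
    }
}
```

runDirectly() reports the name of the calling thread, while runInBackground() reports "file-counter".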

ghost commented 12 years ago

Yeah this hasn't been pushed yet. I've fixed the dumb threading problem as I spotted it already. No need to investigate my code until I do the pull request :-) What is online ATM is far from finished.

I'll take another look at the Refresh/Build differences, it's possible I've misinterpreted a bit of code.

Also in my brief testing of this, I've found the file counter is orders of magnitude faster than the hasher - able to count up my whole music collection (8000 files) in a split second. So unless this varies from OS to OS I can't imagine trying to get the performance perfect here is actually worth spending a lot of time on. But will look over everything once more before making the pull req.

conicalflask commented 12 years ago

Good to hear it performs well :)

In the case of refreshing a share, the refresher already knows what to expect, so it only hashes things that have changed name, size or last-modified date. It'll be interesting to compare the performance of both, but I think second and subsequent refreshes should take about the same amount of time when few files have changed.
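A minimal sketch of that kind of change detection, assuming the previous filelist is available as a map from file name to size and last-modified time. RefreshPlanner and FileMeta are hypothetical names and do not describe how fs2 actually stores its filelist.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decide which files need re-hashing on a refresh.
class RefreshPlanner {

    // Metadata remembered for each file from the previous build or refresh.
    static final class FileMeta {
        final long size;
        final long lastModified;

        FileMeta(long size, long lastModified) {
            this.size = size;
            this.lastModified = lastModified;
        }

        boolean sameAs(FileMeta other) {
            return other != null
                && size == other.size
                && lastModified == other.lastModified;
        }
    }

    // Returns, in sorted order, the names of files that are new (or renamed)
    // or whose size or last-modified time differs from the known metadata.
    static List<String> filesToHash(Map<String, FileMeta> known,
                                    Map<String, FileMeta> current) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, FileMeta> entry : current.entrySet()) {
            FileMeta before = known.get(entry.getKey());
            if (!entry.getValue().sameAs(before)) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

A renamed file shows up under a name that is missing from the known map, so it is treated as new and hashed again.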

There may be issues with thrashing on non-SSD drives, but sensible OSes (Windows? who knows) cache filesystem metadata aggressively, so this shouldn't affect real-world performance too drastically either way.

ghost commented 12 years ago

Merged, deleted branch on github.

mt-inside commented 12 years ago

How's the Windows performance? It's a well-known result that open() on Windows is hella-slow. This won't be called to read meta-data, but I don't know what part of that process is the slow bit.

conicalflask commented 12 years ago

I guess this shouldn't impact performance at all on multicores, as the bottleneck will be reading filesystem metadata (which should hopefully be cached by the OS). But on Windows all bets are off :)

ghost commented 12 years ago

Well it's merged into the client now - only one way to find out! If it turns out to be a problem, we just need to raise a new issue.

conicalflask / fs2

ETA on File Hashes #15