Closed domenkozar closed 8 months ago
Results look pretty impressive! Thanks for working on this. I will take a better look later.
We will have to manage the migration from FilePath to OsPath in the released APIs. Maybe deprecate the old ones in 0.11 and gradually move to OsPath instead of managing two versions?
The user
and system
time of ListDir
is higher, and is higher than real
. Looks like we are comparing concurrent version of ListDir with find
which is most likely serial. If we compare the serial version I guessfind
is still much better, which means there is scope for improvement.
To assess the impact of this change we can compare the serial version of ListDir
without and with this change.
For comparing, I tried this serial impl of ListDir on old and new code:
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
Stream.fold (Fold.drainMapM print)
$ Stream.unfoldIterateDfs unfoldOne
$ Stream.fromPure (Left ".")
where
unfoldOne = Unfold.either Dir.eitherReaderPaths Unfold.nil
Old:
$ time ListDir . > /dev/null
real 0m0.943s
user 0m0.798s
sys 0m0.384s
New:
$ time ListDir . > /dev/null
real 0m0.643s
user 0m0.404s
sys 0m0.303s
$ time find . > /dev/null
real 0m0.210s
user 0m0.096s
sys 0m0.112s
This is pretty good improvement just by replacing FilePath with OsPath, but if we have to compete with find we need to read the attributes of the files along with the names in one go. Our goal should be to bring the serial impl time below find and then concurrent would automatically become better.
On Linux we get the d_type
in the dirent returned by readdir. We can use that to avoid the expensive resolving the path and re-reading of the inode that we are using to find if the file type is a dir.
I will hack up a quick readdir implementation that provides the Dir/File status in the same call and check.
Added a commit on top of this in #2642 . Now I am getting:
$ time ListDir . > /dev/null
real 0m0.463s
user 0m0.301s
sys 0m0.208s
Improved from 643ms to 463ms. But still there is scope for improvement. Need to try removing the overhead of bracketIO and a few other small optimizations.
@domenkozar I have added you to the repo, so that we can push commits on the same PR.
We were spending a lot of time in the show instance in ListDir. I buffered and printed the OsPath byte-arrays. find
and fd
also use buffered IO instead of line oriented IO which takes more time.
-- Stream.fold (Fold.drainMapM print)
Stream.fold (Handle.writeChunksWith 32000 stdout)
$ fmap osPathToArray
$ fmap (either id id)
...
osPathToArray p =
let SBS barr# = getPosixString $ getOsString p
mbarr = MutByteArray (unsafeCoerce# barr#)
in Array mbarr 0 (I# (sizeofByteArray# barr#)) :: Array Word8
Now we are same as the rust fd (concurrent version comparison).
$ time fd -u >/dev/null
real 0m0.153s
user 0m0.118s
sys 0m0.162s
$ time ListDir . > /dev/null
real 0m0.154s
user 0m0.115s
sys 0m0.102s
Our CPU time is lower but actual time is same as fd
. This means we are actually better, perhaps there is some IO ordering issue. But fd
may also be doing a few more things, so it may be more or less the same. Also, there may still be a few small optimizations to do.
ListDir changes: https://github.com/composewell/streamly-examples/pull/66
I made some more changes, we are now significantly faster than rust fd:
$ time fd -u >/dev/null
real 0m0.147s
user 0m0.152s
sys 0m0.124s
$ time ListDir . > /dev/null
real 0m0.085s
user 0m0.063s
sys 0m0.094s
You can try out the changes from https://github.com/composewell/streamly/pull/2642 and the example program from https://github.com/composewell/streamly-examples/pull/66 .
I have to do some cleanup and complete the change before committing. We can have it in streamly-core 0.2.2 coming soon.
Nice!
Fixes #2168 for Dir, File is yet to be done.
I've hacked this together mostly to get some benchmarks if it addresses #2200 and it does!
Note that you have to pass
-L
tofind
so that it resolves symlinks to get the same behavior.On a directory with about 855k items: