aleksaan / diskusage

Duck is a very fast utility to find largest directories or files
MIT License

High memory consumption when scanning more than 8M files #28

Closed: jurajazz closed this issue 4 years ago

jurajazz commented 4 years ago

Hi,

I've been using diskusage.exe for more than a year to monitor the most space-consuming directories. It is a very simple and handy tool. However, while scanning one of my disks (containing logs), diskusage.exe (on Windows 64-bit) consumed more than 3 GB of RAM and took more than 4 hours:

Overall info:
Total time: 4h35m55.7915268s
Total dirs: 499520
Total files: 8887953
Total links: 0
Total size: 10.45 Tb
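
For scale, that is ~3 GB across roughly 9.4 million scanned entries (8,887,953 files plus 499,520 dirs), i.e. on the order of 320 bytes of resident memory per entry.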

Is there a way to optimize this, at least the memory consumption, and ideally also the time needed?

Jurajazz

aleksaan commented 4 years ago

You're probably right. I have not done memory optimization yet.

jurajazz commented 4 years ago

I use '-depth 1', so in this case the program needs to keep in memory only the first level while scanning. I think this should not require too much memory. If memory were freed while scanning (e.g. based on the -depth level), it would definitely help to keep the footprint low. I believe it can be solved easily.

For information, here is the call stack after it exhausted all 4 GB of free RAM while scanning: diskusage-log-when-memory-exhausted.txt

All the best.

aleksaan commented 4 years ago

I think this should not require too much memory.

That isn't the case. Depth defines only which results will be printed, but a full scan must still be done to know the size of the root directory.

jurajazz commented 4 years ago

Yes, a full scan must be done; however, when the scan of one branch is finished, all nodes below '-depth x' could be forgotten (freed), because they are not considered/used later in the printing phase. This can save a large amount of memory during scanning.

Example of a directory structure printed with '-depth 2':

a
  b
    c

Here 'a' and 'b' appear in the output, while 'c' (below depth 2) can be freed as soon as its size has been added to 'b'.
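
A minimal sketch of the idea in Go (hypothetical types and names, not the actual diskusage code): the scanner always recurses so the sizes stay exact, but it only retains child nodes up to the print depth, so deeper subtrees become garbage as soon as their sizes have been rolled up.

```go
package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// Node is a hypothetical result type: only nodes no deeper than the
// print depth are retained for the printing phase.
type Node struct {
    Path     string
    Size     int64
    Children []*Node // nil at the cut-off depth
}

// scan walks dir, returning a Node tree pruned at keepDepth together with
// the exact total size. Subtrees deeper than keepDepth are summed into the
// total but never stored, so the garbage collector can reclaim them as soon
// as each branch is finished.
func scan(dir string, depth, keepDepth int) (*Node, int64) {
    node := &Node{Path: dir}
    entries, err := os.ReadDir(dir)
    if err != nil {
        return node, 0 // skip unreadable directories
    }
    var total int64
    for _, e := range entries {
        full := filepath.Join(dir, e.Name())
        if e.IsDir() {
            child, size := scan(full, depth+1, keepDepth)
            total += size
            if depth < keepDepth {
                node.Children = append(node.Children, child)
            } // otherwise the child is dropped here and can be freed
        } else if info, err := e.Info(); err == nil {
            total += info.Size()
        }
    }
    node.Size = total
    return node, total
}

func main() {
    root, total := scan(".", 0, 1)
    fmt.Printf("%s: %d bytes, %d first-level children kept\n",
        root.Path, total, len(root.Children))
}
```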

aleksaan commented 4 years ago

You are right! But that's only a memory optimization, not a time reduction.

jurajazz commented 4 years ago
  1. Yes, my suggestion addresses only the memory issue, which blocks me from using diskusage for my logs directories. Implementing the memory optimization would let me use diskusage for monitoring.

  2. For time optimization, you could use one of the nice features of Go: goroutines. E.g. the scan of each root directory (or of other branches) could run in a separate goroutine, though it would probably require some limiting and synchronization; see the sketch below. https://www.youtube.com/watch?v=Zg7GK759ZzA

A nice example of using multiple threads to scan a disk tree is WinDirStat (https://windirstat.net/), which scans with multiple threads subject to a thread limit.
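
As a hedged sketch (a hypothetical helper, not the project's actual code), bounded concurrency in Go is commonly done with a buffered channel acting as a semaphore plus a WaitGroup for synchronization:

```go
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "sync"
    "sync/atomic"
)

// scanConcurrent sums all file sizes under root, scanning subdirectories in
// parallel. The buffered channel acts as a semaphore limiting the number of
// simultaneously active scans, and atomic addition keeps the shared total
// safe across goroutines.
func scanConcurrent(root string, maxWorkers int) int64 {
    var total int64
    sem := make(chan struct{}, maxWorkers)
    var wg sync.WaitGroup

    var walk func(dir string)
    walk = func(dir string) {
        entries, err := os.ReadDir(dir)
        if err != nil {
            return // skip unreadable directories
        }
        for _, e := range entries {
            full := filepath.Join(dir, e.Name())
            if e.IsDir() {
                wg.Add(1)
                go func() {
                    defer wg.Done()
                    sem <- struct{}{}        // acquire a worker slot
                    defer func() { <-sem }() // release it when done
                    walk(full)
                }()
            } else if info, err := e.Info(); err == nil {
                atomic.AddInt64(&total, info.Size())
            }
        }
    }

    walk(root)
    wg.Wait()
    return total
}

func main() {
    fmt.Println("total bytes:", scanConcurrent(".", 8))
}
```

The channel capacity plays the same role as WinDirStat's thread limit; without it, a deep tree could spawn an unbounded number of goroutines.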

aleksaan commented 4 years ago

@jurajazz, hi

can you just test a new version before I publish it?
diskusage.zip

I made it so file metrics are not kept in memory when the real depth of a file is greater than the "depth" parameter. I also added a system-memory-allocated metric to the results.
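
For reference, a metric like this is usually read from the Go runtime's memory statistics; a minimal sketch (which MemStats field is actually reported is an assumption):

```go
package main

import (
    "fmt"
    "runtime"
)

// reportMemory prints allocation statistics from the Go runtime.
// Sys is the total memory obtained from the OS; Alloc is the live heap.
func reportMemory() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("Total used memory: %.2f Mb (heap alloc: %.2f Mb)\n",
        float64(m.Sys)/(1<<20), float64(m.Alloc)/(1<<20))
}

func main() {
    reportMemory()
}
```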

I didn't work specifically on time optimization; I think execution time depends more on disk speed than on parallel computation.

aleksaan commented 4 years ago

@jurajazz please see above

jurajazz commented 4 years ago

Hi Alexander,

I started to use the Linux command du compiled for Windows (shipped as part of the Git package), which does something similar with minimal memory consumption.
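
For my case that looks something like this (using the GNU du/sort shipped with Git for Windows; the path is a placeholder):

```sh
du --max-depth=1 -h /path/to/logs | sort -rh | head -20
```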

I can also test your new version with optimized memory; however, instead of a compiled executable, I would prefer to compile it myself for security reasons. Could you please commit the source changes, e.g. into a special branch, or just zip the sources? It would also save me some time if you could describe how you suggest compiling it on Windows.
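
I assume it is just the standard Go toolchain commands, something like the following (please confirm):

```sh
git clone https://github.com/aleksaan/diskusage.git
cd diskusage
go build -o diskusage.exe .
```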

Juraj

aleksaan commented 4 years ago

OK, the sources are updated; please also read the readme. I've added a new metric, 'Total used memory'. Please compare this metric with the actual memory consumption.

In my case it was 203 Mb without optimization and 28 Mb with it (for a 101.62 Gb disk, depth=2, limit=20). And there is a nice side effect: total time went down from 1.45 min to 1.15 min.