Fuco1 / dired-hacks

Collection of useful dired additions
GNU General Public License v3.0
873 stars 77 forks source link

refactor(dired-collapse): avoid using function 'length' on lists #212

Closed Boruch-Baum closed 4 months ago

Boruch-Baum commented 7 months ago

This PR will improve efficiency on directories with many files, in two locations, from O(n) to O(1). This is because Lisp needs to traverse "true" lists in order to determine their lengths. emacs-lisp "cheats" in that many of its data structures are really vectors/arrays instead of "true" lisp lists, but not in the cases subject to this PR, IIUC.

Fuco1 commented 7 months ago

I would be very interested in some numbers here. Often in Elisp, accessing two variables is slower than running length. In practice, these things are rarely slow and the code which is more complicated is not worth it.

To call length on a list of 1 million items seems to take about 10 milliseconds on my computer, and you would never have a hierarchy 1 milion levels deep, or even have 1 million files in a directory etc.

Boruch-Baum commented 7 months ago

The PR deals with a specific ELisp 'list' object, for which function length needs to count each and every CDR of the list each time the function is called. Contrary to theoretical lisp, Emacs-Lisp doesn't represent other objects (vectors, arrays, strings) internally as lists because of the awful practical performance consequences. So, when you say 'often' and 'rarely', you need to keep in mind exactly what data-type objects you're making observations about.

On 2024-03-21 05:45, Matus Goljer wrote:

I would be very interested in some numbers here. Often in Elisp, accessing two variables is slower than running length. In practice, these things are rarely slow and the code which is more complicated is not worth it.

To call length on a list of 1 million items seems to take about 10 milliseconds on my computer, and you would never have a hierarchy 1 milion levels deep, or even have 1 million files in a directory etc.

— Reply to this email directly, [1]view it on GitHub, or [2]unsubscribe. You are receiving this because you authored the thread. Message ID: @.***>

References

  1. https://github.com/Fuco1/dired-hacks/pull/212#issuecomment-2012192747
  2. https://github.com/notifications/unsubscribe-auth/AAOE3KF4VPM7GT763NJ57JLYZLI5TAVCNFSM6AAAAABEQQPG6CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJSGE4TENZUG4

-- hkp://keys.gnupg.net CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0

Fuco1 commented 7 months ago

Yes, it needs to count each cdr, but unless you have literal billions of them it doesn't matter. Counting 20 or 30 conses takes microseconds. So here the fact that reading (length x) is immediately obvious is far more valuable.

I think this is a performance optimization worth having.

Fuco1 commented 4 months ago

I decided not to merge this: