Open Robbepop opened 6 years ago
@XAMPPRocky How difficult would this be to implement? Without having looked at tokei (aside from adding PostCSS), my initial thought would be to use a HashSet to store inodes as files are visited. I'd have to check on OS-interop for that, though.
I'd be interested in doing this if it's not too hard to jump into.
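To make the HashSet-of-inodes idea concrete, here is a minimal sketch using only the standard library. It is not tokei's code: `dedup_hard_links` is a hypothetical helper, and it assumes a Unix target, since `MetadataExt::dev()` and `MetadataExt::ino()` are Unix-only (which is exactly the OS-interop concern mentioned above).

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: exposes dev() and ino()
use std::path::PathBuf;

/// Hypothetical helper: keeps the first path seen for each (device, inode) pair.
fn dedup_hard_links(paths: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut seen: HashSet<(u64, u64)> = HashSet::new();
    let mut unique = Vec::new();
    for path in paths {
        // fs::metadata follows symlinks, so a symlink resolves to its target's inode.
        let meta = fs::metadata(path)?;
        // HashSet::insert returns false when the key was already present.
        if seen.insert((meta.dev(), meta.ino())) {
            unique.push(path.clone());
        }
    }
    Ok(unique)
}
```

The device number is part of the key because inode numbers are only unique within one filesystem.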
@jhpratt Hey, I don't know about the second option, however the first option should be easy to add. Tokei's CLI is set using clap in src/cli.rs and its configuration is set in src/config.rs. I would create this option very similar to the --no-ignore/no_ignore option that is already in the code.
For traversing the file system tokei uses ignore, which has an option in WalkBuilder::follow_links that says whether it should follow symbolic links or not.
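As a rough illustration of what such a follow-links switch controls, here is a standard-library sketch of a recursive walk. It does not use the ignore crate and is not how tokei walks directories; `collect_files` and its `follow_links` parameter are hypothetical names chosen for this example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a configured walker: collects regular files under `root`.
/// When `follow_links` is false, symbolic links are skipped entirely.
fn collect_files(root: &Path, follow_links: bool, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        // symlink_metadata does not follow links, so it can tell a link from its target.
        let is_link = fs::symlink_metadata(&path)?.file_type().is_symlink();
        if is_link && !follow_links {
            continue;
        }
        // fs::metadata follows links; an unresolvable link is skipped, not treated as fatal.
        let meta = match fs::metadata(&path) {
            Ok(meta) => meta,
            Err(_) => continue,
        };
        if meta.is_dir() {
            // This sketch does no loop detection of its own.
            collect_files(&path, follow_links, out)?;
        } else if meta.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```

With `follow_links` set to false, a symlinked source file is never counted; with it set to true, the same file can show up once per link, which is the double counting this issue is about.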
Looks like there's the same-file crate which could be of use. A quick test shows it handles both hard and symbolic links correctly.
Looking through tokei's code, I presume src/utils/fs.rs is where a change would need to be made, given the presence of the get_all_files method. I'm not entirely sure what is happening there, though.
@jhpratt Yes, sorry, I should have pointed you to that. I believe that is what ignore uses for follow_links. I would add follow_links as part of the WalkBuilder configuration.
Everything before walker.build_parallel().run(…) is configuring the walker's behaviour (what paths to search, to ignore, etc). walker.build_parallel().run(…) runs in parallel over the paths and sends any file paths to rx (a crossbeam::Receiver), which are then validated as programming languages.
If you have any other questions please feel free to ask.
Looking through various documentation, looks like symlinks are currently ignored, and a trivial check shows this is the case.
What would be the preferred way to handle hard links? If it were up to me, I'd lean towards automatically excluding anything past the first, as you'd essentially be counting the file twice.
Thanks for the explanation, by the way! Responsiveness is quite helpful :slightly_smiling_face:
Edit: Turns out it's nearly trivial to exclude a file the second time around. Using DashMap instead of HashMap because it's parallel, it amounts to just wrapping an if statement around the tx.send().
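The shape of that change can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual patch: it swaps in `Mutex<HashSet>` for DashMap and `std::sync::mpsc` for the crossbeam channel, keys files by (device, inode) on Unix, and `send_unique` is a hypothetical function name.

```rust
use std::collections::HashSet;
use std::fs;
use std::os::unix::fs::MetadataExt; // Unix-only: exposes dev() and ino()
use std::path::PathBuf;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical helper: each worker sends a path only if its (device, inode)
/// pair has not already been claimed by any worker.
fn send_unique(batches: Vec<Vec<PathBuf>>) -> Vec<PathBuf> {
    let seen = Arc::new(Mutex::new(HashSet::<(u64, u64)>::new()));
    let (tx, rx) = mpsc::channel();

    let workers: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let seen = Arc::clone(&seen);
            let tx = tx.clone();
            thread::spawn(move || {
                for path in batch {
                    // Unreadable entries are skipped in this sketch.
                    let Ok(meta) = fs::metadata(&path) else { continue };
                    let first_time = seen.lock().unwrap().insert((meta.dev(), meta.ino()));
                    // The `if` wrapped around the send is the whole change.
                    if first_time {
                        tx.send(path).unwrap();
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once every worker is done.
    drop(tx);
    for worker in workers {
        worker.join().unwrap();
    }
    rx.into_iter().collect()
}
```

Which of several hard-linked paths gets through depends on thread scheduling, so the surviving name is not deterministic; only the count is.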
Today I found out that tokei's reported lines of code for my repository seemed to explode. The reason for this was that tokei (incorrectly) counted links (symbolic-links and hard-links) as if they were regular files.
I think it would be best to make tokei completely ignore symbolic links and count a hard-linked file only once, ignoring every additional occurrence of it.
While restating defaults might be hard, it should also be okay to add options to filter links out of the accumulation process. I'm thinking of options like:
--ignore-symbolic-links: Ignores any symbolic link.
--ignore-multiple-hard-links: Ignores all but one occurrence of a hard-linked file that has its source within the given search space (a repository, for example) and ignores all hard-linked files that have their source outside of the search space.