XAMPPRocky / tokei

Count your code, quickly.

Add option to filter out links #227

Open Robbepop opened 6 years ago

Robbepop commented 6 years ago

Today I found out that tokei's reported lines of code for my repository seemed to explode. The reason for this was that tokei (incorrectly) counted links (symbolic-links and hard-links) as if they were regular files.

I think it would be best to make tokei completely ignore symbolic links and count a hard-linked file only once, ignoring every further occurrence of it.

While restating defaults might be hard, it should also be okay to add options to filter links out of the accumulation process. I think options like

jhpratt commented 4 years ago

@XAMPPRocky How difficult would this be to implement? Without having looked at tokei (aside from adding PostCSS), my initial thought would be to use a HashSet to store inodes as files are visited. I'd have to check on OS-interop for that, though.
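The HashSet-of-inodes idea can be sketched with the standard library alone. This is a minimal, Unix-only illustration and not tokei's code: the function name `dedup_hard_links` is hypothetical, and file identity is assumed to be the (device, inode) pair, which does not carry over to Windows as written.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Hypothetical helper: keeps only the first path seen for each underlying file.
/// Unix-only, because it identifies a file by its (device, inode) pair.
fn dedup_hard_links(paths: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut seen: HashSet<(u64, u64)> = HashSet::new();
    let mut unique = Vec::new();
    for path in paths {
        // `fs::metadata` follows symlinks, so a symlink resolves to the
        // identity of its target and is deduplicated the same way.
        let meta = fs::metadata(path)?;
        // `insert` returns false when the identity was already recorded.
        if seen.insert((meta.dev(), meta.ino())) {
            unique.push(path.clone());
        }
    }
    Ok(unique)
}
```

Input order decides which of several hard-linked paths is kept; the later ones are dropped.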

I'd be interested in doing this if it's not too hard to jump into.

XAMPPRocky commented 4 years ago

@jhpratt Hey, I don't know about the second option, but the first option should be easy to add. Tokei's CLI is set up using clap in src/cli.rs and its configuration is set in src/config.rs. I would create this option in much the same way as the --no-ignore/no_ignore option that is already in the code.

For traversing the file system tokei uses ignore which has an option in WalkBuilder::follow_links that says whether it should follow symbolic links or not.

jhpratt commented 4 years ago

Looks like there's the same-file crate which could be of use. A quick test shows it handles both hard and symbolic links correctly.

Looking through tokei's code, I presume src/utils/fs.rs is where a change would need to be made, given the presence of the get_all_files method. I'm not entirely sure what is happening there, though.

XAMPPRocky commented 4 years ago

@jhpratt Yes, sorry, I should have pointed you to that. I believe that is what ignore uses for follow_links. I would add follow_links as part of the WalkBuilder configuration.

Everything before walker.build_parallel().run(…) is configuring the walker's behaviour (what paths to search, to ignore, etc). walker.build_parallel().run(…) runs in parallel over the paths and sends any file paths to rx (a crossbeam::Receiver), which are then validated as programming languages.

If you have any other questions please feel free to ask.

jhpratt commented 4 years ago

Looking through various documentation, it looks like symlinks are currently ignored, and a trivial check shows this is the case.
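For reference, "not following symlinks" can be reproduced in a few lines with the standard library. This is only a sketch of the behaviour, not how tokei or the ignore crate implement it; the function name `regular_files_skipping_symlinks` is hypothetical and it looks at a single directory rather than walking recursively.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the regular files directly inside `dir`,
/// leaving out symbolic links (and subdirectories).
fn regular_files_skipping_symlinks(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `DirEntry::file_type` does not follow symlinks, so a link is
        // reported as a symlink and `is_file()` is false for it.
        let file_type = entry.file_type()?;
        if file_type.is_file() {
            files.push(entry.path());
        }
    }
    // `read_dir` gives no ordering guarantee, so sort for stable output.
    files.sort();
    Ok(files)
}
```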

What would be the preferred way to handle hard links? If it were up to me, I'd lean towards automatically excluding anything past the first, as you'd essentially be counting the file twice.

Thanks for the explanation, by the way! Responsiveness is quite helpful :slightly_smiling_face:


Edit: Turns out it's nearly trivial to exclude a file the second time around. Using DashMap instead of HashMap because it's parallel, it amounts to just wrapping an if statement around the tx.send().
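The shape of that change, an if wrapped around the send, might look roughly like the sketch below. It is not the actual patch: to stay dependency-free it swaps in `Mutex<HashSet>` for DashMap and `std::sync::mpsc` for the crossbeam channel, it splits the paths across plain threads instead of using the ignore walker, and the name `send_unique_files` is hypothetical. File identity is again assumed to be the Unix (device, inode) pair.

```rust
use std::collections::HashSet;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: visits `paths` on several threads and sends each
/// underlying file through the channel at most once.
fn send_unique_files(paths: Vec<PathBuf>, workers: usize) -> Vec<PathBuf> {
    // Shared set of (device, inode) pairs; stands in for the DashMap.
    let seen: Arc<Mutex<HashSet<(u64, u64)>>> = Arc::new(Mutex::new(HashSet::new()));
    let (tx, rx) = mpsc::channel::<PathBuf>();
    let workers = workers.max(1);
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);

    let handles: Vec<_> = paths
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let seen = Arc::clone(&seen);
            let tx = tx.clone();
            thread::spawn(move || {
                for path in chunk {
                    // Skip paths whose metadata cannot be read.
                    let Ok(meta) = fs::metadata(&path) else { continue };
                    // `insert` returns true only for the first visit.
                    let first_visit = seen.lock().unwrap().insert((meta.dev(), meta.ino()));
                    // The guard around the send is the whole deduplication.
                    if first_visit {
                        tx.send(path).unwrap();
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    for handle in handles {
        handle.join().unwrap();
    }
    let mut unique: Vec<PathBuf> = rx.into_iter().collect();
    unique.sort();
    unique
}
```

Because the threads race, which of several hard-linked paths gets through is not deterministic; only the count per underlying file is.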