boyter / scc

Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go
MIT License

Recognize file type based on mime type #396

Open nkh opened 1 year ago

nkh commented 1 year ago

I have a bunch of bash files that are not counted because they don't have a shebang.

Using the file type or the MIME type would let scc include them in the count.

boyter commented 1 year ago

Could you provide an example file (two would be better) that shows how you would expect this to work? I want to see the file itself to determine how this should work.

nkh commented 1 year ago

Here's a file full of bash code; let me know if you need more. https://github.com/nkh/ftl/blob/main/config/ftl/etc/core/ftl

As it is, it's not recognized.

If the extension is changed to .sh, it is recognized as a shell script.

If "#!/bin/bash" is at the beginning of the file, it is recognized as bash code.

And now that I ran the test again, I realized that I was wrong.

file ftl -> ftl: Unicode text, UTF-8 text
mimetype ftl -> ftl: text/plain

But it's right with a shebang:

file ftl_shebang -> ftl_shebang: Bourne-Again shell script, Unicode text, UTF-8 text executable
mimetype ftl_shebang -> ftl_shebang: application/x-shellscript

I must have mixed up the files earlier, sorry.

But let's not lose a good opportunity. I know what those files are, so I can cheat: keep a list of the files and create temporary copies, with an extension or a shebang, to give to scc. Alternatively, scc could accept a list of files with their types and, in the best of worlds, also generate a list of the files it checked and what types it thought they were.

I'd understand if the input file with file types (and the list of files/types) is not something you want to implement. I can write a workaround.
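
The "temporary files with an extension" workaround mentioned above could look roughly like the sketch below. It is only an illustration: `stage_as_shell` and the `scc-stage-` prefix are hypothetical names, and it assumes every listed file really is a shell script.

```python
import shutil
import tempfile
from pathlib import Path


def stage_as_shell(files, suffix=".sh"):
    """Copy each listed file into a fresh temp directory, appending `suffix`.

    Hypothetical helper: it assumes the caller already knows that every
    listed file is a shell script that lacks an extension and a shebang.
    """
    staging = Path(tempfile.mkdtemp(prefix="scc-stage-"))
    for index, source in enumerate(files):
        source = Path(source)
        # Prefix with an index so files with the same name do not collide.
        target = staging / f"{index:04d}_{source.name}{suffix}"
        shutil.copyfile(source, target)
    return staging
```

The returned directory can then be passed to scc in place of the original files, and deleted afterwards.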

boyter commented 1 year ago

Ah ok.

So the way scc works internally is to check the extension. If it maps to a single known file type, scc treats the file as that type. Where there are multiple candidates, it inspects the first few thousand bytes, counting keywords to identify the most likely type, then counts on that basis.

Where the filename itself matches, such as makefile, the above applies.

Where nothing matches, the file is then checked for the presence of a #! (shebang) line.
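
The lookup order described above can be sketched as follows. This is not scc's code (scc is written in Go); the tables, the keyword lists and the `guess_language` name are made up for illustration, and only the order of the checks follows the description.

```python
from pathlib import Path

# Hypothetical lookup tables, for illustration only.
EXTENSIONS = {"sh": ["Shell"], "go": ["Go"], "m": ["Objective C", "MATLAB"]}
FILENAMES = {"makefile": ["Makefile"]}
SHEBANGS = {"bash": "Shell", "sh": "Shell", "python3": "Python"}
KEYWORDS = {"Objective C": ["@interface", "#import"], "MATLAB": ["endfunction"]}


def guess_language(name, head):
    """Return a language name for a file, or None when nothing matches.

    `head` holds the first bytes of the file.
    """
    path = Path(name)
    # 1. Extension first, then the whole filename (e.g. "makefile").
    candidates = (EXTENSIONS.get(path.suffix.lstrip(".").lower())
                  or FILENAMES.get(path.name.lower()))
    text = head.decode("utf-8", errors="replace")
    if candidates:
        if len(candidates) == 1:
            return candidates[0]
        # 2. Several candidates: pick the one with the most keyword hits.
        return max(candidates,
                   key=lambda lang: sum(text.count(k) for k in KEYWORDS.get(lang, [])))
    # 3. Nothing matched by name: fall back to the #! line.
    first_line = text.splitlines()[0] if text else ""
    if first_line.startswith("#!"):
        parts = first_line[2:].split()
        if not parts:
            return None
        interpreter = Path(parts[0]).name
        if interpreter == "env" and len(parts) > 1:
            interpreter = parts[1]  # handles "#!/usr/bin/env bash"
        return SHEBANGS.get(interpreter)
    return None
```

Under these assumptions, a file named `ftl` with no extension and no shebang falls through every step and stays unrecognized, which matches the behaviour reported earlier in the thread.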

So what I get from the above is that you want to do a remap? This may already exist.

Have a look at the following options, which might do what you are expecting.

--remap-all
--remap-unknown

I suspect either of those should work if you are prepared to add a small comment on the top of your files. Although I understand this might not be ideal.
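
The idea behind remapping (look for a marker string near the top of a file and, if present, treat the file as a given language) can be sketched like this. The `marker:Language` rule format and both function names are assumptions made for the example; check `scc --help` for the exact syntax the real flags accept.

```python
def parse_remap(rules):
    """Parse 'marker:Language,marker2:Language2' into a dict.

    The rule format is assumed for this sketch. The split is on the LAST
    colon so that markers such as 'vim: ft=sh' may contain colons.
    """
    mapping = {}
    for rule in rules.split(","):
        marker, sep, language = rule.rpartition(":")
        if sep and marker.strip() and language.strip():
            mapping[marker.strip()] = language.strip()
    return mapping


def remap(head, mapping, current=None):
    """Return the remapped language if a marker occurs in `head`, else `current`."""
    text = head.decode("utf-8", errors="replace")
    for marker, language in mapping.items():
        if marker in text:
            return language
    return current
```

With a rule such as `ft=sh:Shell`, a file whose first line is the comment `# vim: ft=sh` would be counted as Shell even without an extension or a shebang.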

I don't know if any other option is a good idea in this case, at least not without making it opt-in so that performance is not tanked.

nkh commented 1 year ago

Thanks for pointing out the remapping. In this specific case I could add a vim tag to the bash files. I also ran a test with --remap-all that worked well (I need to check the results a bit more).

I have symlinks in the directory structure, and I think they completely broke scc, as it never finished running.

boyter commented 1 year ago

The symlinks issue is one I want to know more about. I thought I had taken care of this. By default scc should detect and ignore symlinks, and you have to explicitly enable them using --include-symlinks.

Is it possible to get a case that replicates it? I'd be curious to know whether either of these projects is affected too, since they have the new file walking logic I want to move scc to.

https://github.com/boyter/cs https://github.com/boyter/dcd

nkh commented 1 year ago

The symlinks are links to directories lower in the directory structure.

IE: /p1/p2/A -> /p1/p2/p3/p4/A

How do you want me to run the above projects?

boyter commented 1 year ago

Ideally I'd like a test case to replicate the issue, but I might be able to create one based on what you have mentioned above.

What OS are you on?

nkh commented 1 year ago

Linux

Here's how my directory structure looks:

config/
└── ftl
    ├── bindings
    ├── commands -> etc/commands/ *link
    ├── etags -> etc/etags/ *link
    ├── etc
    │   ├── bin
    │   │   └── third_party
    │   ├── bindings
    │   │   └── lib
    │   ├── commands
    │   │   └── ftlrc_dir
    │   ├── core
    │   │   └── lib
    │   │       ├── lock_preview
    │   │       └── merge
    │   ├── etags
    │   ├── filters
    │   ├── generators
    │   └── viewers
    ├── filters -> etc/filters/ *link
    ├── generators -> etc/generators/ *link
    ├── man
    └── viewers -> etc/viewers/ *link
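
For a layout like the one above, where top-level symlinks point at directories that also exist under etc/, a walker can either skip symlinks (the default behaviour described earlier in the thread) or follow them while remembering which real directories it has already visited. The sketch below shows both; it is not scc's file walker, and `walk_files` is a hypothetical name.

```python
import os


def walk_files(root, follow_symlinks=False):
    """Yield file paths under `root`, visiting each real directory at most once.

    Symlinks are skipped unless `follow_symlinks` is True. When they are
    followed, directories are de-duplicated by resolved path, so link
    cycles terminate and linked directories are not counted twice.
    """
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # already visited through another path
        seen.add(real)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_symlink() and not follow_symlinks:
                    continue
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path
```

Either way, a file under etc/commands is reported once, whether or not the commands link is followed.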