XAMPPRocky / tokei

Count your code, quickly.

Add list of files not recognized by tokei in the report #883

Open UtsavChokshiCNU opened 2 years ago

UtsavChokshiCNU commented 2 years ago

Context :

We are using tokei for identifying language & metadata for a file.
Some of the repositories under inspection are in GBs.

Problem :

As tokei does not support all file types, a good number of files are missing from its output. To find out which files tokei did not report, one has to scan the entire repository a second time, which is a very expensive operation on large repositories.
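For illustration, the second scan described above could look roughly like the sketch below. It assumes tokei was run as `tokei . -o json` from the repository root, so report names look like `./Dockerfile`. The helper names `reported_paths` and `unreported_files` are made up for this example and are not part of tokei. The walk does not honor `.gitignore` or `.ignore` rules, so ignored files would also show up as unreported.

```python
import json
import os


def reported_paths(tokei_report):
    """Collect every file name listed under any language's "reports"."""
    paths = set()
    for language, entry in tokei_report.items():
        if language == "Total":
            continue  # "Total" only repeats the per-language reports
        for report in entry.get("reports", []):
            paths.add(os.path.normpath(report["name"]))
    return paths


def unreported_files(root, tokei_report):
    """Walk root again and return the files tokei did not list."""
    seen = reported_paths(tokei_report)
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: the .git directory is never of interest.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.normpath(os.path.relpath(full, root))
            if rel not in seen:
                missing.append(rel)
    return sorted(missing)


def load_report(path):
    """Load a report saved with: tokei . -o json > report.json"""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `unreported_files(".", load_report("report.json"))` touches every file in the tree a second time, which is exactly the cost this request is trying to avoid.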

Feature Request :

It would be great if the tokei report lists all files that are not recognized under any language.

Example :

$ ls .
Dockerfile  hello.noon

$ tokei . -o json

{
  "Dockerfile": {
    "blanks": 0,
    "children": {},
    "code": 0,
    "comments": 0,
    "inaccurate": false,
    "reports": [
      {
        "name": "./Dockerfile",
        "stats": {
          "blanks": 0,
          "blobs": {},
          "code": 0,
          "comments": 0
        }
      }
    ]
  },
  "Total": {
    "blanks": 0,
    "children": {
      "Dockerfile": [
        {
          "name": "./Dockerfile",
          "stats": {
            "blanks": 0,
            "blobs": {},
            "code": 0,
            "comments": 0
          }
        }
      ]
    },
    "code": 0,
    "comments": 0,
    "inaccurate": false,
    "reports": []
  }
}

Expected Sample Output :

{
  "Dockerfile": {
    "blanks": 0,
    "children": {},
    "code": 0,
    "comments": 0,
    "inaccurate": false,
    "reports": [
      {
        "name": "./Dockerfile",
        "stats": {
          "blanks": 0,
          "blobs": {},
          "code": 0,
          "comments": 0
        }
      }
    ],
    "Unrecognized": {
      "reports": [
        {
          "name": "./hello.noon"
        }
      ]
    }
  },
  "Total": {
    "blanks": 0,
    "children": {
      "Dockerfile": [
        {
          "name": "./Dockerfile",
          "stats": {
            "blanks": 0,
            "blobs": {},
            "code": 0,
            "comments": 0
          }
        }
      ],
      "Unrecognized": [
        {
          "name": "./hello.noon"
        }
      ]
    },
    "code": 0,
    "comments": 0,
    "inaccurate": false,
    "reports": []
  }
}

Please note that this issue is different from #209. Tokei is not expected to analyze unrecognized files, so it does not have to process them. It only needs to report them.

XAMPPRocky commented 2 years ago

Thank you for your issue! This is a pretty interesting use-case. I think the one question I would have is, how would you expect tokei to handle the case of files that failed for system related reasons, e.g. file not found, permission denied, etc?

UtsavChokshiCNU commented 2 years ago

I don't know how tokei works internally, but I believe such files should go unreported, since it is the user's responsibility to make sure that the correct access has been provided for the code under inspection. If the user does not want certain files scanned, they can always add them to a .ignore-style file.

adam-tokarski commented 2 years ago

"I think the one question I would have is, how would you expect tokei to handle the case of files that failed for system related reasons, e.g. file not found, permission denied, etc?"

I think these should be handled like anything else in tokei. I mean, a file is a file regardless of whether you know how to interpret its contents. How does tokei handle such situations now?

This also seems to cover https://github.com/XAMPPRocky/tokei/issues/897.

adam-tokarski commented 2 years ago

OK, I've now read #209 properly and I see the issue here. At the end of the day it would be nice to have a report of unsupported files (with a total line count, because why not), but tokei should avoid any obviously uninteresting and potentially heavy binary or archive files.

Would it be a good approach, then, to check whether each unrecognized file is binary, using something like content_inspector? If both conditions hold (the file is unsupported but is still text), tokei could just count its overall lines and put them under some dedicated language type.
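A rough sketch of that idea, written in Python purely for illustration: it treats a file as binary if a NUL byte appears in the first 1024 bytes, and counts lines only for files that look like text. The NUL-byte check is a simple stand-in heuristic and not the content_inspector API. The function names and the 1024-byte sample size are assumptions made for this example. One known limitation is that UTF-16 text contains NUL bytes and would be classified as binary here.

```python
def looks_binary(path, sample_size=1024):
    """Heuristic: a NUL byte in the first sample_size bytes means binary."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def count_lines(path):
    """Count lines without loading the whole file into memory."""
    count = 0
    with open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count


def summarize_unrecognized(paths):
    """Split unrecognized files into text (with line counts) and binary."""
    summary = {"text": {}, "binary": []}
    for path in paths:
        if looks_binary(path):
            # Binary and archive files are listed but never read in full.
            summary["binary"].append(path)
        else:
            summary["text"][path] = count_lines(path)
    return summary
```

Because only a small prefix of each file is read before deciding, large binaries and archives cost almost nothing, while unsupported text files still get an overall line count.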