Support relevant files with non-programming-language code (JSON, YAML)

MaibornWolff / metric-gardener

BSD 3-Clause "New" or "Revised" License


Support relevant files with non-programming-language code (JSON, YAML) #143

Closed by ResistantBear 7 months ago

ResistantBear commented 8 months ago

Support JSON and YAML.

We may need to define different metrics for those, as most of our current metrics might not apply/are always zero.

ce-bo commented 8 months ago

First step would be to support both grammars and calculate LOC. We should then document in the README that other metrics are not calculated yet.
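A minimal sketch of what a LOC count for such files could look like, working on the raw text only. `countLines` is a hypothetical helper for illustration, not metric-gardener's actual implementation, and the choice to report both total and non-empty lines is an assumption:

```typescript
// Hypothetical helper (not part of metric-gardener): counts lines in raw text.
function countLines(source: string): { total: number; nonEmpty: number } {
    if (source.length === 0) {
        return { total: 0, nonEmpty: 0 };
    }
    // Accept Windows, old Mac and Unix line endings.
    const lines = source.split(/\r\n|\r|\n/);
    // A trailing line break terminates the last line, it does not start a new one.
    if (lines[lines.length - 1] === "") {
        lines.pop();
    }
    const nonEmpty = lines.filter((line) => line.trim().length > 0).length;
    return { total: lines.length, nonEmpty };
}
```

Since this needs no grammar at all, it would work the same way for JSON, YAML, and any other text file.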

ce-bo commented 8 months ago

Is there a way to count the nesting level by checking the text indentation? Maybe the tree nodes provide this information?

ce-bo commented 8 months ago

I would like to also support Markdown files please.

ce-bo commented 8 months ago

Part of #141

ResistantBear commented 8 months ago

Is there a way to count the nesting level by checking the text indentation? Maybe the tree nodes provide this information?

As I already found out while researching for the discussion here, tree-sitter produces no nodes for whitespace. Apparently, this is due to the type of grammar that tree-sitter relies on.

Furthermore, deriving the nesting level of JSON files, and of files in other languages that are not whitespace-sensitive, from indentation would be problematic. For example, metric-gardener itself does not indent its JSON output nicely, because indentation is not required and is ignored by any parser.

We could check if we can somehow determine the nesting level based upon the nesting of the nodes in the tree.
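A rough sketch of that idea, assuming the tree nodes expose a `type` and a list of `children`. The `SyntaxNodeLike` interface, the node type names `object` and `array`, and the convention that the outermost container counts as level 1 are all assumptions made for illustration, not confirmed details of the grammar or of metric-gardener:

```typescript
// Hypothetical minimal node shape; a real syntax tree node has more fields.
interface SyntaxNodeLike {
    type: string;
    children: SyntaxNodeLike[];
}

// Assumed node types that open a new nesting level.
const CONTAINER_TYPES: ReadonlyArray<string> = ["object", "array"];

// Returns the maximum number of container nodes on any path from `node` to a leaf.
function maxNestingLevel(
    node: SyntaxNodeLike,
    containerTypes: ReadonlyArray<string> = CONTAINER_TYPES
): number {
    let deepestChild = 0;
    for (const child of node.children) {
        deepestChild = Math.max(deepestChild, maxNestingLevel(child, containerTypes));
    }
    const isContainer = containerTypes.indexOf(node.type) !== -1;
    return deepestChild + (isContainer ? 1 : 0);
}

// Hand-built tree for an object that holds one array of three numbers.
const leaf = (type: string): SyntaxNodeLike => ({ type, children: [] });
const exampleTree: SyntaxNodeLike = {
    type: "document",
    children: [
        {
            type: "object",
            children: [
                {
                    type: "pair",
                    children: [
                        leaf("string"),
                        { type: "array", children: [leaf("number"), leaf("number"), leaf("number")] },
                    ],
                },
            ],
        },
    ],
};

const exampleLevel = maxNestingLevel(exampleTree); // 2 under the convention above
```

Because only the node structure is inspected, the result does not depend on how the source text is indented.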

ResistantBear commented 8 months ago

If we want to analyse something based on the amount of whitespace/indentation, we would have to implement that without tree-sitter, working on the raw source code.

As far as I understood on Friday, CodeCharta already supports calculating things on plain text. We should make sure that we do not implement things redundantly.
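For illustration only, an indentation-based measure on raw text might look roughly like the sketch below. `maxIndentationLevel` and its parameters are hypothetical, and this is not how CodeCharta's parser is implemented:

```typescript
// Hypothetical helper: deepest indentation level found in raw source text.
// `indentWidth` is the assumed number of spaces per level, `tabWidth` the
// assumed width of one tab character.
function maxIndentationLevel(source: string, indentWidth = 2, tabWidth = indentWidth): number {
    let maxWidth = 0;
    for (const line of source.split(/\r\n|\r|\n/)) {
        if (line.trim().length === 0) {
            continue; // blank lines carry no indentation information
        }
        let width = 0;
        for (let i = 0; i < line.length; i++) {
            if (line[i] === " ") {
                width += 1;
            } else if (line[i] === "\t") {
                width += tabWidth;
            } else {
                break; // first non-whitespace character ends the indentation
            }
        }
        maxWidth = Math.max(maxWidth, width);
    }
    return Math.floor(maxWidth / indentWidth);
}
```

The weakness discussed above shows up directly: a JSON document written on a single line yields level 0 with this approach, however deeply its objects and arrays are nested.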

ResistantBear commented 8 months ago

First step would be to support both grammars and calculate LOC. We should then document in the README that other metrics are not calculated yet.

We could check whether there are any useful metrics we can actually calculate for these files based on the tree-sitter tree. I can imagine using it for the nesting levels of JSON files, but other than that, hmm. If we only want to calculate lines of code, there is no need to install an extra grammar and parse a tree; that would be covered by #142.

BridgeAR commented 8 months ago

I would count the overall structure as nesting level, no matter how the JSON is formatted.

E.g.,

{ "a": [1,2,3] }

// identical to

{
  "a": [
    1,
    2,
    3
  ]
}
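To make the point concrete, here is a small sketch that measures nesting on the parsed value instead of on the text. It uses the built-in `JSON.parse` as a stand-in for a syntax tree; `jsonNestingDepth` is a hypothetical name, and counting the outermost container as level 1 is an assumed convention:

```typescript
// Hypothetical helper: number of nested objects/arrays along the deepest path.
function jsonNestingDepth(value: unknown): number {
    if (Array.isArray(value)) {
        return 1 + Math.max(0, ...value.map((item) => jsonNestingDepth(item)));
    }
    if (value !== null && typeof value === "object") {
        const obj = value as { [key: string]: unknown };
        return 1 + Math.max(0, ...Object.keys(obj).map((key) => jsonNestingDepth(obj[key])));
    }
    return 0; // strings, numbers, booleans and null do not add a level
}

const compact = '{ "a": [1,2,3] }';
const pretty = '{\n  "a": [\n    1,\n    2,\n    3\n  ]\n}';

// Both formattings parse to the same structure, so the depth is the same.
const compactDepth = jsonNestingDepth(JSON.parse(compact));
const prettyDepth = jsonNestingDepth(JSON.parse(pretty));
console.log(compactDepth === prettyDepth);
```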

ResistantBear commented 8 months ago

If we want to analyse something based on the amount of whitespace/indentation, we would have to implement that without tree-sitter, working on the raw source code.

As far as I understood on Friday, CodeCharta already supports calculating things on plain text. We should make sure that we do not implement things redundantly.

Counting indentation levels is already implemented for CodeCharta by another parser that works on raw text. So there is probably no need to implement it in metric-gardener. If we calculate nesting levels here, we should focus on calculating them based on syntax trees.

ResistantBear commented 8 months ago

Adding the max_nesting_level metric is blocked until #179 is discussed and implemented, as it would show wrong results for all non-markup-language files.

ce-bo commented 8 months ago

Isn't there a way to bring these changes to main before the final decision is taken? We could do necessary adjustments afterwards.

ResistantBear commented 8 months ago

If we merge it as-is, we would report this "max_nesting_level" for every file, but with a wrong value for all files that are not JSON. Not a fan of this. If we change that, we would basically already implement one of the choices from #179. Would there be any benefit in merging this before the refactoring of the JSON output?

ResistantBear commented 8 months ago

Removed Markdown from this ticket. As discussed, #142 is sufficient for Markdown.

ResistantBear commented 7 months ago

Realized as discussed in #179.