Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License

Git repository has grown to 1.24 GiB #343

Open Wilfred opened 2 years ago

Wilfred commented 2 years ago

This is too much. It makes CI slower and contributing slower.

The git subtrees are getting too big. We might have to rewrite history to use snapshots of vendored parsers.

filmor commented 2 years ago

Looking at the objects, the main culprits are the precompiled vendor/*/src/parser.c files. I doubt that this can be fixed without excluding these and either generating them at build time with tree-sitter generate from the grammar or just not vendoring them at all. Quite a few grammars are available on crates.io already.
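For anyone who wants to reproduce this kind of measurement, here is a rough sketch rather than the method actually used above. It assumes you have captured the output of `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)'`, and it sums blob sizes per path. The function name `total_blob_bytes_by_path` is hypothetical. Note that `%(objectsize)` is the uncompressed size, so the totals overstate what is stored on disk.

```rust
use std::collections::HashMap;

/// Sums blob sizes per path and returns (path, total_bytes), largest first.
///
/// Each input line is expected to look like `<type> <size> <path>`.
/// Every historical version of a file is a separate blob, so a large file
/// that changes often accumulates a large total.
fn total_blob_bytes_by_path(batch_check_output: &str) -> Vec<(String, u64)> {
    let mut totals: HashMap<String, u64> = HashMap::new();

    for line in batch_check_output.lines() {
        // Split into at most three fields so paths containing spaces survive.
        let mut fields = line.splitn(3, ' ');
        let (Some(kind), Some(size), Some(path)) =
            (fields.next(), fields.next(), fields.next())
        else {
            continue; // commits have no path; skip malformed lines too
        };
        if kind != "blob" || path.is_empty() {
            continue;
        }
        let Ok(size) = size.parse::<u64>() else {
            continue;
        };
        *totals.entry(path.to_string()).or_insert(0) += size;
    }

    let mut sorted: Vec<(String, u64)> = totals.into_iter().collect();
    // Largest total first; ties broken by path so the order is stable.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}
```

Printing the first few entries of the result should show whether generated parser.c files dominate, as described above.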

Xuanwo commented 2 years ago

I don't know if releasing sub-crates like difftastic-language-xxx is a good idea.

filmor commented 2 years ago

> I don't know if releasing sub-crates like difftastic-language-xxx is a good idea.

That's not what I meant. There are quite a few tree-sitter-* crates that one could depend on instead of vendoring them.

Wilfred commented 2 years ago

The majority of parsers in difftastic are either not available on crates.io, or the versions on crates.io are old.

I agree that the vendor/*/src/parser.c files are the biggest, and the SQL parser is particularly big: https://github.com/m-novikov/tree-sitter-sql/issues/59

If difftastic just had a snapshot of each parser, it wouldn't have the history of these large files, substantially reducing the size.

Alternatively, maybe it would make sense to look at creating the parser.c files during the build too. This would enable use of the new, faster ABI (https://github.com/tree-sitter/tree-sitter/pull/1852). It's also already the case that the Swift parser doesn't have parser.c checked in.
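As a very rough sketch of what generating parser.c during the build could look like (not how difftastic's build actually works): a build script could scan the vendored grammars and run `tree-sitter generate` wherever parser.c is missing. The layout assumed here (one directory per grammar with `grammar.js` at its top level), the requirement that the `tree-sitter` CLI is on PATH, and both function names are assumptions for illustration.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;
use std::{fs, io};

/// Returns the grammar directories directly under `vendor_dir` that contain
/// a `grammar.js` but no generated `src/parser.c` (hypothetical layout).
fn grammars_needing_generation(vendor_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut pending = Vec::new();
    for entry in fs::read_dir(vendor_dir)? {
        let dir = entry?.path();
        let has_grammar = dir.join("grammar.js").is_file();
        let has_parser = dir.join("src").join("parser.c").is_file();
        if has_grammar && !has_parser {
            pending.push(dir);
        }
    }
    pending.sort(); // deterministic order keeps build logs stable
    Ok(pending)
}

/// Runs `tree-sitter generate` inside `grammar_dir`.
/// Assumes the tree-sitter CLI is installed and on PATH.
fn generate_parser(grammar_dir: &Path) -> io::Result<()> {
    let status = Command::new("tree-sitter")
        .arg("generate")
        .current_dir(grammar_dir)
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("tree-sitter generate failed in {}", grammar_dir.display()),
        ))
    }
}
```

A build.rs `main` would call `grammars_needing_generation` on the vendor directory and then `generate_parser` on each result before compiling the C sources. The trade-off is that every contributor and CI job would then need the tree-sitter CLI (and whatever it requires to evaluate grammar.js) installed.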

Xuanwo commented 2 years ago

> Alternatively, maybe it would make sense to look at creating the parser.c files during the build too.

I prefer this approach. I'm interested in implementing it. Any notes for me?

nogweii commented 2 years ago

I think dynamically loading the parsers is the way forward: https://github.com/Wilfred/difftastic/pull/356 & #123

cglong commented 1 year ago

Could Git submodules be used here? That way, you could link to a specific version of each dependency without embedding it directly into the repo.