samhh opened this issue 3 months ago
This is a known limitation. I am unsure if this is properly documented.
From what I see in the code, the contents of `.gitignore` files are retrieved via the function `retrieve_git_ignore_matches`.
I checked the places where this function is called, and (for the relevant cases where this issue applies) we always have an available `paths` instance that could be used to our advantage.
I think that we could apply the following strategy:

1. `paths` is already populated by the time we call `retrieve_git_ignore_matches`.
2. Add a new function, similar to `retrieve_git_ignore_matches`, that basically does the same but also accepts a `paths` parameter (let's call it `retrieve_merged_git_ignore_matches` for now). We do that because we want to keep the old function for the places where we call it without having access to a `paths` variable with file paths in it.
3. Inside `retrieve_merged_git_ignore_matches`, transform the `paths` parameter into a shorter array containing only deduplicated directories (this is "just" a map + flatmap + reduce). For this description, let's call this variable `nested_dirs`.
4. Iterate over `nested_dirs` and check whether we can find `.gitignore` files in them (without using the `auto_search` feature). If we find them, we load them and obtain their lines, but we pass those lines through a map function to adapt them (for example, prepending the relevant directory paths to the stated rules).

EDIT:

Example: if we had a `.gitignore` file at the root with the contents:

```
node_modules/
```

and another one at `./nested/path/.gitignore` with the contents:

```
./cache/
*.log
```

the "merged" result would be as if the root `.gitignore` file had:

```
node_modules/
./nested/path/cache/
./nested/path/**/*.log
```

Note that for the `*.log` rule we add a `**` infix.
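The rule rewriting from the example above can be sketched in Rust as follows. `rebase_gitignore_rules` is a hypothetical helper, not an existing Biome function. It assumes that a rule starting with `./` or `/`, or containing a `/` before its end, is anchored to the nested directory, and that any other rule should match at any depth below that directory (hence the `**` infix). Escaped characters and other finer points of the gitignore pattern syntax are deliberately not handled.

```rust
/// Hypothetical helper (not part of Biome): rewrites the rules of a nested
/// `.gitignore` so they can be appended to the root-level rule list.
/// `dir` is the nested directory relative to the root, e.g. "nested/path".
fn rebase_gitignore_rules(dir: &str, contents: &str) -> Vec<String> {
    let dir = dir.trim_start_matches("./").trim_end_matches('/');
    let mut out = Vec::new();
    for line in contents.lines() {
        let line = line.trim_end();
        // Blank lines and comments carry no rules.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Keep negation in front of the rebased pattern.
        let (neg, pattern) = match line.strip_prefix('!') {
            Some(rest) => ("!", rest),
            None => ("", line),
        };
        // Assumption: "./cache/" and "/cache/" are both anchored to `dir`.
        let stripped = pattern
            .strip_prefix("./")
            .or_else(|| pattern.strip_prefix('/'));
        // A slash anywhere except at the end also anchors the rule.
        let anchored = stripped.is_some() || pattern.trim_end_matches('/').contains('/');
        let pattern = stripped.unwrap_or(pattern);
        let rebased = if anchored {
            format!("{neg}./{dir}/{pattern}")
        } else {
            // Unanchored rules match at any depth below `dir`.
            format!("{neg}./{dir}/**/{pattern}")
        };
        out.push(rebased);
    }
    out
}

fn main() {
    // Root rules are kept as they are; nested rules are rebased and appended.
    let mut merged = vec!["node_modules/".to_string()];
    merged.extend(rebase_gitignore_rules("nested/path", "./cache/\n*.log\n"));
    for rule in &merged {
        println!("{rule}");
    }
}
```

Running it prints the three merged rules from the example. Appending nested rules after the root ones keeps the "last matching rule wins" order that gitignore uses, since deeper files are listed later.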
EDIT 2:
~An extra doubt that came to me while looking at the code is whether we adapt these ignore rules in any way once they are loaded.~
~I see that we use the `auto_search` function to look for `.gitignore` files up to the root of the filesystem if they are not immediately available, but sometimes these rules are relative to the directory where they are placed, and I didn't see any transformation logic for them in the `retrieve_git_ignore_matches` function (I doubt it is applied later, because we lose the contextual information telling us at which directory level those rules were placed).~
Never mind, I see that the function returns a tuple, not just a rules vector.
Well, it seems I was too optimistic: `paths` is not always "expanded" at that point, so those values wouldn't be enough.
It is also necessary to replicate part of the logic inside the `biome_fs` crate, in the `os` module (functions such as `handle_dir` and `handle_any_file`), but with 2 differences:
… `spawn`), because we need to collect the results of that traversal as soon as possible (unless there is an elegant way to do that asynchronously; I'm not an expert in Rust. Using a mutex over a shared data structure?).

@Conaclos Do you think the approach I outlined makes sense? Or would it hurt performance too much because of the synchronous file traversal? (I'm not used to dealing with highly optimised code, so I'm not entirely sure.)
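On the question of collecting results from spawned work: a mutex over a shared collection does work in Rust. Below is a minimal sketch using only the standard library (`std::thread::scope`, Rust 1.63+); it is not Biome's traversal code, and `find_ignore_files` is a hypothetical name. Because a scope joins all of its threads before returning, the vector is guaranteed to be complete once the scope ends.

```rust
use std::path::PathBuf;
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: checks each directory for a `.gitignore` in parallel
/// and collects the hits in a shared, mutex-protected vector.
fn find_ignore_files(dirs: &[PathBuf]) -> Vec<PathBuf> {
    let found = Mutex::new(Vec::new());
    thread::scope(|s| {
        for dir in dirs {
            let found = &found;
            s.spawn(move || {
                let candidate = dir.join(".gitignore");
                if candidate.is_file() {
                    // Only one thread at a time can hold the lock and push.
                    found.lock().unwrap().push(candidate);
                }
            });
        }
    }); // All spawned threads have been joined at this point.
    let mut found = found.into_inner().unwrap();
    // Threads finish in any order; sort for a deterministic result.
    found.sort();
    found
}

fn main() {
    let root = std::env::temp_dir().join(format!("ignore-demo-{}", std::process::id()));
    let (a, b) = (root.join("a"), root.join("b"));
    std::fs::create_dir_all(&a).unwrap();
    std::fs::create_dir_all(&b).unwrap();
    std::fs::write(a.join(".gitignore"), "*.log\n").unwrap();

    // Only `a` contains a `.gitignore`.
    println!("{:?}", find_ignore_files(&[a, b]));
    std::fs::remove_dir_all(&root).unwrap();
}
```

A `std::sync::mpsc` channel is an equally idiomatic alternative: each worker sends its findings and the caller drains the receiver after the scope, which avoids lock contention when many threads report at once.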
I think @ematipico knows more about this.
Because of the current approach, it isn't possible to perfectly coordinate and collect the information while we traverse the file system.
We have been talking about refactoring the traversal to make it a bit different and more "synchronous", as you suggested. The first step might be to not spawn a process when traversing a folder.
Instead, we should first pause and check for relevant files such as `.gitignore`. After we have collected the info we need from these key files, I think it's safe to start spawning processes for the rest of the files (and to ignore the files we already read, e.g. the `.gitignore` files).
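The "pause, read key files, then spawn" idea can be sketched for a single directory as below. This is a hypothetical illustration with standard-library threads, not Biome's traversal; `visit_dir` is an invented name, and the rule matching is deliberately naive (exact file names only, no globs or negation) so that the ordering is the focus: the ignore rules are known before any worker starts.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Mutex;
use std::thread;

/// Hypothetical sketch: read `dir/.gitignore` first, then process the
/// remaining, non-ignored files in parallel.
fn visit_dir(dir: &Path) -> io::Result<Vec<PathBuf>> {
    // Phase 1: read the key file synchronously, before anything is spawned.
    let ignore_path = dir.join(".gitignore");
    let ignore_text = if ignore_path.is_file() {
        fs::read_to_string(&ignore_path)?
    } else {
        String::new()
    };
    // Naive matching for illustration: a rule ignores an exactly equal file name.
    let rules: Vec<&str> = ignore_text.lines().map(str::trim).collect();

    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        // Skip directories, the `.gitignore` already read, and ignored names.
        if path.is_file() && path != ignore_path && !rules.contains(&name) {
            files.push(path);
        }
    }

    // Phase 2: only now hand the remaining files to worker threads.
    let processed = Mutex::new(Vec::new());
    thread::scope(|s| {
        for file in &files {
            let processed = &processed;
            // Placeholder for the real per-file work (lint, format, ...).
            s.spawn(move || processed.lock().unwrap().push(file.clone()));
        }
    });
    let mut processed = processed.into_inner().unwrap();
    processed.sort();
    Ok(processed)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("two-phase-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    fs::write(dir.join(".gitignore"), "debug.log\n")?;
    fs::write(dir.join("debug.log"), "")?;
    fs::write(dir.join("main.rs"), "")?;
    println!("{:?}", visit_dir(&dir)?); // only `main.rs` remains
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```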
> From what I see in the code, the contents from `.gitignore` files are retrieved via the function `retrieve_git_ignore_matches`.
> I checked the places where this function is called, and (for the relevant cases where this issue applies), we always have an available `paths` instance that could be used to our advantage.
Are you talking about `retrieve_gitignore_matches`? What `paths` are you referring to? Maybe a link to the code would help in understanding the context.
Hi @ematipico, I'm sorry, I missed your response.
I have a working branch with a lot of ugly code that illustrates what I was trying to do. I can create a draft PR, so it will be easier to discuss it.
My approach has been to look only for the "relevant" nested `.gitignore` files, that is, not to explore the whole directory tree but only the specific directories that could affect the traversal, and to create a sort of global merged `.gitignore` file, to avoid having to switch configurations during the traversal.
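Finding the "relevant" directories for a set of explicitly given paths amounts to collecting every ancestor directory between the root and each path, deduplicated. The sketch below assumes the paths are relative to `root`; `relevant_dirs` is a hypothetical helper, not Biome code. It only covers ancestors: `.gitignore` files in directories *below* a path that is itself a directory still have to be discovered during the traversal, which is exactly the case where `paths` is not expanded yet.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the directories whose `.gitignore` can affect the
/// given paths, i.e. every ancestor from `root` down to each path's parent.
/// Assumes every path in `paths` is relative to `root`.
fn relevant_dirs(root: &Path, paths: &[PathBuf]) -> BTreeSet<PathBuf> {
    let mut dirs = BTreeSet::new();
    for path in paths {
        // `ancestors()` yields the path itself first, so skip it; the last
        // ancestor of a relative path is the empty path, i.e. the root.
        for ancestor in path.ancestors().skip(1) {
            if ancestor.as_os_str().is_empty() {
                dirs.insert(root.to_path_buf());
            } else {
                dirs.insert(root.join(ancestor));
            }
        }
    }
    dirs
}

fn main() {
    let paths = vec![
        PathBuf::from("packages/foo/src/index.ts"),
        PathBuf::from("packages/bar/index.ts"),
    ];
    // The BTreeSet both deduplicates and sorts the directories.
    for dir in relevant_dirs(Path::new("."), &paths) {
        println!("{}", dir.display());
    }
}
```

Each directory in the set can then be checked for a `.gitignore` once, and its rules rebased into the single merged rule list.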
What happened?
`vcs.useIgnoreFile` respects `/.gitignore` but not nested ignore files such as `/packages/foo/.gitignore`.

Expected result
`vcs.useIgnoreFile` should respect nested ignore files like Git does. They're helpful in monorepos.