Closed jeremywiebe closed 9 years ago
Yeah, this is a pretty tough problem to solve since once you start splitting on whitespace, you'll end up matching on everything, especially when looking at code diffs where files and code are mixed together.
however I do think we could do something here behind an experimental flag that would basically be a super-matchy regex that actually checks on the filesystem level if the match is a file (and then only includes that).
so let me retitle this issue, since i think we'll move in that direction regardless
It has issues as do we all and I need some help any sites known that I can go to learn
I moved my comment from #94 to this one to keep it all in one place.
I ran git status
in my ~/git/dotfiles
repository and i'm using thoughtbot's rcm tool for management which is why the files are not prefixed with a .
:
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: irbrc
modified: pryrc
modified: rspec
no changes added to commit (use "git add" and/or "git commit -a")
Only and/or
in the last line is recognized - incorrectly.
Single files not prepended by paths that don't have extensions (like irbrc
) are super similar to just words. Sure we could do filesystem validation to check these cases, but it's important to note that if we want to hit matches like that, we have to either:
The latter is the better bet here. For the git status input you just mentioned, doing a basic back-of-the-envelope calc yields that we would have to do 50 filesystem checks for word-like files:
>>> input = '''
... On branch master
... Your branch is up-to-date with 'origin/master'.
... Changes not staged for commit:
... (use "git add <file>..." to update what will be committed)
... (use "git checkout -- <file>..." to discard changes in working directory)
...
... modified: irbrc
... modified: pryrc
... modified: rspec
...
... no changes added to commit (use "git add" and/or "git commit -a")
... '''
>>> len([word for word in input.split(' ') if len(word) > 0])
50
And that's for a 9 line input. now say if you pipe an entire git diff command into fp -- say its ~1k lines, that means we might do ~5k filesystem checks.
We'll move in this direction eventually but I do want to point out how overmatching can be dangerous
On the flip side, @pcottle , we can do it the reverse way: consider we only need to do "crazy stuff" if the word is basically a file in the current dir, because otherwise it must have some sort of path (whether relative or absolute).
So, instead of matching words and then doing a filesystem check, let's just grab a list of all files in the current dir FIRST, and then match against the words.
This is basically O(number of files in your local dir only), versus O(size of input).
Yeah true, although we should have some kind of sanity check to make sure O(files in your local dir) isn't too hard.
Also git outputs a lot of commands relative to the top level working tree, so we would have to check that too.
But I do like the idea of flipping it around!
ls
could be really slow on a mapped share or if there are tens of thousands of files in the local dir only. IMHO another interesting approach, which would be hard in the current arch: run it as a subprocess during picking, aka let's implement incremental searching. hard work. maybe another major release. prob put the ls
under the same experimental flag as for the other file stuff ?
I have no idea about performance patterns of git ls-files
but maybe detection of .git
and some clever combination of git ls-files
's options could provide a middle way between "special case for git output" and "ls
the whole current directory"?
Ok guys almost everything is addressed here! Check out #114 for the implementation. I'm going to make a separate issue for the spaces and extension-less files, since that will require another regex
See also attached screenshot.