gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.
http://gitpython.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Make use of xargs for Repo().ignored function #1790

Open ericwb opened 9 months ago

ericwb commented 9 months ago

In my use case, I wish to use the Repo class function ignored() to filter a potentially large list of files. The issue is that sometimes this list of files is far too large. For example:

The command line max argument size is defined as noted here:

$ getconf ARG_MAX
1048576

However, my code is going beyond that maximum because the git repos I have chosen have lots of ignorable files.

I get this error:

Traceback (most recent call last):
  File "/Users/ericwb/workspace/bandit/.tox/py312/bin/bandit", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/ericwb/workspace/bandit/bandit/cli/main.py", line 657, in main
    b_mgr.discover_files(args.targets, args.recursive, args.excluded_paths)
  File "/Users/ericwb/workspace/bandit/bandit/core/manager.py", line 252, in discover_files
    ignore_list = repo.ignored(*files)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/repo/base.py", line 878, in ignored
    proc: str = self.git.check_ignore(*paths)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/cmd.py", line 736, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/cmd.py", line 1316, in _call_process
    return self.execute(call, **exec_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/cmd.py", line 988, in execute
    proc = Popen(
           ^^^^^^
  File "/Users/ericwb/.pyenv/versions/3.12.1/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/ericwb/.pyenv/versions/3.12.1/lib/python3.12/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'git'

If I were using the command line, I'd take advantage of xargs to split the argument list into batches below the max and repeatedly call git check-ignore. For reference: https://stackoverflow.com/questions/2381241/what-is-the-subprocess-popen-max-length-of-the-args-parameter

While I can do the splitting of the list in my Python code, I believe it makes more sense for the library designed to call command lines to do it itself. Maybe it doesn't make sense for all commands, but I do believe xargs-style batching would work well for git check-ignore.
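
For illustration, the caller-side splitting could look roughly like the sketch below. It is not part of GitPython: `chunk_paths` and its `max_bytes` budget are hypothetical names, and the default budget is an assumption chosen to sit well below a typical ARG_MAX, since the environment and the rest of the command line also count against that limit.

```python
import os
from typing import Iterable, Iterator, List


def chunk_paths(paths: Iterable[str], max_bytes: int = 128 * 1024) -> Iterator[List[str]]:
    """Yield lists of paths whose combined encoded size stays within max_bytes.

    max_bytes is a hypothetical, deliberately conservative budget; it is not
    derived from the real ARG_MAX of the running system.
    """
    chunk: List[str] = []
    size = 0
    for path in paths:
        # Each argument costs its encoded length plus a terminating NUL byte.
        cost = len(os.fsencode(path)) + 1
        # Start a new chunk when adding this path would exceed the budget.
        # A single oversized path still gets a chunk of its own.
        if chunk and size + cost > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(path)
        size += cost
    if chunk:
        yield chunk
```

With this helper, the failing call from the traceback could be replaced by something like `ignore_list = [p for chunk in chunk_paths(files) for p in repo.ignored(*chunk)]`, which runs git check-ignore once per batch, much as xargs would.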

Byron commented 9 months ago

Thanks for reporting!

I think this could be solved by providing the paths to stdin. It's probably not as trivial as it sounds as the input would have to be provided while the output is consumed to avoid deadlock due to filled pipes.

Reference

```
❯ git check-ignore -h
usage: git check-ignore [<options>] <pathname>...
   or: git check-ignore [<options>] --stdin

    -q, --quiet           suppress progress reporting
    -v, --verbose         be verbose

    --stdin               read file names from stdin
    -z                    terminate input and output records by a NUL character
    -n, --non-matching    show non-matching input paths
    --no-index            ignore index when checking
```
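
As a rough sketch of the stdin idea outside of GitPython, the standard library can already handle the deadlock concern: `subprocess.run()` with `input=` goes through `communicate()`, which writes stdin while draining stdout and stderr. The function name `ignored_via_stdin` and its `command` parameter are hypothetical and exist only for this illustration; this is not how GitPython implements `Repo.ignored()`, and it assumes all paths and results fit in memory.

```python
import os
import subprocess
from typing import Iterable, List, Sequence


def ignored_via_stdin(
    paths: Iterable[str],
    cwd: str = ".",
    command: Sequence[str] = ("git", "check-ignore", "--stdin", "-z"),
) -> List[str]:
    """Return the paths reported as ignored, passing them over stdin."""
    # NUL-terminate every path so that unusual file names survive (-z).
    payload = b"".join(os.fsencode(p) + b"\0" for p in paths)
    # run() feeds stdin and reads stdout/stderr concurrently via
    # communicate(), so neither pipe can fill up and block the child.
    result = subprocess.run(command, input=payload, capture_output=True, cwd=cwd)
    # git check-ignore exits 0 when at least one path is ignored and 1 when
    # none are; any other status is treated as a failure here.
    if result.returncode not in (0, 1):
        raise RuntimeError(os.fsdecode(result.stderr).strip())
    # With -z the output is a sequence of NUL-terminated path names.
    return [os.fsdecode(p) for p in result.stdout.split(b"\0") if p]
```

Because the paths never appear on the command line, the ARG_MAX limit no longer applies, whatever the size of the list.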