JGuetschow / UNFCCC_non-AnnexI_data

Apache License 2.0

error when running read_new_unfccc_crf_for_year #75

Open JGuetschow opened 1 year ago

JGuetschow commented 1 year ago

read_new_unfccc_crf_for_year does not commit its results because of the error

OSError: [Errno 7] Argument list too long: 'git'

This is probably caused by the very long list of files that are affected by the commit and passed to datalad. To get rid of this error we could commit country by country, or dig deeper and find another solution. As manually committing after the error in the script works fine, it's not urgent to solve this issue.
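A minimal sketch of the "commit country by country" idea, assuming the changed files can be grouped by a country code that appears as a path component. The helper `group_by_country`, the `<country>/<file>` layout, and the example file names are hypothetical and not taken from the repository.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_country(paths, country_level=0):
    """Group relative paths by the path component at index country_level."""
    groups = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        groups[parts[country_level]].append(path)
    return dict(groups)


# Hypothetical layout "<country>/<file>"; the real repository layout may differ.
changed = ["DEU/crf_2022.csv", "DEU/crf_2022.nc", "FRA/crf_2022.csv"]

for country, files in sorted(group_by_country(changed).items()):
    # Each batch could then be saved on its own, for example with
    # datalad.api.save(path=files, message=f"CRF data for {country}"),
    # so that no single git call receives the full file list.
    print(country, len(files))
```

Whether smaller batches actually avoid the error depends on how datalad builds the git command line, so this is only one possible direction.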

mikapfl commented 1 year ago

Possible solutions off the top of my head:

JGuetschow commented 1 year ago

I do run datalad.api.run from Python

mikapfl commented 1 year ago

ah, okay, the error is from within datalad when it tries to run git?

mikapfl commented 1 year ago

If you add a full traceback I can forward it upstream

JGuetschow commented 1 year ago

Viewing the full output, I now think it's a known issue and that a workaround exists. (There was a lot of datalad output chucked in between the error output, so I didn't see the first part of the error.)

Here is the (shortened) output:

[INFO] == Command exit (modification check follows) ===== 
[ERROR] Caught exception suggesting too large stack size limits. Hint: use 'ulimit -s' command to see current limit and e.g. 'ulimit -s 8192' to reduce it to avoid this exception. See https://github.com/datalad/datalad/issues/6106 for more information. 
[WARNING] Received an exception OSError([Errno 7] Argument list too long: 'git'). Canceling not-yet running jobs and waiting for completion of running. You can force earlier forceful exit by Ctrl-C. 
[INFO] Canceled 0 out of 0 jobs. 0 left running. 

<some info on 'unlock', 'run', and 'add' command follows (all ok)>

Traceback (most recent call last):
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 368, in _iter_threads
    raise _FinalShutdown()
datalad.support.parallel._FinalShutdown

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "UNFCCC_GHG_data/UNFCCC_CRF_reader/read_new_UNFCCC_CRF_for_year_datalad.py", line 26, in <module>
    read_new_crf_for_year_datalad(
  File "<repo_path>/UNFCCC_GHG_data/UNFCCC_CRF_reader/UNFCCC_CRF_reader_prod.py", line 407, in read_new_crf_for_year_datalad
    datalad.api.run(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/interface/base.py", line 773, in eval_func
    return return_func(*args, **kwargs)
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/interface/base.py", line 763, in return_func
    results = list(results)
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/interface/base.py", line 873, in _execute_command_
    for r in _process_results(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/interface/utils.py", line 319, in _process_results
    for res in results:
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/core/local/run.py", line 297, in __call__
    for r in run_command(cmd, dataset=dataset,
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/core/local/run.py", line 1091, in run_command
    for r in Save.__call__(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/interface/base.py", line 873, in _execute_command_
    for r in _process_results(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/interface/utils.py", line 319, in _process_results
    for res in results:
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/core/local/save.py", line 391, in __call__
    yield from ProducerConsumerProgressLog(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 535, in __iter__
    for res in super().__iter__():
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 265, in __iter__
    yield from self._iter_threads(self._jobs)
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 417, in _iter_threads
    self.shutdown(force=True, exception=self._producer_exception or interrupted_by_exception)
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 233, in shutdown
    raise exception
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 401, in _iter_threads
    done_useful |= self._pop_done_futures(lgr)
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 463, in _pop_done_futures
    raise exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/parallel.py", line 329, in consumer_worker
    for r in res:
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/core/local/save.py", line 310, in save_ds
    for res in pds_repo.save_(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/gitrepo.py", line 3579, in save_
    self._save_post(message, chain(*status_state.values()), need_partial_commit, amend=amend,
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/annexrepo.py", line 3556, in _save_post
    super(AnnexRepo, self)._save_post(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/gitrepo.py", line 3331, in _save_post
    GitRepo.commit(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/support/gitrepo.py", line 1449, in commit
    _ = self._call_git(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/dataset/gitrepo.py", line 398, in _call_git
    for file_no, line in self._generator_call_git(args,
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/dataset/gitrepo.py", line 355, in _generator_call_git
    for file_no, content in generator:
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/runner/gitrunner.py", line 299, in run_on_filelist_chunks_items_
    for chunk_generator in self._get_chunked_results(cmd=cmd,
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/runner/gitrunner.py", line 184, in _get_chunked_results
    yield self.run(
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/runner/runner.py", line 206, in run
    results_or_iterator = threaded_runner.run()
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/runner/nonasyncrunner.py", line 343, in run
    return self._locked_run()
  File "<repo_path>/venv/lib/python3.8/site-packages/datalad/runner/nonasyncrunner.py", line 403, in _locked_run
    self.process = Popen(self.cmd, **kwargs)         # nosec
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'git'
TaskFailed - taskid:read_new_unfccc_crf_for_year
Command failed: './venv/bin/python UNFCCC_GHG_data/UNFCCC_CRF_reader/read_new_UNFCCC_CRF_for_year_datalad.py --submission_year=2022  --re_read' returned 1
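The hint in the log above points at the stack size limit. A small diagnostic sketch, assuming a Linux or other POSIX system, that prints the limits the calling Python process currently sees (the function name `argument_limits` is made up for illustration):

```python
import os
import resource


def argument_limits():
    """Return the stack size limits and the reported argument-list limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        "stack_soft": soft,  # equals resource.RLIM_INFINITY if unlimited
        "stack_hard": hard,
        "arg_max": os.sysconf("SC_ARG_MAX"),  # bytes for argv plus environment
    }


limits = argument_limits()
if limits["stack_soft"] == resource.RLIM_INFINITY:
    print("stack size is unlimited (compare the hint in the datalad log)")
else:
    print("stack soft limit in bytes:", limits["stack_soft"])
print("reported SC_ARG_MAX in bytes:", limits["arg_max"])
```

`ulimit -s` reports the same soft limit in KiB. Running `ulimit -s 8192` in the shell before starting the script is the workaround the log suggests; whether it resolves this particular case has not been checked here.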