LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

Parsing exceptions occur with invocation of multiprocessing #130

Open dDArchivist opened 5 years ago

dDArchivist commented 5 years ago

When I attempt to run bagit.py with the --processes option, I receive an enormous output of parsing exceptions. This does not occur when I do NOT invoke --processes. Is there something wrong with my Python installation, or is there something wrong with this component of the program?

I am running the Win 64 bit version of python 3.7.2 w/ a machine that uses 4 cores, 2 threads per core. I've attached a sample of the parsing exceptions output.

Thanks!

python_bagit_multiprocessing_error.txt

edsu commented 5 years ago

Interesting, it looks like you are on Windows? If you clone the repo and run the tests do they pass?

git clone https://github.com/libraryofcongress/bagit-python.git 
cd bagit-python
python setup.py test

I wonder if this difference in behavior of multiprocessing on Windows might help explain what's going on?
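For context, here is a quick way to check which start method an interpreter uses. With fork, a worker inherits the parent's memory, so module-level code is not run again. With spawn, a fresh interpreter starts and re-imports the main module. Spawn is the only method available on Windows. The helper name start_method_report is made up for this sketch and is not part of bagit.

```python
import multiprocessing


def start_method_report():
    """Summarise how this interpreter starts worker processes."""
    return {
        # The method used when none is requested explicitly.
        "default": multiprocessing.get_start_method(),
        # Every method this platform supports; Windows offers only "spawn".
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(start_method_report())
```

If the default is spawn, any module-level code in the script being run is executed again in every worker, which would be worth keeping in mind when reading the output.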

dDArchivist commented 5 years ago

Hi. Thanks for taking a look. I followed your directions by using the bash emulator in Git for Windows. I'm attaching a sample.

bagit_py_test.txt

edsu commented 5 years ago

It looks like the test suite is noticing problems too. We should really be running the tests on Windows regularly as part of our builds.

acdha commented 5 years ago

Yeah, back in the day there wasn't a great Windows CI option but there are several now. In addition to Travis, I know bagit-java uses https://www.appveyor.com/ and Azure Pipelines announced a free tier for open-source: https://azure.microsoft.com/en-us/blog/announcing-azure-pipelines-with-unlimited-ci-cd-minutes-for-open-source/

ross-spencer commented 2 years ago

Same issue here. I would also suggest that the failures from python setup.py test, the failures from tox -e py38, and the primary issue in this ticket look like three different issues, though I can reproduce those on Windows 11. I need more time to look into it, but I'll attach some of the multiprocessing error log below.

NB. Truncated, as it just loops infinitely, or at least for longer than I'm keen to find out.

2022-05-12 13:36:29,225 - INFO - Creating bag for directory C:\temp\testing\bags\govdoc-gifs
2022-05-12 13:36:29,229 - INFO - Creating data directory
2022-05-12 13:36:29,235 - INFO - Moving bag-info.txt to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\bag-info.txt
2022-05-12 13:36:29,235 - INFO - Moving bagit.txt to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\bagit.txt
2022-05-12 13:36:29,236 - INFO - Moving data to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\data
2022-05-12 13:36:29,237 - INFO - Moving manifest-sha256.txt to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\manifest-sha256.txt
2022-05-12 13:36:29,238 - INFO - Moving manifest-sha512.txt to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\manifest-sha512.txt
2022-05-12 13:36:29,239 - INFO - Moving tagmanifest-sha256.txt to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\tagmanifest-sha256.txt
2022-05-12 13:36:29,239 - INFO - Moving tagmanifest-sha512.txt to C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he\tagmanifest-sha512.txt
2022-05-12 13:36:29,240 - INFO - Moving C:\temp\testing\bags\govdoc-gifs\tmp7t7u69he to data
2022-05-12 13:36:29,241 - INFO - Using 2 processes to generate manifests: sha256, sha512
Traceback (most recent call last):
Traceback (most recent call last):
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 98, in __init__
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1654, in parseString
    req = REQUIREMENT.parseString(requirement_string)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1654, in parseString
    raise exc
    raise exc
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1644, in parseString
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1644, in parseString
    loc, tokens = self._parse( instring, 0 )
    loc, tokens = self._parse( instring, 0 )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3417, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3417, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3739, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3739, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3400, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3400, in parseImpl
    loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1406, in _parseNoCache
    loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1406, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 2711, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 2711, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 116, in spawn_main
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 125, in _main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 234, in prepare
    prepare(preparation_data)
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 234, in prepare
    _fixup_main_from_name(data['init_main_from_name'])
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 258, in _fixup_main_from_name
    _fixup_main_from_name(data['init_main_from_name'])
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 258, in _fixup_main_from_name
    main_content = runpy.run_module(mod_name,
  File "C:\Users\Spencer\Apps\python\lib\runpy.py", line 210, in run_module
    main_content = runpy.run_module(mod_name,
  File "C:\Users\Spencer\Apps\python\lib\runpy.py", line 210, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "C:\Users\Spencer\Apps\python\lib\runpy.py", line 97, in _run_module_code
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "C:\Users\Spencer\Apps\python\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Spencer\Apps\python\lib\runpy.py", line 87, in _run_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Spencer\Apps\python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\bagit.py", line 52, in <module>
    exec(code, run_globals)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\bagit.py", line 52, in <module>
    VERSION = get_distribution(MODULE_NAME).version
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 464, in get_distribution
    VERSION = get_distribution(MODULE_NAME).version
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 464, in get_distribution
    dist = Requirement.parse(dist)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 3139, in parse
    dist = Requirement.parse(dist)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 3139, in parse
    req, = parse_requirements(s)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 3084, in parse_requirements
    req, = parse_requirements(s)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 3084, in parse_requirements
    yield Requirement(line)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 3094, in __init__
    yield Requirement(line)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\__init__.py", line 3094, in __init__
    super(Requirement, self).__init__(requirement_string)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 100, in __init__
    super(Requirement, self).__init__(requirement_string)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 100, in __init__
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'__mp_mai'": Expected W:(abcd...)
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'__mp_mai'": Expected W:(abcd...)
Traceback (most recent call last):
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 98, in __init__
    req = REQUIREMENT.parseString(requirement_string)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1654, in parseString
    raise exc
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1644, in parseString
Traceback (most recent call last):
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\packaging\requirements.py", line 98, in __init__
    loc, tokens = self._parse( instring, 0 )
    req = REQUIREMENT.parseString(requirement_string)
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1654, in parseString
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3417, in parseImpl
    raise exc
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1644, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3417, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3739, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3739, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3400, in parseImpl
    return self.expr._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1402, in _parseNoCache
    loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1406, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 3400, in parseImpl
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 2711, in parseImpl
    loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 1406, in _parseNoCache
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\Spencer\Apps\python\lib\site-packages\pkg_resources\_vendor\pyparsing.py", line 2711, in parseImpl
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Spencer\Apps\python\lib\multiprocessing\spawn.py", line 125, in _main
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

Details -

(venv) C:\temp\testing\bags>python
Python 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

(venv) C:\temp\testing\bags>python -m pip freeze
bagit==1.8.1

Invocation (basically anything above 1 for the processes):

python -m bagit --processes 2 <folder_name>

tw4l commented 1 year ago

I'm seeing the same behavior described by @ross-spencer with --processes, using bagit-python version 1.8.1 and Python 3.10.9 on a MacBook with an M2 chip.

versmar commented 7 months ago

I ran into this issue as well, running bagit-python version 1.8.1, Python 3.11.8 on Windows 10 and Windows Server 2019.

I think it does relate to differences in multiprocessing suggested by @edsu.

My limited understanding of the problem: in the spawned pool processes, line 47, MODULE_NAME = "bagit" if __name__ == "__main__" else __name__, sets MODULE_NAME to __mp_main__. When this is passed to get_distribution(MODULE_NAME) on line 52, it raises an exception, I guess because of the underscores in __mp_main__.

Adding except InvalidRequirement: to that try/except block gets rid of those errors, but the logging generated by the spawned processes still doesn't work as expected. Each process gets its own logger on line 49, and nothing calls logging.basicConfig() in that process to set it up. I think there might be a solution in logging.handlers.QueueHandler and logging.handlers.QueueListener, but I haven't looked into it further.
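A minimal sketch of both ideas, offered as an illustration and not as bagit's actual code. The helpers resolve_module_name, start_log_listener and init_worker_logging are hypothetical names. The first maps __mp_main__ back to the distribution name before any version lookup, which is a different approach from catching InvalidRequirement. The other two route worker log records through a queue to handlers that live in the parent.

```python
import logging
import logging.handlers
import queue

# Values a module can see in __name__ when it is not imported normally:
# "__main__" when run as a script, "__mp_main__" when a spawned worker
# re-imports the parent's main module.
_SCRIPT_NAMES = {"__main__", "__mp_main__"}


def resolve_module_name(name, default="bagit"):
    """Map script-style names back to the distribution name (hypothetical helper)."""
    return default if name in _SCRIPT_NAMES else name


def start_log_listener(log_queue, *handlers):
    """Parent side: forward records arriving on log_queue to the real handlers."""
    listener = logging.handlers.QueueListener(
        log_queue, *handlers, respect_handler_level=True
    )
    listener.start()
    return listener


def init_worker_logging(log_queue, level=logging.INFO):
    """Worker side: send every record to log_queue instead of handling it locally."""
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(log_queue)]
    root.setLevel(level)


if __name__ == "__main__":
    # Single-process demonstration with queue.Queue; across processes the queue
    # would be a multiprocessing.Queue handed to each worker at start-up.
    log_queue = queue.Queue()
    listener = start_log_listener(log_queue, logging.StreamHandler())
    init_worker_logging(log_queue)
    logging.getLogger(resolve_module_name(__name__)).info("hello from a worker")
    listener.stop()  # handles records already queued, then joins the thread
```

With a real pool, one option would be to create the queue with multiprocessing.Queue() and pass initializer=init_worker_logging and initargs=(log_queue,) to multiprocessing.Pool, so each worker configures itself once at start-up. Whether that fits bagit's existing logging setup is untested here.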