aurzenligl / prophy

prophy: fast serialization protocol
MIT License
13 stars 8 forks source link

Prophy is creating temporary file in fixed location #3

Closed nokiaPWr closed 8 years ago

nokiaPWr commented 8 years ago

Prophy is creating temporary file in fixed location (/tmp/.prophy/parsetab_calc.py), which makes it useless for hosts that are used by different users.

kamichal commented 8 years ago

It's a temporary file which is used only for time when parsing operation is performed. It's not intended for further use. The /tmp/.prophy/ path is picked by tempfile.py (python build-in) module as a temporary directory location. If /tmp/ location is not accesible, it will pick different one depending on user's operating system and configuration.

kuchara commented 8 years ago

I suppose tempfile.mkdtemp (https://docs.python.org/2/library/tempfile.html#tempfile.mkdtemp) should be used to safely create a temporary directory.

I've made a pull request: #4.

aurzenligl commented 8 years ago

Actually, parser table can be generated in-memory or to file ("caching"). Only reason to write to file is to generate once and reuse afterwards: http://www.dabeaz.com/ply/ply.html#ply_nn36

Using mkdtemp makes it impossible to reuse parser tables, so writing to file doesn't make sense either in such case.

Problem can be solved by:

Before solving problem - however - I'd like to understand it fully. Can you actually explain the problem in detail? How can I reproduce it?

kuchara commented 8 years ago

Ok, if that is the case (reusing parser tables), my change does not make any sense.

Problem is that when running multiple parallel processes of prophyc, a race condition exists, and such error happens:

Traceback (most recent call last):
  File "/usr/bin/prophyc", line 9, in <module>
    load_entry_point('prophy==0.7.4', 'console_scripts', 'prophyc')()
  File "/usr/lib/python2.7/site-packages/setuptools-18.2-py2.7.egg/pkg_resources/__init__.py", line 558, in load_entry_point
  File "/usr/lib/python2.7/site-packages/setuptools-18.2-py2.7.egg/pkg_resources/__init__.py", line 2682, in load_entry_point
  File "/usr/lib/python2.7/site-packages/setuptools-18.2-py2.7.egg/pkg_resources/__init__.py", line 2355, in load
  File "/usr/lib/python2.7/site-packages/setuptools-18.2-py2.7.egg/pkg_resources/__init__.py", line 2361, in resolve
  File "/usr/lib/python2.7/site-packages/prophyc/__init__.py", line 7, in <module>
    from . import model
  File "/usr/lib/python2.7/site-packages/prophyc/model.py", line 5, in <module>
    from . import calc
  File "/usr/lib/python2.7/site-packages/prophyc/calc.py", line 9, in <module>
    os.makedirs(PROPHY_DIR)
  File "/usr/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: '/tmp/.prophy'

Looking at the source code, it seems that between if not os.path.exists(PROPHY_DIR) and os.makedirs(PROPHY_DIR), other process creates the directory, causing makedirs to raise exception.

If you decide to keep cache, then such directory (/tmp/.prophy/) creation need to be rewritten into something multiprocessing-friendly.

aurzenligl commented 8 years ago

I was able to reproduce using parallel:

parallel -N0 python -m prophyc --python_out . prophy_cpp/test/Dynfields.prophy ::: {1..300}

Issue is solved by write_tables = 0 and released in 0.7.6.

Some explanation:

In ply 3.8 reusing parser tables from files requires sys.path.append(PROPHY_DIR), which means current code needlessly writes files which it cannot possibly reuse anyway.

Since not writing/reusing them doesn't degrade performance (they're not reused now!) and solves concurrent fs access problems, and fixing tables reusability gives only a small performance boost, let's go this way.

Most robust solution seems to be to generate parser tables during installation and use them from then on: https://github.com/dabeaz/ply/issues/38 Let's go that way when performance hits us.

Thank you for help :)