alex / rply

An attempt to port David Beazley's PLY to RPython, and give it a cooler API.
BSD 3-Clause "New" or "Revised" License

ParserGenerator creates cache files non-atomically #48

Closed: jwilk closed this issue 7 years ago

jwilk commented 9 years ago

RPLY doesn't create cache files atomically, so one ParserGenerator can read a cache file after another ParserGenerator has created it but before it has finished writing it. Here's a simple reproducer, which tries to build two grammars in parallel:

import concurrent.futures
import random

import rply

def build_grammar():
    pg = rply.ParserGenerator(['VALUE'], cache_id=cache_id)
    @pg.production('main : VALUE')
    def main(p):
        return p[0]
    pg.build()
    return pg.build()

while True:
    cache_id = 'simple-' + ''.join(str(random.randint(0, 9)) for x in range(1, 10))
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as tpe:
        fu1 = tpe.submit(build_grammar)
        fu2 = tpe.submit(build_grammar)
        print(fu1.result(), fu2.result())

Sooner or later it fails with:

Traceback (most recent call last):
  File "parallel-rply.py", line 19, in <module>
    print(fu1.result(), fu2.result())
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 395, in result
    return self.__get_result()
  File "/usr/lib/python3.4/concurrent/futures/_base.py", line 354, in __get_result
    raise self._exception
  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "parallel-rply.py", line 11, in build_grammar
    pg.build()
  File "/usr/lib/python3/dist-packages/rply/parsergenerator.py", line 189, in build
    data = json.load(f)
  File "/usr/lib/python3.4/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
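
The ValueError at the bottom is exactly what json.load() raises for empty input, which suggests the reader opened the cache in the window after the writer had created or truncated the file but before any data was written to it. The sketch below reproduces that window deterministically, without RPLY or threads. It assumes the writer opens the cache with mode "w" and then calls json.dump(), a common pattern used here for illustration rather than RPLY's actual code; the file name is made up.

```python
import json
import os
import tempfile

# Hypothetical cache path; any location works.
path = os.path.join(tempfile.mkdtemp(), "cache.json")

# The writer opens the cache with mode "w": the file now exists but is
# empty, and stays empty until json.dump() output is flushed.
writer = open(path, "w")

# A concurrent reader arriving in this window sees a zero-length file.
try:
    with open(path) as reader:
        json.load(reader)
    outcome = "loaded"
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    outcome = str(exc)

# The writer eventually finishes, but the reader has already failed.
json.dump({"ok": True}, writer)
writer.close()

print(outcome)  # prints: Expecting value: line 1 column 1 (char 0)
```
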

alex commented 9 years ago

Computers are horrible. I guess the solution is to write it to a file created with tempfile.mkstemp() and then os.rename() it into place?
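
A minimal sketch of the write-then-rename idea, assuming a JSON cache; write_cache_atomically() is a hypothetical helper, not part of RPLY. The temporary file is created in the destination directory because rename() is only atomic within a single filesystem. On POSIX, a reader that opens the path then sees no file, the previous file, or the complete new one, never a partial write.

```python
import json
import os
import tempfile


def write_cache_atomically(path, data):
    """Write `data` as JSON to `path` so readers never see a partial file.

    Hypothetical helper for illustration; not RPLY's API.
    """
    # Keep the temp file on the same filesystem as `path`: a rename
    # across filesystems fails instead of being atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        # On POSIX this atomically replaces any existing `path`.
        # On Windows it raises if `path` already exists.
        os.rename(tmp_path, path)
    except BaseException:
        # Don't leave a stray temp file behind on failure.
        os.unlink(tmp_path)
        raise
```

One side effect worth knowing: mkstemp() creates the file readable and writable only by the creating user, so the renamed cache file inherits those permissions.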

jwilk commented 9 years ago

Computers were invented solely to make programmers' lives miserable. ;-) Yup, os.rename() should do the trick on UNIX; I'd use tempfile.NamedTemporaryFile(delete=False) instead of mkstemp(). But on Windows, os.rename() fails when the destination file exists. Hmm, I guess in RPLY's case, you could just ignore exceptions from os.rename().
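
A sketch of the variant suggested here: tempfile.NamedTemporaryFile(delete=False) instead of mkstemp(), with the rename error ignored so that the Windows case (destination already exists) is treated as "another ParserGenerator won the race". write_cache() is hypothetical. Ignoring the error is only safe if every writer uses this scheme, because only then is an existing cache file guaranteed to be complete; catching OSError also hides unrelated failures such as permission errors, which would simply leave the cache unwritten.

```python
import json
import os
import tempfile


def write_cache(path, data):
    """Hypothetical helper: write a JSON cache via a renamed temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False keeps the file on disk after close() so it can be
    # renamed. (Cleanup after a failed json.dump() is omitted for brevity.)
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, suffix=".tmp", delete=False
    ) as f:
        tmp_path = f.name
        json.dump(data, f)
    try:
        # POSIX: atomically replaces `path`. Windows: fails if it exists.
        os.rename(tmp_path, path)
    except OSError:
        # Assume a complete cache is already in place and drop ours.
        os.unlink(tmp_path)
```

On Python 3.3 and later, os.replace() overwrites an existing destination on both POSIX and Windows, which would avoid the special case; its documentation describes the atomicity of the rename as a POSIX requirement.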