fonttools / pyclipper

Cython wrapper for the C++ translation of the Angus Johnson's Clipper library (ver. 6.4.2)
MIT License
233 stars 42 forks source link

Releasing GIL While Processing #29

Closed revarbat closed 4 years ago

revarbat commented 4 years ago

If you could release the Global Interpreter Lock while doing more CPU intensive calls to ClipperLib (union, difference, intersection, minkowski, offset, etc), it would allow Python apps to parallelize much better when using threading. I wrote a 3D printing slicer program in Python called Mandoline-Py ( https://github.com/revarbat/mandoline-py ), and ran headlong into the performance limitations of the GIL.

As best as I can tell, this should be just a matter of surrounding the CPU intensive code with Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. (See https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock )

flabowski commented 4 years ago

Hi, Did this work for you? And does it need to be in this library? Anyway, I thought Python functions are always locked... I parallelized some clipping calls in Cython some years ago by releasing the gil when calling c-functions. I added nogil in pyclipper/pyclipper.pyx to every cdef (E.g. cdef cppclass Xyz: turned into cdef cppclass Xyz nogil:). Then you can parallelize your loops in cython.

anthrotype commented 4 years ago

PRs are welcome :)

revarbat commented 4 years ago

I've never written Python extensions, so I don't know the particulars of the implementation.

I do know, however, that I cannot perform more than one concurrent long PyClipper call at a time. From my research, this appears to be because of the Global Interpreter Lock. PyClipper, once its arguments are converted to its internal format, shouldn't require access to any interpreter resources while it's doing its long processing, so it doesn't need to hold onto the GIL, at least until the results need returning. If the GIL is released, this should allow multiple threads to use PyClipper at once, concurrently. Yes, that means PyClipper would have to be minorly modified to release the GIL, but this looks fairly simple, as described above, and as explained in the previously given link.

revarbat commented 4 years ago

I'm only now discovering Cython. (obviously) This link may help: https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil

revarbat commented 4 years ago

This is my first time working with Cython, but I think I've got it. I've created PR #30

flabowski commented 4 years ago

Yes, you are right: once the arguments are converted to the internal format the GIL is not needed for the actual AddPath and Execute operations. Afterwards the results must be converted back to Python objects. This is done by _to_clipper_path() and _from_clipper_path(), both operations require the GIL, as it "must be held by the current thread before it can safely access Python objects."

I released the GIL for the actual call of the C code. I don’t know if that helps to potentially run anything in parallel in python… But if you like to try it, checkout my branch. The Cython docs say:

Note that acquiring the GIL is a blocking thread-synchronising operation, and therefore potentially costly. It might not be worth releasing the GIL for minor calculations.

I don’t know how this applies to all of the clipper functions, but my guess would be that it could be worth it for most of the operations. @anthrotype I made a PR in case you want to have it in the master branch.

If you want to be in control and make sure several of your clipping operations happen in parallel, I highly recommend to do this in cython!

edit: okay you were faster, dont mind my PR :D

anthrotype commented 4 years ago

thanks to both! @flabowski could you please review @revarbat's PR #30?

flabowski commented 4 years ago

Nice work. Looks like we have done almost exactly the same :D I'd recommend declaring the whole cpp class as nogil. Also i recommend doing the AddPath stuff without the GIL as well, as far as i understood there is quite some code behind that. I didn’t release the GIL for GetBounds(), it seems be quite cheap.

anthrotype commented 4 years ago

Releasing the GIL should obviously only be done for thread-safe code or for code that uses other means of protection against race conditions and concurrency issues.

I'm not familiar with the cpp code that pyclipper is wrapping, are we 100% sure that it is thread-safe?

Note that acquiring the GIL is a blocking thread-synchronising operation, and therefore potentially costly. It might not be worth releasing the GIL for minor calculations.

Ideally we would benchmark and decide on a case-by-case basis rather than adding nogil to everything that runs in C++. Could any of you provide such benchmarks for before/after the nogil?

flabowski commented 4 years ago

I think it is thread-safe. _to_clipper_path() and _from_clipper_path() are still called with the GIL aquired, only the C++ code would be without the GIL. In Python we dont have any access to the memory clipper uses, it should be local to it's thread. About the speed: I have no idea to what degree Python can take advantage of that, i assume that depends on what the user does:

pc = pyclipper.Pyclipper()
pc.AddPath(clip, pyclipper.PT_CLIP, True)
pc.AddPaths(subj, pyclipper.PT_SUBJECT, True)
solution = pc.Execute(pyclipper.CT_INTERSECTION, pyclipper.PFT_EVENODD, pyclipper.PFT_EVENODD)
something_unrelated1()
something_unrelated2()
something_unrelated2()

Right now this would be all sequential. In theory with the proposed changes line1 to line4 will still be sequential, but line6, line7 and line8 could potentially be executed in parallel to line2-4. I'll try to set up a test.

anthrotype commented 4 years ago

Fixed by #30