basler / pypylon

The official python wrapper for the pylon Camera Software Suite
http://www.baslerweb.com
BSD 3-Clause "New" or "Revised" License

Can't call GetInstance methods in other processes once they have been used in the main process #659

Open bennorodemann opened 1 year ago

bennorodemann commented 1 year ago

It seems that any pylon.TlFactory.GetInstance() method works fine in other processes until one of those methods is used in the main process; after that, all of them cause the code to hang. Is there a way around this? Is there perhaps a way to close the instance so that it can be called in other processes? Here is example code:

from multiprocessing import Process
from pypylon import pylon
import time

def get_device_infos(id):
    print(f'processing {id} starting')
    pylon.TlFactory.GetInstance().EnumerateDevices()
    print(f'Processing {id} complete')

# Calling EnumerateDevices (or any GetInstance method) in a child process is OK
p1 = Process(target=get_device_infos, args=(1,))
p2 = Process(target=get_device_infos, args=(2,))
p1.start()
p2.start()

time.sleep(1)
print()

# Once CreateFirstDevice (or any GetInstance method) is called in the
# main process, it no longer works in other processes
pylon.TlFactory.GetInstance().CreateFirstDevice()

p3 = Process(target=get_device_infos, args=(3,))
p3.start()
# Hangs here forever

thiesmoeller commented 1 year ago

Best practice is to run all acquisition in one process and only process the images in other processes. With a shared-memory queue, for example, this works with zero copies.
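
A minimal sketch of that layout, using only the standard library. The frame here is simulated (write_frame is a hypothetical stand-in for copying a grabbed image into the buffer), and FRAME_BYTES, mean_of_frame and worker are illustrative names, not pypylon API. The point is that only the buffer's name travels through the queue, while the worker reads the pixel data in place:

```python
from multiprocessing import Process, Queue, shared_memory

FRAME_BYTES = 640 * 480  # hypothetical size of one 8-bit mono frame


def write_frame(shm, pixel_value):
    """Stand-in for acquisition: fill the shared buffer in place."""
    shm.buf[:FRAME_BYTES] = bytes([pixel_value]) * FRAME_BYTES


def mean_of_frame(shm_name):
    """Attach to the buffer by name and read the pixels without copying them."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return sum(shm.buf[:FRAME_BYTES]) / FRAME_BYTES
    finally:
        shm.close()


def worker(names, results):
    """Processing side: receives buffer names only, never camera objects."""
    for name in iter(names.get, None):  # None is the stop signal
        results.put(mean_of_frame(name))


if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
    names, results = Queue(), Queue()
    proc = Process(target=worker, args=(names, results))
    proc.start()
    try:
        write_frame(shm, 7)   # the acquisition process owns the camera
        names.put(shm.name)   # only a short string crosses the process boundary
        print(results.get())
    finally:
        names.put(None)
        proc.join()
        shm.close()
        shm.unlink()
```

A real pipeline would need a pool of buffers and some signal that a buffer is free again before it is overwritten; that bookkeeping is left out here.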

To your specific request: getting the TlFactory in different processes works, of course, but you have to prevent Python from pickling/unpickling pypylon objects, as they are only SWIG wrappers around C++ objects. Transferring these objects to other processes will fail because of some singleton structures used for pylon-internal reference counting.
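
One way to follow that advice is to send only plain data, such as serial numbers, to the child and rebuild the pypylon objects there. This is a sketch under assumptions: serials_from and open_camera_by_serial are hypothetical helpers, and it assumes DeviceInfo exposes GetSerialNumber()/SetSerialNumber() and that CreateDevice() accepts a DeviceInfo that has only the serial number set.

```python
import pickle


def serials_from(device_infos):
    """Reduce DeviceInfo-like objects to plain, picklable serial-number strings."""
    return [str(info.GetSerialNumber()) for info in device_infos]


def open_camera_by_serial(serial):
    """Run this in the child process: rebuild the pypylon objects from the string."""
    from pypylon import pylon  # imported in the child, never transferred to it

    info = pylon.DeviceInfo()
    info.SetSerialNumber(serial)  # assumption: matching by serial number alone
    camera = pylon.InstantCamera(pylon.TlFactory.GetInstance().CreateDevice(info))
    camera.Open()
    return camera


def is_picklable(obj):
    """Quick check that an argument can safely be sent to another process."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False
```

The parent would pass the output of serials_from(...) as Process arguments and call open_camera_by_serial inside the target function, so no SWIG proxy ever crosses the process boundary.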

As this is not best practice, we've never debugged your use case.

jonahpearl commented 8 months ago

We've noticed a similar issue on Linux, wherein subprocesses started by forking show this behavior; starting the subprocs with spawning instead solves the issue.

On my Mac with an emulated camera, the issue manifests slightly differently, with the program crashing only when I try to start grabbing frames. Nonetheless, the problem appears to be the same: forked subprocesses trigger this bug, while spawned ones don't.

Here is code that reproduces it for me:

import os
import sys
from multiprocessing import Process, set_start_method

from pypylon import pylon

# SEE: https://github.com/basler/pypylon/issues/659

def test_func(id):
    print(f"processing {id} starting")

    di_filter = get_emulated_filter()
    tlFactory = pylon.TlFactory.GetInstance()
    devices = tlFactory.EnumerateDevices(di_filter)
    print(f"processing {id} got devices: {devices}")

    cam = pylon.InstantCamera(tlFactory.CreateDevice(devices[0]))
    print("Created camera")
    cam.Open()
    print("Opened camera")
    cam.StartGrabbing()
    print("Started grabbing")
    cam.StopGrabbing()
    cam.Close()
    print(f"Processing {id} complete")

def get_emulated_filter():
    """
    Returns a device filter that can be passed to
    pylon.TlFactory.GetInstance().EnumerateDevices().
    """
    device_class = "BaslerCamEmu"
    di = pylon.DeviceInfo()
    di.SetDeviceClass(device_class)
    return [di]

def main():
    # Allow emulated cam
    os.environ["PYLON_CAMEMU"] = "1"

    # Works
    p1 = Process(target=test_func, args=(1,))
    p1.start()
    p1.join()

    print()

    # Once EnumerateDevices (or any GetInstance method) is called in the
    # main process, it no longer works in other processes
    di_filter = get_emulated_filter()
    tlFactory = pylon.TlFactory.GetInstance()
    _ = tlFactory.EnumerateDevices(di_filter)  # this is the line that breaks the fork version

    # Try forcing deletion, doesn't help (needs `import gc` if uncommented)
    # del devices
    # del tlFactory
    # del di_filter
    # gc.collect()

    p3 = Process(target=test_func, args=(3,))
    p3.start()  # GH person said hangs here, but fine for me
    p3.join()

    return

if __name__ == "__main__":
    start_method = sys.argv[1] if len(sys.argv) > 1 else "spawn"
    set_start_method(start_method)
    main()

When I run with spawn I get:

processing 1 starting
processing 1 got devices: (<pypylon.pylon.DeviceInfo; proxy of <Swig Object of type 'Pylon::CDeviceInfo *' at 0x10eddd560> >,)
Created camera
Opened camera
Started grabbing
Processing 1 complete

processing 3 starting
processing 3 got devices: (<pypylon.pylon.DeviceInfo; proxy of <Swig Object of type 'Pylon::CDeviceInfo *' at 0x113425590> >,)
Created camera
Opened camera
Started grabbing
Processing 3 complete

but with fork I get:

processing 1 starting
processing 1 got devices: (<pypylon.pylon.DeviceInfo; proxy of <Swig Object of type 'Pylon::CDeviceInfo *' at 0x10ef06e20> >,)
Created camera
Opened camera
Started grabbing
Processing 1 complete

processing 3 starting
processing 3 got devices: (<pypylon.pylon.DeviceInfo; proxy of <Swig Object of type 'Pylon::CDeviceInfo *' at 0x10ef06ee0> >,)
Created camera
Opened camera
[and then no more output, the program crashes with no error message]

This implies that there is some Python object maintained in the background after calling tlFactory.EnumerateDevices(di_filter) in the main process, which is inherited by forked subprocesses but (obviously, by construction) not by spawned ones. As noted in the example, even calling del on the relevant vars and then using gc.collect() didn't seem to help in the forked case.

This isn't a major issue since we can just force all our subprocesses to spawn instead of forking. But it seems like there ought to be a way to close / release the tlFactory such that it's returned to a pristine state. Is there such a method? I tried a few of the tlFactory methods (destroy device, etc) but couldn't figure out how to use them to do this.
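
For reference, forcing spawn does not have to go through the global set_start_method(); a context object limits the choice to the processes that need it. A minimal sketch, where camera_worker and make_spawned_process are hypothetical names and pypylon is imported inside the child so it is initialized there:

```python
import multiprocessing as mp


def camera_worker(worker_id):
    """Child-side work; pypylon is imported and initialized in this process."""
    from pypylon import pylon

    devices = pylon.TlFactory.GetInstance().EnumerateDevices()
    print(f"worker {worker_id} sees {len(devices)} device(s)")


def make_spawned_process(target, *args):
    """Build a Process that always starts via spawn, whatever the global default is."""
    ctx = mp.get_context("spawn")
    return ctx.Process(target=target, args=args)


if __name__ == "__main__":
    proc = make_spawned_process(camera_worker, 1)
    proc.start()
    proc.join()
```

Because spawn re-imports the main module in the child, the `if __name__ == "__main__":` guard is required.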

Thanks!

thiesmoeller commented 8 months ago

To fully unload the transport layer (TL), see

https://github.com/basler/pypylon/blob/1c8636841df6eecb0592eb33d4caf1d6a6360268/samples/load_unload_transportlayer.py

jonahpearl commented 8 months ago

Thank you, this looks like what I need, but it hasn't worked yet. I changed the relevant code in the example to:

# Once EnumerateDevice (or any GetInstance method) is called in the
# main process then it no longer works in other process
di_filter = get_emulated_filter()
tlFactory = pylon.TlFactory.GetInstance()
tl = tlFactory.CreateTl("BaslerUsb")
devices = tlFactory.EnumerateDevices(di_filter)  # this is the line that breaks the fork version

# Pylon-approved method to release the transport layer
# tl.DestroyDevice(devices[0].DetachDevice())
tlFactory.ReleaseTl(tl)  # unload the transport layer
del devices
del tl
del tlFactory
gc.collect()  # requires `import gc` at the top of the script

but the same problem (crashing at camera open) persists when using fork. I can't use tlf.CreateTl("BaslerGTC/Basler/CXP") as given in the example; it throws an error about an invalid TL specification. In any case, I'll be using USB cameras, so it seems right to test with that.

Strangely, if I then comment out the tlFactory.EnumerateDevices(di_filter) and del devices lines, I get an odd-looking error:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
[this repeats ~8 times]

Neither of these errors occurs if I use spawn to start the subprocesses.

thiesmoeller commented 8 months ago

Good explanation here: https://britishgeologicalsurvey.github.io/science/python-forking-vs-spawn/#:~:text=Forking%20and%20spawning%20are%20two,they%20were%20in%20the%20parent.

Spawn imports pypylon in the subprocess, which internally initializes the relevant singletons through pylon's own initialization.

bennorodemann commented 8 months ago

Sorry, I actually found a solution a little while ago but forgot to post it here. What ended up working for me was creating and destroying everything manually, as described in pypylon/samples/load_unload_transportlayer.py. To get a list of the available transport layers, you can use pylon.TlFactory.GetInstance().EnumerateTls().