esa-tu-darmstadt / tapasco

The Task Parallel System Composer (TaPaSCo)
GNU Lesser General Public License v3.0

[libtapasco] Exclusive Access and DMA Buffer allocation wear off after one failed attempt #296

Open zyno42 opened 2 years ago

zyno42 commented 2 years ago

When one host application is running with exclusive access (in this case tapasco-debug in Debug Mode) and I start a second runtime application that tries to acquire exclusive access to the same device, the first two attempts fail with the following errors, but from the third attempt on it succeeds:

$ cargo run --
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/tapasco_runtime`
An error occurred: Failed to initialize TLKM object: Could not create device: DMA Error: Could not allocate DMA buffer EMFILE: Too many open files
$ cargo run --
    Finished dev [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/tapasco_runtime`
An error occurred: Failed to decode TLKM device: Could not acquire desired mode TlkmAccessExclusive for device 0: EBUSY: Device or resource busy
$ cargo run --
    Finished dev [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/tapasco_runtime`

The first error comes from libtapasco allocating all 32 DMA buffers from TLKM at initialization time, which also happens only once.

jahofmann commented 2 years ago

Could you run the driver in debug mode and post the corresponding dmesg output?

zyno42 commented 2 years ago

Sure. This is the corresponding log:

dmesg_tapasco_issue_296.log

jahofmann commented 2 years ago

It might be enough to add a new DMAControl implementation in https://github.com/esa-tu-darmstadt/tapasco/blob/master/runtime/libtapasco/src/dma.rs that simply does nothing, and to use it at https://github.com/esa-tu-darmstadt/tapasco/blob/8f77c7ccb99214d6f1c1b3560eeb16bbe199e53c/runtime/libtapasco/src/device.rs#L356, https://github.com/esa-tu-darmstadt/tapasco/blob/8f77c7ccb99214d6f1c1b3560eeb16bbe199e53c/runtime/libtapasco/src/device.rs#L389 and https://github.com/esa-tu-darmstadt/tapasco/blob/8f77c7ccb99214d6f1c1b3560eeb16bbe199e53c/runtime/libtapasco/src/device.rs#L400

Lastly, the correct DMA engine has to be loaded and unloaded in https://github.com/esa-tu-darmstadt/tapasco/blob/8f77c7ccb99214d6f1c1b3560eeb16bbe199e53c/runtime/libtapasco/src/device.rs#L484

Otherwise the DMA engine is initialized even for monitor-only applications.
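
A minimal sketch of what such a do-nothing DMA implementation could look like. The trait below is a simplified stand-in, not the actual `DMAControl` trait from dma.rs; `DmaControl`, `NoopDma`, `BufferedDma`, `AccessMode` and `select_dma` are hypothetical names used only for illustration, and the buffer size of 4096 bytes is an assumption.

```rust
/// Hypothetical access modes; the real driver has more than these two.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AccessMode {
    Monitor,
    Exclusive,
}

/// Simplified stand-in for the DMA trait in dma.rs (signatures are assumptions).
pub trait DmaControl {
    /// Pretend to copy `data` to the device; returns the number of bytes transferred.
    fn copy_to(&self, data: &[u8]) -> usize;
    /// Number of DMA buffers this implementation holds.
    fn buffers_held(&self) -> usize;
}

/// Does nothing and allocates nothing, so a monitor-only client
/// never claims driver resources.
pub struct NoopDma;

impl DmaControl for NoopDma {
    fn copy_to(&self, _data: &[u8]) -> usize {
        0
    }
    fn buffers_held(&self) -> usize {
        0
    }
}

/// Simulated "real" engine that allocates all of its buffers up front.
pub struct BufferedDma {
    buffers: Vec<Vec<u8>>,
}

impl BufferedDma {
    pub fn new(count: usize, size: usize) -> Self {
        BufferedDma {
            buffers: vec![vec![0u8; size]; count],
        }
    }
}

impl DmaControl for BufferedDma {
    fn copy_to(&self, data: &[u8]) -> usize {
        // A real engine would stage `data` in a buffer and start a transfer.
        data.len()
    }
    fn buffers_held(&self) -> usize {
        self.buffers.len()
    }
}

/// Pick the DMA implementation based on the requested access mode.
pub fn select_dma(mode: AccessMode) -> Box<dyn DmaControl> {
    match mode {
        AccessMode::Exclusive => Box::new(BufferedDma::new(32, 4096)),
        AccessMode::Monitor => Box::new(NoopDma),
    }
}

fn main() {
    let monitor = select_dma(AccessMode::Monitor);
    let exclusive = select_dma(AccessMode::Exclusive);
    println!("monitor buffers: {}", monitor.buffers_held());
    println!("exclusive buffers: {}", exclusive.buffers_held());
    println!("exclusive copied: {}", exclusive.copy_to(&[1, 2, 3]));
}
```

Whether the no-op variant should silently succeed, as here, or return an error when a transfer is requested is a design choice this sketch leaves open.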

zyno42 commented 2 years ago

Thank you for your suggestions. I've looked through them, and I think I haven't stated the problem clearly enough:

The problem is that if a device is exclusively acquired, another application receives the correct EBUSY error only once.

zyno42 commented 2 years ago

I've implemented your suggestions. However, this produces another error message when tapasco-debug runs in monitor mode and another host application runs in exclusive mode: Failed to initialize TLKM object: Could not create device: Scheduler Error: PE Error: Error during interrupt handling: Could not register eventfd with driver: EFAULT: Bad address

As with the previous implementation, this error wears off after one retry.

jahofmann commented 2 years ago

I fear this is a similar problem. When the PEs are created, the runtime will also allocate and register an eventfd for interrupt handling. This step needs to be postponed until the access mode is exclusive, as monitoring apps should not receive the interrupts.

There are also some guards needed around the wait-for-PE functions to avoid deadlocks if the interrupts have not been set up.

This should make the runtime play fine with the driver, but all these checks should also be in the driver so it does not simply crash if "held wrong". In any case this is more for future reference than your work ;)
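
A rough sketch of the two runtime-side changes described above: registering the interrupt only for exclusive access, and guarding the wait so it returns an error instead of blocking forever. All types here (`Pe`, `Interrupt`, `PeError`, `AccessMode`) are hypothetical stand-ins, not the actual libtapasco types, and the eventfd is only simulated.

```rust
/// Hypothetical access modes; the real driver has more than these two.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AccessMode {
    Monitor,
    Exclusive,
}

#[derive(Debug, PartialEq)]
pub enum PeError {
    /// Waiting was requested, but no interrupt was ever registered.
    NoInterrupt,
}

/// Stand-in for an eventfd registered with the driver.
#[derive(Debug)]
pub struct Interrupt {
    pe_id: usize,
}

/// Simplified processing element that may or may not own an interrupt.
#[derive(Debug)]
pub struct Pe {
    interrupt: Option<Interrupt>,
}

impl Pe {
    /// Register the interrupt only when the device is held exclusively;
    /// monitoring clients get a PE without one.
    pub fn new(pe_id: usize, mode: AccessMode) -> Self {
        let interrupt = match mode {
            AccessMode::Exclusive => Some(Interrupt { pe_id }),
            AccessMode::Monitor => None,
        };
        Pe { interrupt }
    }

    /// Guarded wait: fail fast instead of blocking on an interrupt
    /// that can never arrive.
    pub fn wait_for_completion(&self) -> Result<usize, PeError> {
        match &self.interrupt {
            // A real implementation would block on the eventfd here.
            Some(irq) => Ok(irq.pe_id),
            None => Err(PeError::NoInterrupt),
        }
    }
}

fn main() {
    let exclusive = Pe::new(3, AccessMode::Exclusive);
    let monitor = Pe::new(3, AccessMode::Monitor);
    println!("exclusive wait: {:?}", exclusive.wait_for_completion());
    println!("monitor wait: {:?}", monitor.wait_for_completion());
}
```

Modelling the interrupt as an `Option` makes the "not registered" state explicit, so every wait path has to handle it.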

cahz commented 1 year ago

Might be fixed in #328?