areaDetector / ADPylon

An EPICS areaDetector driver for cameras from Basler using their Pylon SDK.
https://areadetector.github.io/areaDetector/ADPylon/ADPylon.html

IOC receives camera connection failed occasionally #4

Open AbdallaDalleh opened 11 months ago

AbdallaDalleh commented 11 months ago

We have the following setup:

Camera model: Basler acA1300
Pylon SDK: 7.3
IOC: Docker-based (minimal Alpine Linux)

Every couple of days the IOC receives the following error: [screenshot: basler-error]

This is exactly the same error I get from the Pylon Viewer when connecting to a camera that is being controlled elsewhere. The problem occurs every few days, and to resolve it I have to stop and restart the acquisition. It then works again, and after a few days I get the same error.

xiaoqiangwang commented 11 months ago

It would be helpful to attach the full log in text.

AbdallaDalleh commented 11 months ago

Here is a sample log output from the IOC shell:

2023/12/10 13:18:54.400 ADPylon::connectCamera error opening camera 23186682: Failed to open 'Basler acA1300-30gm#003053309FFA#10.2.4.70:3956'. The device is controlled by another application. Err: An attempt was made to access an address location which is currently/momentary not accessible. (0xE1018006)

2023/12/10 13:18:54.400 ADPylon:connect:  camera connection failed (3)
2023/12/10 13:18:54.532 SRC16-DI-PNHL:CAM:PoolUsedMem devAsynFloat64::reportQueueRequestStatus queueRequest error port pinhole not connected
2023/12/10 13:18:54.532 SRC16-DI-PNHL:CAM:PoolAllocBuffers devAsynInt32::reportQueueRequestStatus queueRequest error port pinhole not connected
2023/12/10 13:18:54.532 SRC16-DI-PNHL:CAM:PoolFreeBuffers devAsynInt32::reportQueueRequestStatus queueRequest error port pinhole not connected
2023/12/10 13:19:14.763 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber
2023/12/11 00:50:16.640 ADPylon::connectCamera error opening camera 23186682: Failed to open 'Basler acA1300-30gm#003053309FFA#10.2.4.70:3956'. The device is controlled by another application. Err: An attempt was made to access an address location which is currently/momentary not accessible. (0xE1018006)

2023/12/11 00:50:16.640 ADPylon:connect:  camera connection failed (3)
2023/12/11 00:50:36.999 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber
2023/12/11 02:10:26.165 ADPylon::connectCamera error opening camera 23186682: Failed to open 'Basler acA1300-30gm#003053309FFA#10.2.4.70:3956'. The device is controlled by another application. Err: An attempt was made to access an address location which is currently/momentary not accessible. (0xE1018006)

2023/12/11 02:10:26.165 ADPylon:connect:  camera connection failed (3)
2023/12/11 02:10:46.524 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber
xiaoqiangwang commented 11 months ago

Judging from the message timestamps, it looks like the Docker container network is being interrupted intermittently.
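One way to check the timestamp argument is to measure how far apart the failures are. The following is a minimal Python sketch, not part of ADPylon; the function names `failure_times` and `gaps_seconds` are hypothetical, and the sample lines are copied from the log above with the long `connectCamera` messages left out.

```python
import re
from datetime import datetime

# Sample lines taken from the IOC log in this thread.
LOG = """\
2023/12/10 13:18:54.400 ADPylon:connect:  camera connection failed (3)
2023/12/10 13:19:14.763 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber
2023/12/11 00:50:16.640 ADPylon:connect:  camera connection failed (3)
2023/12/11 02:10:26.165 ADPylon:connect:  camera connection failed (3)
"""

# Timestamp prefix used by the IOC shell: YYYY/MM/DD HH:MM:SS.mmm
STAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (.*)$")


def failure_times(text):
    """Return the timestamps of all 'camera connection failed' lines."""
    times = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match and "camera connection failed" in match.group(2):
            times.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f"))
    return times


def gaps_seconds(times):
    """Return the time in seconds between consecutive failures."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_seconds(failure_times(LOG)):
        print(f"{gap / 3600:.2f} h between failures")
```

Irregular gaps like these would be consistent with an intermittent network problem rather than a fixed timeout, though three events are too few to conclude much.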

AbdallaDalleh commented 11 months ago

Thanks for the feedback. The issue started to appear around the time we migrated the IOC to Docker. For testing purposes we switched back to the standard IOC setup, with the MTU set to 9000 along the entire path. I'll provide you with feedback soon.

AbdallaDalleh commented 11 months ago

The issue happened again outside of Docker. We ran a basic IOC with ADPylon integrated; it was running fine until a few hours ago, when I got the same error. Note that it did not resume on its own, so I had to stop and start acquiring. Could there be some parameters in the OS or in the features GUI that I need to modify?

xiaoqiangwang commented 11 months ago

During operation, is there anything unusual related to buffers/packets in the status section?

[Screenshot 2023-12-13 at 08 26 43]

AbdallaDalleh commented 11 months ago

I haven't checked the status section recently but I don't remember getting any failed buffers or packets.

AbdallaDalleh commented 11 months ago

I just remembered that we are running ADPylon with Pylon SDK 7.3. The tools in 7.3 do not run under Rocky Linux 8 because they require a higher version of libstdc++. Could this be an issue for the IOC?

xiaoqiangwang commented 11 months ago

Pylon SDK 7.3 does not run on RHEL/CentOS 7 because of the libc and libstdc++ versions.

For Rocky Linux 8, there is only this crash-on-exit issue https://github.com/areaDetector/ADPylon/issues/1, but it has been fixed.

AbdallaDalleh commented 11 months ago

Actually, Pylon SDK 7.3 does not work with Rocky Linux 8 due to the libstdc++ version; the latest SDK that works on Rocky Linux 8 is 7.2.1, which was working fine a few months ago. Yesterday we installed the IOC with SDK 7.2.1 on a laptop on the same switch as the camera, with the MTU on all nodes set to 9000, and just this morning I got the same error. We suspect an issue with the camera itself; we will try rebooting it through PoE and test again. What do you think?

xiaoqiangwang commented 11 months ago

I am able to build and run the ADPylon IOC using Pylon SDK 7.3 on RHEL 8.9 and Rocky Linux 8.7. What does not work is the pylonviewer client program, which requires libc >= 2.29.

If you could run two cameras on the same host, that would be definitive proof. But I suspect, as much as you do, that the camera is failing.

AbdallaDalleh commented 11 months ago

We added a second camera on the same switch and it failed with the same error. We now suspect the PoE on the switch; we will set up external power, turn off PoE for both cameras, and test again.

AbdallaDalleh commented 11 months ago

With all cameras set to the same acquisition settings, I got the same error on a test camera connected directly to the same switch, with the MTU set to 9000 and with external power supplies. I also got the same error on a different camera running on a different switch with the MTU set to 1500, but only after about 700K frames. I am thinking of two things:

What do you think?

xiaoqiangwang commented 11 months ago

So far all tests have involved network switches. Would it be possible to test with a direct connection between the PC and the camera? See "Peer-to-Peer Network Architecture" and "Changing the Network Adapter Properties (Linux)" in https://docs.baslerweb.com/network-configuration-(gige-cameras)

AbdallaDalleh commented 11 months ago

Peer-to-peer was tested multiple times, but only with the Pylon Viewer. One time it acquired 7M+ frames, but I can't recall whether there were any failed frames.

xiaoqiangwang commented 11 months ago

For comparison, it would still be good

In both ways, one would identify whether the network switch is part of the problem.

AbdallaDalleh commented 11 months ago

Not all parameters mentioned in the network configuration page are supported in RL8 because of a mix between kernel version and NIC driver support for these parameters. Looks like we might need an RL9-based setup ....

xiaoqiangwang commented 11 months ago

> Not all parameters mentioned in the network configuration page are supported in RL8 because of a mix between kernel version and NIC driver support for these parameters. Looks like we might need an RL9-based setup ....

What error messages do you observe?

AbdallaDalleh commented 11 months ago

The ethtool command reports the following errors on different PCs:

netlink error: cannot modify an unsupported parameter (offset xx)
netlink error: Invalid arguments

Here the offset is just the position of the parameter in the command. I searched the internet, and it seems that the ethtool command has a list of parameters, but support for each depends on the NIC model and the Linux driver; even the list of parameters in the tool itself varies from one kernel version to another. Looks like we might need to try an RL9-based setup.

xiaoqiangwang commented 11 months ago

We have used NICs with Intel and Broadcom chipsets, and RHEL has good support for them.

AbdallaDalleh commented 10 months ago

We are still investigating the issue on our side, but I have a question: the driver won't acquire anything if the MTU is set to 9000 on the camera through the features GUI. This will only happen if it is connected to a switch; in the P2P case, an MTU of 9000 works fine.

xiaoqiangwang commented 10 months ago

> this will only happen if it is connected to any switch

Is the network switch configured with Jumbo Frame?
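A quick way to probe whether jumbo frames survive the path through a switch is to send unfragmented pings sized to the MTU under test. The sketch below is a minimal Python example under stated assumptions: it relies on the Linux iputils `ping` options `-M do` (set the Don't Fragment bit), `-s` (payload size) and `-c` (count), it assumes the camera answers ICMP echo requests of that size, and the helper names are hypothetical. The address 10.2.4.70 is the camera address from the log earlier in this thread.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in a single IPv4 packet of size `mtu`."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host, mtu, count=3):
    """Build a Linux iputils ping command that forbids fragmentation."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), host]


def path_passes_mtu(host, mtu):
    """Return True if ping reports replies for packets of size `mtu`.

    Assumes iputils ping is installed; a non-zero exit status is
    treated as 'the path (or the host) does not pass this MTU'.
    """
    result = subprocess.run(ping_command(host, mtu), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Only print the command here; call path_passes_mtu() on the IOC host.
    print(" ".join(ping_command("10.2.4.70", 9000)))
```

If the 1500-byte probe succeeds while the 9000-byte probe fails only when a switch is in the path, that would point at the switch's jumbo frame configuration rather than at the camera or the driver.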

AbdallaDalleh commented 8 months ago

Hi Wang, sorry, I forgot about this issue. As we agreed, it seems to be more of a networking issue. We moved the camera's Ethernet cable to another switch and we get much better performance: millions of frames captured continuously at 10 FPS before any disconnection. For the time being this is acceptable and we can live with it given the busy operation schedule. We will soon test a few cameras on a 10G switch that supports an MTU of up to 10000, I think, and provide you with feedback. Thanks!