ADLINK-IST / opensplice

This is the Vortex OpenSplice Community Edition source repository. For our commercial offering see
https://www.adlinktech.com/en/vortex-opensplice-data-distribution-service
Apache License 2.0

EINVAL when creating threads on Fedora #95

Open cottsay opened 5 years ago

cottsay commented 5 years ago

When using OpenSplice on Fedora Linux with ROS, some nodes terminate citing EINVAL from pthread_create in posix/code/os_thread.c.

I reproduced this issue with Fedora 28, 29, and 30, in combination with ROS Crystal and Dashing.

I noticed that the requested stack size was unusually small (64k) when the failure occurred. A comment in linux/code/os_thread_attr.c states that the OS should grow the stack as necessary, but this doesn't appear to be happening. Setting -DOSPL_ENV_PURIFY worked around the issue for me.

Do you have any guidance for investigating this further? I'd prefer not to set an extra flag that I don't fully understand.

PatrickM-ZS commented 5 years ago

I've looked into this, but I'm afraid I'll need a little more context to determine what's going on. Can you provide a stack trace or a description of how to reproduce the issue? Did you build OpenSplice yourself or download a prebuilt installer (32- or 64-bit)? Is there any notable difference between nodes that terminate with this issue and those that don't? Can you also please check your user limits for a sufficient number of processes/threads and stack size (ulimit -a -H)?

The minimum stack size on Linux is 16KiB (PTHREAD_STACK_MIN), at least on the regular 'desktop' distros I've seen, including Fedora (I imagine an embedded distro or a non-glibc pthreads implementation may have other defaults). Either way, the default for OpenSplice threads is 64KiB, which should be fine, but if it's not, EINVAL should be returned by pthread_attr_setstacksize, not pthread_create. If it gets as far as pthread_create, the issue might not be related to stack size but to one of the other attributes. However, if you set OSPL_ENV_PURIFY the default stack size is raised to 10MiB, so if that works for you, it must clearly be related to stack size in some way I don't yet comprehend ;-).