Describe the bug
On Fedora Linux 38 systems (both aarch and x86_64), when running multiple parallel instances of the command:
oasdiff breaking openapi.orig.json openapi.json
some of them (2 or 3 out of 50, in our experiments) return error code 102 to the operating system and print the text:
Error: failed to load base spec from "file": open file: no such file or directory
Please note that the error reports a file named "file" as missing. It does not mention the real file name, openapi.orig.json, as normally happens when the file is actually missing.
If telemetry is disabled, this behavior is not observed. That is, the command below works as expected.
To Reproduce
Run multiple instances of oasdiff in parallel with telemetry enabled. We have observed the failure both when starting the instances from a single shell and in CI systems, where the instances are started as part of continuous integration jobs.
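The reproduction can be scripted. The following is a minimal sketch, not part of oasdiff: the helper names countExitCode and exitCode are made up for illustration, and it assumes oasdiff is on the PATH and both spec files are in the current directory. It starts 50 instances concurrently and counts how many exit with code 102.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"sync"
)

// countExitCode starts n copies of run concurrently and reports how many
// of them returned the exit code want.
func countExitCode(n, want int, run func() int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		matches int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if run() == want {
				mu.Lock()
				matches++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return matches
}

// exitCode runs cmd and returns its exit code, or -1 if the process
// could not be started or did not exit normally.
func exitCode(cmd *exec.Cmd) int {
	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode()
	}
	if err != nil {
		return -1
	}
	return 0
}

func main() {
	const instances = 50 // batch size used in the experiments above
	failed := countExitCode(instances, 102, func() int {
		return exitCode(exec.Command("oasdiff", "breaking", "openapi.orig.json", "openapi.json"))
	})
	fmt.Printf("%d of %d instances exited with code 102\n", failed, instances)
}
```

Since only a few runs out of 50 failed in our experiments, the program may need to be run several times before it reports a non-zero count.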
Expected behavior
We expected the command not to report a missing file when the file was in fact present.
Desktop (please complete the following information):
Linux Fedora 38 x86_64 and aarch
oasdiff version 1.10.8 and 1.10.11
Additional context
This behavior seems to be caused by the call to SendCommand in the preRun function. SendCommand appears to visit cmd.Flags from within a goroutine. Unfortunately, the Visit method of the pflag flag set does not seem to be thread-safe: it updates the internals of the data structure. While this happens, cobra may be accessing the same flag set to perform validations such as ValidateRequiredFlags (see the code here, please). A data race of this kind is a plausible explanation for the behavior and for why the error is difficult to reproduce.