epics-modules / autosave

APS BCDA synApps module: autosave
https://epics-modules.github.io/autosave/
Other
8 stars 31 forks source link

Serious bug on master branch #48

Closed MarkRivers closed 1 year ago

MarkRivers commented 2 years ago

This change between the latest tagged version (R5-10-2) and the master branch is a major problem. It is discussed in this tech-talk thread: https://epics.anl.gov/tech-talk/2022/msg00639.php

                } else if (pchannel->curr_elements <= 1) {
                        /* treat as scalar */
                        if (pchannel->enum_val >= 0) {
                                n = fprintf(out_fd, "%d\n",pchannel->enum_val);
                        } else {
-                               n = fprintf(out_fd, "%-s\n", pchannel->value);
+                               n = epicsStrPrintEscaped(out_fd, pchannel->value, strlen(pchannel->value));
+                               if (n > 0) {
+                                       n = fprintf(out_fd, "\n");
+                               }
                        }
                } else {
                        /* treat as array */
                        n = SR_write_array_data(out_fd, pchannel->name, (void *)pchannel->pArray, pchannel->curr_elements);
                }
                if (n <= 0) {
                        printf("save_restore:write_it: fprintf returned %d [%s].\n", n, datetime);
                        if (errno) myPrintErrno("write_it", __FILE__, __LINE__);
                        problem |= FPRINTF_FAILED;

Note that previously it used to fprintf including a newline, so it would always return n>0 as long as there was no error.

Now n is set by epicsStrPrintEscaped which might return 0, and only if n>0 does it print a newline which will set n=1. This seems wrong for 2 reasons:

The commit with this error was done on November 14, 2019.
https://github.com/epics-modules/autosave/commit/7fee24c91a37bc3a69420d6bb17732c839ba0cf8

However, it is not present in the R5-10-1 tag (April 17, 2020) or the R5-10-2 tag (October 4, 2020), because that PR was not merged until March 19, 2022.

I just reverted to R5-10-2 and the autosave issue is fixed, my files are fine and the error messages no longer occur.

MarkRivers commented 2 years ago

Note that were additional changes in that commit which are likely also incorrect. The one above it just for scaler PVs.

anjohnson commented 2 years ago

This code came from PR #29, which was merged on March 9th 2022. When fixing/reverting that it would probably be a good idea to add a test which would fail and hence detect this.

MarkRivers commented 2 years ago

It seems to me that the fix could be to just remove the conditional here:

                        } else {
                               n = epicsStrPrintEscaped(out_fd, pchannel->value, strlen(pchannel->value));
                               if (n > 0) {
                                       n = fprintf(out_fd, "\n");
                               }

and change it to:

                        } else {
                               n = epicsStrPrintEscaped(out_fd, pchannel->value, strlen(pchannel->value));
                               n = fprintf(out_fd, "\n");

That will add the newline that is always needed, and it will guarantee that n=1 for the test below unless there is an error.

smarsching commented 1 year ago

As I am the person who authored PR #29, I wondered why we are not seeing this bug, even though we have been using a patch that is equivalent to this PR for a while now.

As it turns out, our patch is slightly different from the one in the PR. Instead of

                       n = epicsStrPrintEscaped(out_fd, value_string, strlen(value_string));
                       if (n > 0) {
                               n = fprintf(out_fd, "\n");
                       }

we have

            n = epicsStrPrintEscaped(out_fd, value_string, strlen(value_string));
            if (n > 0 || !strlen(value_string)) {
                n = fprintf(out_fd, "\n");
            }

and instead of

                               n = epicsStrPrintEscaped(out_fd, pchannel->value, strlen(pchannel->value));
                               if (n > 0) {
                                       n = fprintf(out_fd, "\n");
                               }

we have

                n = epicsStrPrintEscaped(out_fd, pchannel->value, strlen(pchannel->value));
                if (n > 0 || !strlen(pchannel->value)) {
                    n = fprintf(out_fd, "\n");
                }

I do not know why the version that I submitted in the PR differs from the version that we use internally, but I am pretty sure that this difference accounts for why the bug did not appear in our environment.

@MarkRivers The workaround that you suggest has an undesirable side effects: Problems in epicsStrPrintEscaped will not be detected any longer, because the return value n is overwritten in the next line. This is the reason why I added this conditional for the fprintf call in the first place.

I think that the version in our internal patch handles the situation correctly, because it will print the newline (and overwrite n) if and only if epicsStrPrintEscaped returns a positive value or returns a value of zero without indicating an error (this is when the string passed to epicsStrPrintEscaped has a length of zero).

@anjohnson If you like, I can submit a PR that makes this simple change. I agree that adding a test for this kind of issue makes sense, but I am not familiar with the test infrastructure for the autosave module at all, so I would need some help with this.

anjohnson commented 1 year ago

@smarsching It sounds like your production version should have both your and @MarkRivers issues ironed out then, so a PR with that adjustment in it would be great. It appears that there are no automated tests in this module at all, so my suggestion to add a test for the problem wasn't really practical (unless you want to contribute some tests, but that could be a lot of work).

smarsching commented 1 year ago

@anjohnson Thanks for your quick reply. I created PR #52 that provides the version of the code that we have been successfuly using internally for more than 18 months, so I am pretty confident that this will address both this issue but avoid ignoring the case where epicsStringPrintEscaped erroneously does not write anything.

On a side note, I think that epicsStrPrintEscaped itself handles errors imperfectly, because it might return a positive value when some of the calls to fprintf fail. A better implementation would check the return value of each call to fprintf and immediately return if that return value was negative. However, I guess that this is a discussion that we should have over in the EPICS Base project, so I created https://github.com/epics-base/epics-base/issues/309 for this.

MarkRivers commented 1 year ago

Closed via #52