(De)Serialization triggers alignment fault on RTEMS/powerpc

mdavidsaver commented 6 years ago

Till Straumann reports https://bugs.launchpad.net/epics-base/+bug/1754787

We experienced a (reproduceable) crash of pvData on RTEMS/powerpc.

I tracked it down to the ByteBuffer (de-)serialization code which
does IMHO ugly things:

class ByteBuffer {
  char *position;

  ...

  template <typename T>
  inline void put(T value)
  {
    ...
    *((T*)position) = value;
  }
};

PLEASE: don't code like this.

The template was instantiated for a 'double' value and -- since this is a 'byte buffer' -- it is obvious that the 'position' pointer does not meet any alignment constraints.

The PowerPC (and possibly other machines, too) cannot store floating-point registers to unaligned addresses and the type cast invites the compiler to generate inappropriate code (claiming that 'position' is a valid 'double*' which in fact is not necessarily true).

During compilation I had also seen several 'breaking strict alias rule' warnings flying by (albeit not for this particular code, of course) -- but after seeing the above code I would recommend to study such warnings carefully.

mdavidsaver commented 6 years ago

This has been confirmed as an issue on RTEMS and vxWorks targets with hard-float. The fault is in testStructure() of testSerialization.cpp.

# Testing structure...
ok 160 - factory.get()!=NULL
#       Simple structure serialization
... crash

mdavidsaver commented 6 years ago

Candidate fix a51b308cc87e10cbe77c8057d3b96130697fa7a5

anjohnson commented 6 years ago

CA has long used a set of inline C++ templates defined in libCom's osiWireFormat.h and osdWireFormat.h to do unaligned data copying and byte swapping, which Jeff wrote to be optimizable by older compilers. Other than their obscure names is there any reason we couldn't use these instead of rewriting yet more of libCom inside pvDataCPP? I guess the absence of 64-bit integer support is a good one (I have a fix for that which I can commit to 3.15 or 3.16) and I know you didn't write the original code in byteBuffer.h, but it would be nice to be able to reduce this kind of duplication in the future.

msekoranja commented 6 years ago

Please consider initial version (before refactorisation) of byteBuffer.h that handled unaligned/aligned access. I remember testing it on different platforms and comparing performance using different methods. For Intels (x86, x64, etc) it is the best not to handle alignment by yourself, CPU does it for you. For older CPUs (ppc, etc.) you must.

In addition use memcpy for large blocks of data to allow vectorization.

mdavidsaver commented 6 years ago

@msekoranja FYI, it depends on your target. It worked fine on RTEMS targets with soft-float (eg. mvme3100). Also, my refactoring wasn't a functional change due to:

#define UNALIGNED_ACCESS true

cf. c762d94f7af1aed0fc49ef113adb5ecadcdf8467

mdavidsaver commented 6 years ago

@anjohnson I had forgotten about osiWireFormat.h. I'll look at it.

anjohnson commented 6 years ago

See branch wire-format-64 on https://git.launchpad.net/~anj/epics-base/+git/base-3.16 for the Int64 additions to os*WireFormat.h.

mdavidsaver commented 6 years ago

wrt. osiWireFormat.h the first thing I notice is that these ops combines unaligned access with byte order swap. In CA world this is always host to/from big endian (a la. hton*()). PVA needs conditional byte order swapping.

anjohnson commented 6 years ago

That would be a pretty good reason it can't be used then — sorry!

mdavidsaver commented 6 years ago

np. It was an interesting reminder of how nice it is that we only have to worry about 2's complement integers and ieee floating point.

mdavidsaver commented 6 years ago

oh, I also discovered 'class comBuf' (src/ca/client/comBuf.h) which seems like the conceptual equivalent of ByteBuffer, with a dose of codec.h thrown in as well. So much of libca only makes sense to me after I see/write another solution to the same problem.

anjohnson commented 6 years ago

FYI I'm seeing a new compiler warning from byteBuffer.h that probably came from this commit:

../../src/misc/pv/byteBuffer.h: In instantiation of ‘void epics::pvData::ByteBuffer::getArray(T*, std::size_t) [with T = double; std::size_t = long unsigned int]’:
../../src/factory/PVDataCreateFactory.cpp:257:9:   required from ‘void epics::pvData::PVValueArray<T>::deserialize(epics::pvData::ByteBuffer*, epics::pvData::DeserializableControl*) [with T = double]’
../../src/factory/PVDataCreateFactory.cpp:660:1:   required from here
../../src/misc/pv/byteBuffer.h:825:18: warning: unused variable ‘start’ [-Wunused-variable]
         const T* start = (T*)_position;
                  ^

I just found another useful page on our Jenkins, I got this link from the compiler warnings page for pvData; if you visit it you'll see the line with the warning in orange, and hovering over it shows the warning message.

mdavidsaver commented 6 years ago

This warning is innocuous, and also fixed by b4cd026fe5c361dbf15b71cfab87b4dac98528c8

mdavidsaver commented 6 years ago

The candidate fix https://github.com/epics-base/pvDataCPP/commit/a51b308cc87e10cbe77c8057d3b96130697fa7a5 did resolve this issue.

mdavidsaver commented 5 years ago

Released with Base 7.0.2

epics-base / pvDataCPP

(De)Serialization triggers alignment fault on RTEMS/powerpc #54