eMoflon / benchmarx

Infrastructure for implementing benchmarx: benchmarks for bidirectional transformation (bx) tools. Also contains a collection of example benchmarx and test runners for a diverse range of bx tools.
GNU General Public License v3.0

Export changes as eMoflon change sequence? #35

Closed georghinkel closed 7 years ago

georghinkel commented 7 years ago

The paper for the TTC case states that benchmarx can be used with non-JVM tools and cites the reference implementation for BiGUL as proof. However, that implementation appears to be state-based. I would like to implement the case in a non-JVM tool that is delta-based. Would it therefore be possible to export the changes in eMoflon's internal change format? From there, I could transform the changes into my own change format, and that might work (by creating a co-program and communicating with it via stdin/stdout).
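A minimal sketch of such an order-preserving exchange between the JVM side and a co-program, assuming a hypothetical JSON Lines format: one change per line, with an empty line ending a batch. The record fields (`kind`, `container`, `feature`, `position`, `value`), the `OK` acknowledgement and the helper names are all illustrative assumptions, not an eMoflon or benchmarx format.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List, TextIO


@dataclass(frozen=True)
class Change:
    """One atomic edit. Hypothetical fields, not an eMoflon or benchmarx format."""
    kind: str           # e.g. "ADD", "REMOVE", "MOVE", "SET"
    container: str      # identifier of the element that owns the changed feature
    feature: str        # name of the changed reference or attribute
    position: int = -1  # list index at the moment of the edit (-1 if not applicable)
    value: str = ""     # identifier of the affected element, or the new attribute value


def encode_changes(changes: Iterable[Change]) -> str:
    """Serialise changes as JSON Lines, keeping their order."""
    return "".join(json.dumps(asdict(c)) + "\n" for c in changes)


def decode_changes(text: str) -> List[Change]:
    """Parse JSON Lines back into Change records, in the same order."""
    return [Change(**json.loads(line)) for line in text.splitlines() if line.strip()]


def run_co_program(inp: TextIO, out: TextIO, apply: Callable[[Change], None]) -> None:
    """Co-program loop: apply each change as it arrives; an empty line ends a batch."""
    for line in inp:
        if line.strip():
            apply(Change(**json.loads(line)))
        else:
            out.write("OK\n")  # acknowledge the batch so the caller knows it is done
            out.flush()


def process_batch(text: str, apply: Callable[[Change], None]) -> str:
    """Run the loop on an in-memory batch (useful for testing) and return its output."""
    out = io.StringIO()
    run_co_program(io.StringIO(text + "\n"), out, apply)
    return out.getvalue()
```

In an actual setup, the co-program would call `run_co_program` with `sys.stdin` and `sys.stdout`, and the JVM side could start it with `java.lang.ProcessBuilder` and write encoded batches to the process's standard input. Because each batch is a plain list, the order of edits is preserved end to end.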

Further, who measures the time for the performance tests? Is that done by benchmarx? If so, please add a feature that allows non-JVM tools to override this; otherwise the measured times are meaningless, as they would mostly capture serialization and deserialization.

anthonyanjorin commented 7 years ago

Hi Georg,

good points! Here are our suggestions:

  1. I suggest you implement BXTool<FamilyRegister, PersonRegister, Decisions> and leverage EMF notifications to build up your own delta structure, which is then passed via stdin/stdout to your tool. I think this should be easier than working with eMoflon's (internal) delta structure, and I wouldn't want to introduce a dependency on eMoflon for your tool. Another point is that eMoflon's deltas are structural, not operational, meaning that the order in which edits happen is ignored. If you want to pass all tests, your own listener should respect the order of edits.
  2. With a reasonably implemented serialisation (i.e., not EMF-based!), we believe that the time measurement should give a good approximation of the tool's scalability and efficiency, even if non-JVM tools incur some communication overhead. Allowing each tool to override how time is measured is also problematic, as we would have no control over what is included and what is not. Our suggestion is that you use the provided measurement infrastructure as is. If you feel there is a significant and thus unfair overhead for your tool, please additionally perform your own measurement inside your tool and add this curve to the final plot, so that one can see how large the overhead is. In general, we are interested in two plots: one for propagating changes of increasing size, always from scratch, and one for propagating the same "small" change for models of increasing size. If your tool scales, the communication overhead will hopefully become negligible.
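A minimal sketch of the suggested two-curve measurement, assuming the tool can report how long its own propagation took: the harness times the whole call, and the difference to the self-reported time approximates the communication overhead. `Measurement`, `measure` and `fake_tool` are hypothetical names; `fake_tool` only mimics serialisation and propagation with `time.sleep`.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Measurement:
    total_s: float     # end-to-end time as seen by the benchmark harness
    internal_s: float  # propagation time reported by the tool itself

    @property
    def overhead_s(self) -> float:
        """Time spent outside the tool's propagation (serialisation, IPC, ...)."""
        return max(0.0, self.total_s - self.internal_s)


def measure(call_tool: Callable[[], float]) -> Measurement:
    """Time call_tool() end to end; call_tool returns the tool's self-reported seconds."""
    start = time.perf_counter()
    internal = call_tool()
    total = time.perf_counter() - start
    return Measurement(total_s=total, internal_s=internal)


def fake_tool(serialise_s: float, propagate_s: float) -> Callable[[], float]:
    """Hypothetical stand-in: 'serialises', then 'propagates', reporting only the latter."""
    def call() -> float:
        time.sleep(serialise_s)          # pretend to (de)serialise and transfer data
        t0 = time.perf_counter()
        time.sleep(propagate_s)          # pretend to propagate the change
        return time.perf_counter() - t0  # what the tool would report about itself
    return call
```

Running `measure` once per model size and plotting both `total_s` and `internal_s` yields the two curves, so the gap between them shows how large the overhead is.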

Does this make sense?

Cheers, Tony

georghinkel commented 7 years ago

Hi Tony,

  1. I am not very familiar with the EMF notification API, but is it possible in EMF to obtain the original URI of a model element after that URI has changed, for example due to a move or the insertion of another element in an ancestor? Further, I am not asking for a general export functionality, just for exporting the changes relevant to the TTC case.

  2. Model serialization and deserialization is not trivial, and there is a reason why EMF (de-)serialization is expensive. NMF deserialization may be slightly faster in some cases, but I have also found that it is often slower than EMF deserialization precisely because it is case-insensitive, whereas EMF deserialization is case-sensitive (I will have to add a switch to allow case-sensitive deserialization as well). Moreover, regarding your assumption about how much serialization, as communication overhead, influences the overall performance measurements, I think the opposite is true: serializing a change can be O(n), where n is the model size (for example, if index-based URIs are used), and (de-)serializing an entire model is O(n), if not worse. Meanwhile, change propagation is hopefully in O(d), where d is the size of the delta, which is usually constant with respect to the model size. Therefore, for a good solution, the performance measurements tend to capture only the communication overhead and no longer the actual change propagation.
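The argument in point 2 can be written as a simple cost model. This is a sketch under stated assumptions: the measured time splits into a communication part that grows at least linearly in the model size $n$ and a propagation part bounded linearly in the delta size $d$, with constants $c_1, c_2 > 0$ and the bounds holding for sufficiently large $n$.

```latex
\begin{align*}
T_{\text{meas}}(n,d) &= T_{\text{comm}}(n) + T_{\text{prop}}(d),
  \qquad T_{\text{comm}}(n) \ge c_1 n, \quad T_{\text{prop}}(d) \le c_2 d \\
\frac{T_{\text{prop}}(d)}{T_{\text{meas}}(n,d)}
  &= \frac{T_{\text{prop}}(d)}{T_{\text{comm}}(n) + T_{\text{prop}}(d)}
  \le \frac{c_2 d}{c_1 n + c_2 d}
  \xrightarrow{\;n \to \infty\;} 0 \quad \text{for fixed } d
\end{align*}
```

The inequality uses that $x/(a+x)$ increases in $x$ and decreases in $a$. For the benchmark that propagates the same small change on models of increasing size, the share of measured time due to propagation therefore vanishes under these assumptions.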

Cheers,

Georg

anthonyanjorin commented 7 years ago

The EMF notification API is fairly straightforward and has a MOVE event (which, by the way, eMoflon ignores, so that is another reason not to use eMoflon's deltas here). To export "all the changes relevant" to the TTC case, I would write a few lines of code that handle these events and create and pass on some arbitrary data structure. What would be the added value of this? (I'll do it if you insist.)
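A minimal sketch of why an order-preserving listener answers the question about shifting URIs: each event records the index at the moment of the edit (as we understand it, an EMF MOVE notification likewise carries the old index as its old value and the new index as its position), so replaying the events in order reproduces the final state even though indices shift. `Event`, `ObservedList` and `replay` are hypothetical stand-ins, not EMF classes; `move(new_index, old_index)` only mirrors the argument order of EMF's `EList.move`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Simplified stand-in for a notification on one multi-valued feature."""
    kind: str                           # "ADD", "REMOVE" or "MOVE"
    position: int                       # index after ADD/MOVE, index before REMOVE
    value: Optional[str] = None
    old_position: Optional[int] = None  # only set for MOVE


class ObservedList:
    """A list that logs every edit, in the order it happens."""

    def __init__(self, items: List[str], log: List[Event]):
        self.items = list(items)
        self.log = log

    def add(self, index: int, value: str) -> None:
        self.items.insert(index, value)
        self.log.append(Event("ADD", index, value))

    def remove(self, index: int) -> None:
        value = self.items.pop(index)
        self.log.append(Event("REMOVE", index, value))

    def move(self, new_index: int, old_index: int) -> None:
        value = self.items.pop(old_index)
        self.items.insert(new_index, value)
        self.log.append(Event("MOVE", new_index, value, old_index))


def replay(items: List[str], log: List[Event]) -> List[str]:
    """Apply the logged events, in order, to a copy of the original list."""
    result = list(items)
    for e in log:
        if e.kind == "ADD":
            result.insert(e.position, e.value)
        elif e.kind == "REMOVE":
            result.pop(e.position)
        elif e.kind == "MOVE":
            result.insert(e.position, result.pop(e.old_position))
    return result
```

For example, after adding `"x"` at index 0 to `["a", "b", "c"]`, moving `"a"` is logged with old index 1, not its original index 0. That recorded index is only meaningful in sequence, which is why a purely structural delta that drops the order cannot be replayed reliably.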

Our experience with BiGUL showed that serialisation accounted for only a minor fraction of the time required, but yes, this is of course framework-specific. As suggested, please use the provided framework and additionally add your own measurements, conducted inside your tool, to indicate the overhead.

georghinkel commented 7 years ago

OK, I think I got that, but from what I can tell so far, the serialization overhead is going to be large.