Execution time measurements

tsdh commented 9 years ago

The case description requires solution authors to put the execution times of all test cases, as measured by ARTE, into the solution descriptions. However, all test cases are very small toy examples. When I run execute --all, the worst execution time I get for my solution is ~0.7 seconds for pub_pum3_1. However, when I execute just that test using execute --test pub_pum3_1, it finishes in ~0.02 seconds, which is about the time needed for the other test cases, too. So that single longer execution time seems to be some GC hiccup which only happens when executing all tests in a sequence, or something like that.
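One common way to keep a single hiccup like this from dominating a reported number is to repeat the measurement and report the median. The following is only a minimal sketch of that idea and is not part of ARTE; the class name `MedianTiming` and the stand-in workload are hypothetical.

```java
import java.util.Arrays;

public class MedianTiming {

    /** Middle element of the sorted samples (upper median for even counts). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Runs the task untimed a few times, then returns the median of the timed runs in nanoseconds. */
    static long medianNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // warm-up: gives class loading and JIT compilation a chance to settle
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; a real measurement would run one test case here.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        System.out.println(medianNanos(task, 5, 11) + " ns (median of 11 runs)");
    }
}
```

With samples such as 20, 700, 20, 21 and 19 milliseconds, the median stays at 20 while the mean would be pulled up by the one slow run.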

Well, long story short: I doubt the execution times measured for these toy examples are significant. Can't you provide some performance test cases, e.g., a base case with 10 classes with 5 pullable and 5 non-pullable members each, and then further cases where the base case is duplicated several times to get some larger models? (Like with the train benchmark case, where the models double in size.)
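To make the proposal concrete, here is a rough sketch of how such scaled inputs could be generated. Everything in it is an assumption for illustration: the class `ScaledModelGenerator`, the naming scheme, and in particular the reading of "pullable" as "declared with the same signature in every sibling" and "non-pullable" as "declared in one child only". The case's actual preconditions may be stricter.

```java
import java.util.ArrayList;
import java.util.List;

public class ScaledModelGenerator {
    static final int CLASSES_PER_GROUP = 10;
    static final int MEMBERS_PER_KIND = 5;

    /** Java source for one base case: a parent class plus ten child classes. */
    static String group(int g) {
        StringBuilder sb = new StringBuilder();
        sb.append("class Parent").append(g).append(" {}\n");
        for (int c = 0; c < CLASSES_PER_GROUP; c++) {
            sb.append("class Child").append(g).append('_').append(c)
              .append(" extends Parent").append(g).append(" {\n");
            for (int m = 0; m < MEMBERS_PER_KIND; m++) {
                // Same signature in every sibling: assumed to be a pull-up candidate.
                sb.append("    void shared").append(m).append("() {}\n");
                // Name unique to this child: assumed not to be pullable.
                sb.append("    void own").append(c).append('_').append(m).append("() {}\n");
            }
            sb.append("}\n");
        }
        return sb.toString();
    }

    /** Duplicates the base case the given number of times. */
    static String model(int groups) {
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g < groups; g++) {
            sb.append(group(g));
        }
        return sb.toString();
    }

    /** Group counts 1, 2, 4, ... so that each model is twice the size of the previous one. */
    static List<Integer> doublingSizes(int steps) {
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0, n = 1; i < steps; i++, n *= 2) {
            sizes.add(n);
        }
        return sizes;
    }

    public static void main(String[] args) {
        for (int groups : doublingSizes(5)) {
            System.out.println(groups + " group(s): " + model(groups).length() + " characters of source");
        }
    }
}
```

Because every solution would run on the same generated sources, the resulting times would at least be comparable across solutions.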

Also, I think as soon as you use something like JaMoPP or Modisco, the execution times are probably dominated by parsing, reference resolution, and serialization, i.e., all the stuff which has nothing to do with the actual refactorings. So an execution sequence like parseJava -> transformToPG -> 1000 times: refactor -> serializeAsJava would be more meaningful with respect to execution time than the sequences containing at most two refactorings which ARTE runs right now.
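The sequence above could be timed roughly as in the following sketch, which keeps parsing and serialization outside the measured window. The class `PhaseTimer` and its parameters are hypothetical stand-ins, not ARTE's API; a real solution would plug in its own parse, refactor, and serialize steps.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

public class PhaseTimer {

    /**
     * Times only the repeated refactoring step.
     * parseAndTransform stands for parseJava -> transformToPG, serialize for serializeAsJava.
     */
    static <M> long timeRefactorings(Supplier<M> parseAndTransform,
                                     UnaryOperator<M> refactor,
                                     Consumer<M> serialize,
                                     int repetitions) {
        M model = parseAndTransform.get();          // not timed
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            model = refactor.apply(model);          // timed
        }
        long elapsed = System.nanoTime() - start;
        serialize.accept(model);                    // not timed
        return elapsed;
    }

    public static void main(String[] args) {
        // Toy model: a string that each "refactoring" rewrites.
        long nanos = PhaseTimer.<String>timeRefactorings(
                () -> "class A {}",
                src -> src.replace("A", "B").replace("B", "A"),
                src -> System.out.println("serialized: " + src),
                1000);
        System.out.println("1000 refactorings took " + nanos + " ns");
    }
}
```

Reporting the three phases separately would also show directly how much of the total is parsing and serialization overhead.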

SvenPeldszus commented 9 years ago

We are aware that the time measurements for the given examples are not very expressive. The final test cases will be larger, both in Java program size and in the number of refactorings.

tsdh commented 9 years ago

@SvenPeldszus Well, cool. But then solution authors cannot write something about the performance of their solution in their papers because they simply don't know if their execution times are good or bad (well, ok, if your execution times are slow for the ARTE tests they'll be worse for real-world examples, too).

@gkulcsar agreed in #37 that the soft aspects the case description mentions (including the up-to-now mandatory listing of execution times of the ARTE tests) should be hints rather than strict rules. So the performance discussion isn't mandatory anymore, but I guess many solution authors will still want to say a word about it. Of course, anyone can learn ARTE to define larger tests, or apply the transformation to real-world code bases, to measure the performance themselves. But then you can't compare results because everyone uses his own guinea pig.

gkulcsar commented 9 years ago

@tsdh It just came to my mind: if we do not require the execution times to be listed in the paper, do we still want to keep the 10 points awarded for speed in the ranking scheme?

tsdh commented 9 years ago

@gkulcsar I think yes. IIUC, @SvenPeldszus will run all solutions on some undisclosed larger-scale test cases, and hopefully those will reveal meaningful execution times. I'd still much prefer if these tests were disclosed, so that reviewers can also execute them on SHARE to get an impression and authors can write some statement about their solution's performance, but of course that's up to you.

SvenPeldszus commented 9 years ago

Yes, I will run all solutions and will provide the measurement results.

Additionally, we will provide a new version of ARTE which includes all the final test cases. I am not sure if it is possible to update ARTE on SHARE. In the worst case, the tests can easily be executed outside of SHARE to get an impression.

Of course, with this procedure the authors cannot write about the performance of their solution on the final test cases. However, we prefer to stay with this procedure. I am sure every author can estimate the performance of his solution without executing the final test cases, if he wants to write about this.

Echtzeitsysteme / java-refactoring-ttc

Execution time measurements #38