binaryeq / jcompile

scripts to compile Java projects with different compilers to create a data set of comparable binaries
Apache License 2.0

Add `revapi` oracle columns to NEQ1 #71

Closed wtwhite closed 11 months ago

wtwhite commented 11 months ago

Idea: Rather than eagerly discard rows from NEQ1 for which revapi reports no breaking change, initially let's just add extra columns describing the revapi results. This allows end-users to decide which rows they are interested in using.

Thoughts @jensdietrich?

wtwhite commented 11 months ago

Generating the .tsv summaries of the JSON revapi outputs doesn't take long:

wtwhite@wtwhite-vuw-vm:~/code/jcompile/runs/31_run_30_with_test_jars_stripped$ time make MINSEVERITY=POTENTIALLY_BREAKING -f ../../Makefile.revapi
/home/wtwhite/code/jcompile/summarise-revapi-json-to-tsv.sh < jars/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11__vs__commons-codec-1.12.revapi.POTENTIALLY_BREAKING.json > jars/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.11__vs__commons-codec-1.12.revapi.POTENTIALLY_BREAKING.tsv
/home/wtwhite/code/jcompile/summarise-revapi-json-to-tsv.sh < jars/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.12__vs__commons-codec-1.13.revapi.POTENTIALLY_BREAKING.json > jars/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.12__vs__commons-codec-1.13.revapi.POTENTIALLY_BREAKING.tsv
/home/wtwhite/code/jcompile/summarise-revapi-json-to-tsv.sh < jars/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.13__vs__commons-codec-1.14.revapi.POTENTIALLY_BREAKING.json > jars/ecj-3.11.1.v20150902-1521_openjdk-11.0.19/commons-codec-1.13__vs__commons-codec-1.14.revapi.POTENTIALLY_BREAKING.tsv
--snip--
real    0m41.686s
user    0m38.286s
sys 0m3.070s
wtwhite@wtwhite-vuw-vm:~/code/jcompile/runs/31_run_30_with_test_jars_stripped$ find . -name '*.revapi.POTENTIALLY_BREAKING.tsv'|wc -l
1414

The summaries look reasonable. A line can be repeated many times, but this is not an issue:

wtwhite@wtwhite-vuw-vm:~/code/jcompile/runs/31_run_30_with_test_jars_stripped$ wc -l ./jars/openjdk-11.0.12/commons-configuration2-2.8.0-tests__vs__commons-configuration2-2.9.0-tests.revapi.POTENTIALLY_BREAKING.tsv
6273 ./jars/openjdk-11.0.12/commons-configuration2-2.8.0-tests__vs__commons-configuration2-2.9.0-tests.revapi.POTENTIALLY_BREAKING.tsv
wtwhite@wtwhite-vuw-vm:~/code/jcompile/runs/31_run_30_with_test_jars_stripped$ head !$
head ./jars/openjdk-11.0.12/commons-configuration2-2.8.0-tests__vs__commons-configuration2-2.9.0-tests.revapi.POTENTIALLY_BREAKING.tsv
org.apache.commons.configuration2.AbstractConfiguration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.AbstractConfiguration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.BaseConfiguration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.BaseConfiguration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.BaseHierarchicalConfiguration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.BaseHierarchicalConfiguration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.CompositeConfiguration    POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.CompositeConfiguration    POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.Configuration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
org.apache.commons.configuration2.Configuration POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
wtwhite@wtwhite-vuw-vm:~/code/jcompile/runs/31_run_30_with_test_jars_stripped$ uniq ./jars/openjdk-11.0.12/commons-configuration2-2.8.0-tests__vs__commons-configuration2-2.9.0-tests.revapi.POTENTIALLY_BREAKING.tsv|wc -l
257
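If the duplicates ever need to be dropped while loading a summary rather than on the command line, something like the following would do. This is only a sketch in Python, assuming each line is one tab-separated record; the name `unique_rows` is hypothetical and not part of jcompile. Note that `uniq` only merges adjacent duplicates, whereas this removes duplicates anywhere in the file while keeping first-seen order.

```python
def unique_rows(tsv_text):
    """Return the distinct rows of a TSV document, in first-seen order."""
    seen = set()
    rows = []
    for line in tsv_text.splitlines():
        if not line:
            continue  # skip blank lines
        fields = tuple(line.split("\t"))
        if fields not in seen:
            seen.add(fields)
            rows.append(fields)
    return rows


# Hypothetical sample in the same shape as the summaries above.
sample = (
    "org.example.A\tPOTENTIALLY_BREAKING\tPOTENTIALLY_BREAKING\t-\n"
    "org.example.A\tPOTENTIALLY_BREAKING\tPOTENTIALLY_BREAKING\t-\n"
    "org.example.B\tPOTENTIALLY_BREAKING\tPOTENTIALLY_BREAKING\t-\n"
)
print(len(unique_rows(sample)))
```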

Next step: Change PreprocessedJsonRevApiJarComparer to actually read these results.

jensdietrich commented 11 months ago

What does POTENTIALLY_BREAKING mean here? AFAIK revapi reports look like this:

Old API: easycrud-1.0.0.jar
New API: easycrud-1.1.0.jar
old: field nz.ac.vuw.jenz.easycrud.PersistencyService.VERSION
new: field nz.ac.vuw.jenz.easycrud.PersistencyService.VERSION
java.field.constantValueChanged: Constant field changed value from '1.0.0' to '1.1.0'.
SEMANTIC: BREAKING, BINARY: NON_BREAKING, SOURCE: NON_BREAKING

So we could add columns as follows:

- old_location (example: field nz.ac.vuw.jenz.easycrud.PersistencyService.VERSION)
- new_location (example: field nz.ac.vuw.jenz.easycrud.PersistencyService.VERSION)
- change (example: java.field.constantValueChanged: Constant field changed value from '1.0.0' to '1.1.0'.)
- SEMANTIC_COMPATIBLE (boolean -- BREAKING means no, otherwise yes)
- BINARY_COMPATIBLE (boolean -- BREAKING means no, otherwise yes)
- SOURCE_COMPATIBLE (boolean -- BREAKING means no, otherwise yes)

SEMANTIC does not appear in results by default; if it is absent, set SEMANTIC_COMPATIBLE to true.
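The boolean mapping described above could be sketched roughly as follows. This is an illustration in Python rather than the project's Java, and the function name `to_compat_flags` and the dictionary input are assumptions. Treating a missing SOURCE or BINARY entry the same way as a missing SEMANTIC one is also an assumption; only the SEMANTIC default is stated above.

```python
def to_compat_flags(classification):
    """Map revapi severities to the three proposed boolean columns.

    `classification` maps "SOURCE", "BINARY" and "SEMANTIC" to a severity
    string. Only BREAKING counts as incompatible; a missing entry is
    treated as compatible (assumed for SOURCE/BINARY, stated for SEMANTIC).
    """
    def compatible(kind):
        return classification.get(kind) != "BREAKING"

    return {
        "SEMANTIC_COMPATIBLE": compatible("SEMANTIC"),
        "BINARY_COMPATIBLE": compatible("BINARY"),
        "SOURCE_COMPATIBLE": compatible("SOURCE"),
    }


# The easycrud example from the report above.
example = {"SEMANTIC": "BREAKING", "BINARY": "NON_BREAKING", "SOURCE": "NON_BREAKING"}
print(to_compat_flags(example))
```

With this rule POTENTIALLY_BREAKING maps to true, so the boolean columns are deliberately coarser than the raw severities.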

wtwhite commented 11 months ago

> What does POTENTIALLY_BREAKING mean here?

@jensdietrich AFAICT that is just a (maybe new?) severity level that revapi gives to changes that are very unlikely to cause breakage, but potentially could. Their docs don't say much:

> POTENTIALLY_BREAKING - the difference may break the API compatibility (of given type) under some specific circumstances

> AFAIK revapi reports look like this:

I'm using their JSON report format, which contains the same information but is easier to parse.

> change (example: java.field.constantValueChanged: Constant field changed value from '1.0.0' to '1.1.0'.)

Question: We can get multiple results per class (even more than one per method) -- how to combine them? Maybe just (deterministically) choose a single representative one? Or skip this column altogether?
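For the severity columns at least, one possible way to combine several results for the same class is to keep the worst severity seen in each column. The sketch below (Python, for brevity) is just one option and is not necessarily what ends up being implemented. The ranking EQUIVALENT < NON_BREAKING < POTENTIALLY_BREAKING < BREAKING, the use of `-` for "not reported", and the (class, source, binary, semantic) column order are all assumptions.

```python
# Assumed ranking, from least to most severe; "-" means nothing was reported.
SEVERITY_ORDER = ["-", "EQUIVALENT", "NON_BREAKING", "POTENTIALLY_BREAKING", "BREAKING"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def worst_per_class(rows):
    """Combine (class_name, source, binary, semantic) rows per class.

    For each class, each column keeps the most severe value seen, so the
    result does not depend on the order of the input rows.
    """
    combined = {}
    for class_name, *severities in rows:
        current = combined.get(class_name, ("-", "-", "-"))
        combined[class_name] = tuple(
            max(old, new, key=RANK.__getitem__)
            for old, new in zip(current, severities)
        )
    return combined


# Hypothetical input: two results for class A, one for class B.
rows = [
    ("A", "NON_BREAKING", "BREAKING", "-"),
    ("A", "POTENTIALLY_BREAKING", "NON_BREAKING", "-"),
    ("B", "EQUIVALENT", "EQUIVALENT", "POTENTIALLY_BREAKING"),
]
print(worst_per_class(rows))
```

This does not settle what to put in a per-class `change` column, since taking the worst severity in each column can mix values from different results.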

> SEMANTIC_COMPATIBLE (boolean -- BREAKING means no, otherwise yes) BINARY_COMPATIBLE (boolean -- BREAKING means no, otherwise yes) SOURCE_COMPATIBLE (boolean -- BREAKING means no, otherwise yes)
>
> SEMANTIC does not appear by default in results, if absent, set SEMANTIC_COMPATIBLE to true.

Sounds good, will do.

wtwhite commented 11 months ago

Ran on a small test dataset:

wtwhite@wtwhite-vuw-vm:~/code/jcompile/oracle-construction$ time java -cp target/jcompile.jar nz.ac.wgtn.shadedetector.jcompile.oracles.AdjacentVersionSameArtifactAndCompilerClassOracle fixed_small_jars_with_tests > fixed_small_jars_with_tests_AdjacentVersionSameArtifactAndCompiler_perpairrealdata.txt
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.4.0.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.4.1.jar
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.4.1.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.5.0.jar
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.5.0.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.6.0.jar
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.6.0.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.6.1.jar
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.6.1.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.7.0.jar
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.4.0-tests.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.4.1-tests.jar
analysing: fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.4.1-tests.jar vs fixed_small_jars_with_tests/openjdk-11.0.19/bcel-6.5.0-tests.jar
--snip--
real    0m53.633s
user    1m3.346s
sys 0m5.960s

Results look good, but are dominated by classes for which revapi reported no information (18570 of the 21991 data rows have `-` in all three columns):

wtwhite@wtwhite-vuw-vm:~/code/jcompile/oracle-construction$ cut -f27- < fixed_small_jars_with_tests_AdjacentVersionSameArtifactAndCompiler_perpairrealdata.txt | sort|uniq -c
  18570 -   -   -
    386 BREAKING    BREAKING    -
      3 BREAKING    BREAKING    EQUIVALENT
    421 BREAKING    BREAKING    POTENTIALLY_BREAKING
    323 BREAKING    NON_BREAKING    -
    345 BREAKING    NON_BREAKING    POTENTIALLY_BREAKING
     13 BREAKING    POTENTIALLY_BREAKING    -
     18 BREAKING    POTENTIALLY_BREAKING    POTENTIALLY_BREAKING
      6 EQUIVALENT  EQUIVALENT  BREAKING
   1531 EQUIVALENT  EQUIVALENT  POTENTIALLY_BREAKING
     27 NON_BREAKING    BREAKING    -
     17 NON_BREAKING    NON_BREAKING    BREAKING
     11 NON_BREAKING    NON_BREAKING    POTENTIALLY_BREAKING
      6 POTENTIALLY_BREAKING    BREAKING    -
     28 POTENTIALLY_BREAKING    EQUIVALENT  -
      1 POTENTIALLY_BREAKING    EQUIVALENT  POTENTIALLY_BREAKING
     42 POTENTIALLY_BREAKING    NON_BREAKING    -
    141 POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    -
      2 POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    BREAKING
    100 POTENTIALLY_BREAKING    POTENTIALLY_BREAKING    POTENTIALLY_BREAKING
      1 source_compatibility    binary_compatibility    semantic_compatibility

wtwhite commented 11 months ago

The same test dataset, with the compatibility data shown as boolean columns (false only when revapi reports BREAKING), as suggested by @jensdietrich:

wtwhite@wtwhite-vuw-vm:~/code/jcompile/oracle-construction$ cut -f27- < fixed_small_jars_with_tests_AdjacentVersionSameArtifactAndCompiler_perpairrealdata_boolcompat.txt | sort|uniq -c
    810 false   false   true
    699 false   true    true
      1 source_compatible   binary_compatible   semantic_compatible
     33 true    false   true
     25 true    true    false
  20424 true    true    true
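As a sanity check, the two tallies can be compared: collapsing each severity triple from the earlier output with the "BREAKING means false" rule should reproduce the boolean counts. A short Python sketch, with the counts copied by hand from the two outputs (header rows omitted); the helper name `collapse` is illustrative only.

```python
from collections import Counter

B, NB, PB, EQ, NA = "BREAKING", "NON_BREAKING", "POTENTIALLY_BREAKING", "EQUIVALENT", "-"

# (source, binary, semantic) -> number of rows, from the severity tally.
SEVERITY_COUNTS = {
    (NA, NA, NA): 18570,
    (B, B, NA): 386, (B, B, EQ): 3, (B, B, PB): 421,
    (B, NB, NA): 323, (B, NB, PB): 345,
    (B, PB, NA): 13, (B, PB, PB): 18,
    (EQ, EQ, B): 6, (EQ, EQ, PB): 1531,
    (NB, B, NA): 27, (NB, NB, B): 17, (NB, NB, PB): 11,
    (PB, B, NA): 6, (PB, EQ, NA): 28, (PB, EQ, PB): 1,
    (PB, NB, NA): 42, (PB, PB, NA): 141, (PB, PB, B): 2, (PB, PB, PB): 100,
}

# (source_compatible, binary_compatible, semantic_compatible) -> number of rows.
BOOL_COUNTS = {
    (False, False, True): 810,
    (False, True, True): 699,
    (True, False, True): 33,
    (True, True, False): 25,
    (True, True, True): 20424,
}


def collapse(counts):
    """Re-tally severity triples as compatible/incompatible flags."""
    flags = Counter()
    for severities, n in counts.items():
        flags[tuple(s != B for s in severities)] += n
    return flags


print(collapse(SEVERITY_COUNTS) == Counter(BOOL_COUNTS))  # prints True
```

Both tallies cover the same 21991 data rows, so the boolean run agrees with the severity run on this dataset.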