GumTreeDiff / gumtree

An awesome code differencing tool
https://github.com/GumTreeDiff/gumtree/wiki
GNU Lesser General Public License v3.0
926 stars 173 forks source link

API documentation missing #215

Open s0nata opened 3 years ago

s0nata commented 3 years ago

Hi,

I am using GTD release 2.1.2 in a maven project, and I am missing documentation for APIs. There some classes here and there with javadoc comments, but mostly I have to resort to reading source code and playing a guessing game, which is very time-consuming and extremely frustrating. Most of the classes have only the license comment at the beginning of the respective file.

Could you please add documentation at least for public APIs?

If there is any other source of documentation I could use in the meantime, could you please share it? It will be immensely helpful.

jrfaller commented 3 years ago

Hi!

Unfortunately there is only the javadoc and the wiki 😢

I will keep this issue as a remainder to finish the public API. Don't hesitate to comment if you want some part documented first!

Cheers.

s0nata commented 3 years ago

OK, thanks!

For me the priority would be the APIs used in the GumTree API - as those are literally the first ones you go to when trying to use GTD.

jrfaller commented 3 years ago

OK, understood. I put the help wanted label as the help of the community would be much appreciated on this one ;)

Also, if you want prioritization of some classes, feel free to order them in the comments!

s0nata commented 3 years ago

also, @jrfaller how up to date is Diagrams? I have a feeling it is still for 2.1.2 at the revision f042c4b?

jrfaller commented 3 years ago

No in fact it is highly inaccurate, a mix between 2.x and 3.x!

Removing this.

s0nata commented 3 years ago

So, regarding missing documentation: I have an impression that the new package distribution system (GitHub packages instead of MavenCentral), and how you prepare them, is influencing the availability of documentation in the IDE, please see the screenshots and explanation below.

I am using IntelliJ IDEA, and my project is using Maven for the build.

When I was using previous GTD release, version 2.1.2, the javadocs were available in the IDE:

gtd-core-2 1 2-docs

and rendered correctly:

GTD2

When I switched to the latest release, version 3.0.0-beta1, the IDE was not able to pull the documentation:

gtd-core-3 0 0-docs

and respectively no javadoc was available:

GTD3

@jrfaller any way the new packaging method can also provide javadocs and sources?

If I am missing something please let me know! I've spent my share of time on stackoverflow and IntelliJ support forums and it was fruitless so far :-/

jrfaller commented 3 years ago

Thanks for spotting this! I will investigate but it strange since I do not have the impression that I changed anything in the packaging :(

s0nata commented 3 years ago

The packaging is my best guess, to be honest, so really looking forward to hearing from you))) and I do feel awkward, opening an issue about missing documentation just to find out that the docs are technically there...

jrfaller commented 3 years ago

OK should be fixed in the new beta2 release ;)

Cheers.

s0nata commented 3 years ago

works! thank you very much!

but could you please keep this issue open for a bit longer? I'll be actively diving into GTD for a month, and would be happy to share spots where docs can be added.

s0nata commented 3 years ago

hi @jrfaller , I see some 34 tree matching algorithms implemented in GTD, but not all of them are documented and I am puzzled at whether I should care to use some specific one. Has there been any benchmarking studies, or are there maybe combinations like "use parser X and matcher Y and edit script generator Z if the traget programming language is L"? Also, what was the engineering need behind having so many matchers?

  1. AbstractBottomUpMatcher,
  2. AbstractSubtreeMatcher,
  3. ChangeDistillerBottomUpMatcher,
  4. ChangeDistillerLeavesMatcher,
  5. ChangeDistillerParallelLeavesMatcher,
  6. CliqueSubtreeMatcher,
  7. CompleteBottomUpMatcher,
  8. CompositeMatchers.ChangeDistiller,
  9. CompositeMatchers.ChangeDistillerTheta,
  10. CompositeMatchers.ClassicGumtree,
  11. CompositeMatchers.ClassicGumtreeTheta,
  12. CompositeMatchers.CompleteGumtreeMatcher,
  13. CompositeMatchers.CompositeMatcher,
  14. CompositeMatchers.RtedTheta,
  15. CompositeMatchers.SimpleGumtree,
  16. CompositeMatchers.SimpleIdGumtree,
  17. CompositeMatchers.SimpleIdGumtreeTheta,
  18. CompositeMatchers.Theta,
  19. CompositeMatchers.XyMatcher,
  20. CrossMoveMatcherThetaF,
  21. GreedyBottomUpMatcher,
  22. GreedySubtreeMatcher,
  23. HungarianSubtreeMatcher,
  24. IdenticalSubtreeMatcherThetaA,
  25. IdMatcher,
  26. InnerNodesMatcherThetaD,
  27. LcsMatcher,
  28. LcsOptMatcherThetaB,
  29. LeafMoveMatcherThetaE,
  30. RtedMatcher,
  31. SimpleBottomUpMatcher,
  32. UnmappedLeavesMatcherThetaC,
  33. XyBottomUpMatcher,
  34. ZsMatcher
jrfaller commented 3 years ago

Hi @s0nata !

Indeed this is a lot of matchers.

There have been some benchmarking in the original research papers where they were respectively defined, and we are working on more benchmarks. But we are nowhere close to the answer that X is better than Y in context Z, unfortunately this is a hard problem.

For the rationale, don't loose sight that GumTree is a research vehicule, and therefore part of the fun is to devise new ways of performing AST diffs 😄 that's why it is growing a lot in this dimension. However I agree that at one point it would be awesome to document in more details where do these matchers come from, and roughly how do they work.

Cheers.

s0nata commented 3 years ago

@jrfaller and is there anything that sheds light on Tree class implementation? For instance, what is the meaning of the URL in

default Tree getChild​(java.lang.String url)

Returns the child node at the given URL.

Parameters:
    url - the URL, such as 0.1.2 
jrfaller commented 3 years ago

Hi!

It means root , then child at pos 1 then child at pos 2.

s0nata commented 3 years ago

hi @jrfaller long time no see more questions about GTD :-D

so I am looking at the generateFrom() method of the TreeGenerator class and I am struggling to understand what is considered a valid input. Can we only get an AST from a file/string/stream that contains a whole class definition, or building an AST of a single method is possible?

I am working with Java sources and I know that JavaParser by itself can work with files/strings that contain only one single method, but here it does not seem to be the case, especially considering tests in TestJavaParserGenerator.java

It would be nice to clarify that in the class comment for the TreeGenerator.

And also, in that class comment you have a missing verb in line 33: An abstract class for tree generators that what? ASTs from a input stream.

jrfaller commented 3 years ago

Hi @s0nata!

Nice to have a new round of quality insurance of our documentation!

The generateFrom() is used to produce a GumTree AST from a concrete Tree Generator (TreeGenerator is an abstract clas, made to be extended). A valid input is either a File or a String, a Stream or a Reader. However, there are some constraints on the content of the File or String (and so on) but these constraints are specific to the actual TreeGenerator that will be used. For instance, JdtGenerator requires a Java input compatible with JDT. I hope my explanation is clear?

I will fix the doc you mentionned, thanks for spotting this!

s0nata commented 3 years ago

A valid input is either a File or a String, a Stream or a Reader. However, there are some constraints on the content of the File or String (and so on) but these constraints are specific to the actual TreeGenerator that will be used.

yes, this makes sense, and this is what kept me puzzled. When I build an AST as

Tree oneMethodTree = new JavaParserGenerator()
              .generateFrom().string(Files.readString(Path.of(fileWithOneMethod)))
              .getRoot();

for the source snippet like

public class Dummy { // L01

  /**
   * dummy dummy
   *
   * @param a
   */
  public void foo(int a) {
      System.out.println("hello");
  }
}                   // L11

if lines L01 and L11 are deleted or commented out I get the following exception:

Exception in thread "main" com.github.gumtreediff.gen.SyntaxException:
  com.github.javaparser.ParseProblemException: (line 8,col 3)
  Parse error. Found "void", expected one of  ";" "@" "class" "enum" "interface" "module" "open"

If they are present I get the expected AST.

I do not know if adding a note on the input restrictions for specific generators - in this case for the JavaParserGenerator would make sense though.

jrfaller commented 3 years ago

It seems that JavaParser throws a SyntaxError in case of a method definition without a class. However, in your previous message you said that JavaParser should handle this case. Could you point me to the documentation of JavaParser about that? Maybe it can be made possible if I provide some options to the parser.

I think that I won't have the time to document all valid input for all parsers, but I could put a link in the JavaDoc to the documentation of each parser, it would be an acceptable middle ground no?

Cheers.

jrfaller commented 3 years ago

Hi @s0nata, I read JavaParser's documentation, and it seems that parsing only a method, or a statement is possible. However, the parser requires one additionnal parameter that can be supplied because of the parser facade of GumTree. It wouldn't be impossible to implement this behavior though.

s0nata commented 3 years ago

hi @jrfaller (sorry for long response, got carried away with work)

I think that I won't have the time to document all valid input for all parsers, but I could put a link in the JavaDoc to the documentation of each parser, it would be an acceptable middle ground no?

that I think would be helpful!

Hi @s0nata, I read JavaParser's documentation, and it seems that parsing only a method, or a statement is possible. However, the parser requires one additionnal parameter that can be supplied because of the parser facade of GumTree. It wouldn't be impossible to implement this behavior though.

but at the moment GTD is reading on per-class basis still, right?

s0nata commented 3 years ago

In the meantime, I have another documentation-related question: I am trying to figure out if there is a way to pretty-print a tree form AST representation into human-readable source - is it possible with GTD?

I have found this old stackoverflow discussion, but it is for the previous GTD version and not actually helpful.

I have also found the com.github,gumtreediff.io package, but the docs are scarce.

I understand it is not the main purpose of GTD as a software, but still, it offers tree manipulation.

Do you know of any example of pretty-printing that goes beyond TreeIoUtils.toLisp(tc).toString() and alike?

jrfaller commented 3 years ago

Hi @s0nata!

Yes we have several serialization format for trees and diffs. The more basic one is the toTreeString() method. No particular documentation on that unfortuntately.

What kind of doc would you like to have?

Cheers.