APIView - Create API review tokens in tree structure

praveenkuttappan commented 1 year ago

APIView rely on the JSON token file and currently it is an array of tokens where a token is created by language parser for each literal, definition, type, annotation, link, warning, method signature, character, new line and spaces. One of the main disadvantages of this approach is that APIView does not know where is the boundary of a specific context or which line is the parent of a specific line( which class an API belongs to). This approach also makes token file really large because we need to include token for each type names, spaces, new lines etc which are mainly included to tell APIView how to show a review in UI. Also diffing cannot identify the context of a change. Diffing is purely based on text comparison which is time taking instead of a tree shaker algorithm if tokens are more tree based.

Benefits of using this new hierarchical token format:

Diff can show context of the change better instead of hardcoded 5 lines above and below the changed line. Issue https://github.com/Azure/azure-sdk-tools/issues/3640
We can support a collapsible view at each class and namespace level to make it more easier to review the change. User will be able to expand/collapse an entire class or a namespace and it’s children.
Faster diffing using tree shaker instead of current text based comparison
Ability to granularly identify a specific class and it’s methods or a specific API alone within a large review without rendering entire tokens. This will be helpful for cross language view of a granular section within an API review.
Less data to be stored in token file which are located in azure storage blob.
Support user preferred rendering of API view at run time. For e.g. we have received a request from some users in the past they want to see each method param in new line where as other user prefers to see everything in one line.
Sort API review on APIView side. Currently we list APIs in the order it is added to token file by parser.
Support dynamic representation of APIs for e.g. Some users prefer to see all method arguments in same line and some users prefer to see them in multiple lines.

In the following proposed solution, language parser needs to create token file in more structured hierarchical format and token file won’t have any information related to how it’s presented in APIView. High level tree structure will be as follows

High level tree structure

APIViewRoot
|----NamespaceObject <namespace1>
|----Class <class1>
               |---- API <method11>
               |---- API <method12>
|----Class <class2>
               |---- API <method21>
               |---- API <method22>
|----NamespaceObject <namespace2>
|----Class <class3>
               |---- API <method31>
               |---- API <method32>
|----Class <class4>
               |---- API <method41>
               |---- API <method42>

All these token will have some common properties( like definition ID, diagnostic warnings, cross language id, annotations etc) and some of the token specific properties are as follows.

API token for e.g. will have a list of param token for the API and each param token will have additional properties, return type etc
Class token will have inheritance details if applicable

Implementation:

Change will be required in both parser and APIView side. APIView can support both older version as well as new version of token format side by side so individual language can be migrated at it’s own pace if they prefer to take the benefit of new related features. Token format version metadata will be used to differentiate how it needs to be processed.
Tree based diffing logic which utilizes simple tree shaker algorithm.
Currently language parser controls how an API review needs to be presented in APIView to keep it language dependent. We need to have default renderer for each type within APIView and individual language can override this rendering logic for a token type to provide language specific output text.
Include expand and collapse logic in UI rendering side.
Diff UI to collapse everything except the node that was changed/added/deleted.

### Required for other languages to onboard
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8915
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8916
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8517
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8790
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8914
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8789
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8788
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8703
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8404
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8702
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8917

### Languages work
- [ ] https://github.com/Azure/azure-sdk-tools/issues/7922
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8735
- [ ] https://github.com/Azure/azure-sdk-tools/issues/7944
- [ ] https://github.com/Azure/azure-sdk-tools/issues/7945
- [ ] https://github.com/Azure/azure-sdk-tools/issues/7946
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8734
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8738
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8794
- [ ] https://github.com/Azure/azure-sdk-for-rust/issues/1621
- [ ] https://github.com/Azure/azure-sdk-tools/issues/8736

maririos commented 1 year ago

As we look into this, we should revisit if the JSON output model is the one we want to keep having, or if we should consider something different. From @jonathanserbent : "Other places we could benefit from a more descriptive internal representation of a review include better mapping between languages for a single library, better context and mapping between revisions (e.g. branching revisions, etc), better metadata around API (e.g. input, output, and input/output models), better search and filter capabilities at the API level, etc, etc, etc."

heaths commented 1 year ago

This could enable other feature requests like:

1703
676 (have the view sort regardless of whether the emitter did or not)

JonathanGiles commented 1 year ago

I'll add another thing I would like to see considered as part of this effort: the ability to easily assign CSS style classes to the output, so that we may easily customises the view without the need for adding new token types.

maririos commented 9 months ago

Issue https://github.com/Azure/azure-sdk-tools/issues/7375 would be solved by the work in this epic

chidozieononiwu commented 8 months ago

More requests .

Ability to freeze the current context (namespace, class) at the top of the UI page, and also use breadcrumbs to show the current context.
Show usage in the top of each method which links to all the other places they are used on the page.

heaths commented 8 months ago

Would also be nice - though lower pri, admittedly - if client methods could have not only a "token attribute" that says it's a method, but also a variant (separate token attribute?) that ignores the prefix so we can more easily compare members across languages. For example, most languages use list as a prefix for pageables or enumerables in general, while .NET uses get. Those will sort very differently. Ignoring the prefix (and maybe that's the option: "[ ] Ignore method prefix"?), we can better compare languages because listResources and GetResources will sort as just "Resources".

Again, I agree this is low pri, but as you think about the storage mechanic for how tokens and their "attributes" / metadata are collected and stored, perhaps consider whether we can attach different info or multiple instances (like .NET attributes) so it's at least possible/easier later.

chidozieononiwu commented 8 months ago

Would also be nice - though lower pri, admittedly - if client methods could have not only a "token attribute" that says it's a method, but also a variant (separate token attribute?) that ignores the prefix so we can more easily compare members across languages. For example, most languages use list as a prefix for pageables or enumerables in general, while .NET uses get. Those will sort very differently. Ignoring the prefix (and maybe that's the option: "[ ] Ignore method prefix"?), we can better compare languages because listResources and GetResources will sort as just "Resources".

Again, I agree this is low pri, but as you think about the storage mechanic for how tokens and their "attributes" / metadata are collected and stored, perhaps consider whether we can attach different info or multiple instances (like .NET attributes) so it's at least possible/easier later.

Is the essential effect of this that you want the members sorted by the actual names rather than the prefix?

heaths commented 8 months ago

Basically, yeah. Or, rather, by their names sans the prefix. So listResources, list_resources, GetResources, etc., would all lexicographically sort together as, say, "resources" - maybe not by default, but just when comparing languages' consistency.

LarryOsterman commented 6 months ago

QQ: Is there updated documentation on the schema for these changes?

JonathanGiles commented 6 months ago

@LarryOsterman I've been working on the Java implementation, by following this doc here

maririos commented 5 months ago

I am going to start working on improving the docs as best as we can, and Dozie will keep adding details to them

maririos commented 5 months ago

We are targeting to deploy to Production the first version of this changes. It will include the .NET parser.

After the deployment, these are issues we need to work on:

Azure / azure-sdk-tools

APIView - Create API review tokens in tree structure #5897

Benefits of using this new hierarchical token format:

1703

676 (have the view sort regardless of whether the emitter did or not)