Open praveenkuttappan opened 1 year ago
As we look into this, we should revisit if the JSON output model is the one we want to keep having, or if we should consider something different. From @jonathanserbent : "Other places we could benefit from a more descriptive internal representation of a review include better mapping between languages for a single library, better context and mapping between revisions (e.g. branching revisions, etc), better metadata around API (e.g. input, output, and input/output models), better search and filter capabilities at the API level, etc, etc, etc."
This could enable other feature requests like:
I'll add another thing I would like to see considered as part of this effort: the ability to easily assign CSS style classes to the output, so that we can easily customise the view without needing to add new token types.
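To make the request concrete, here is a minimal sketch, assuming a token could carry a free-form list of style classes alongside its kind. The `Token` shape, the `render_classes` field, and `to_html` are all hypothetical names for illustration, not part of the current APIView token model.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Token:
    value: str                      # text shown in the review
    kind: str                       # existing notion of token type, e.g. "type"
    # Hypothetical: extra CSS classes a parser could attach without a new token type
    render_classes: List[str] = field(default_factory=list)


def to_html(token: Token) -> str:
    """Render one token as a <span>, combining its kind with any extra classes."""
    classes = " ".join([token.kind] + token.render_classes)
    return f'<span class="{escape(classes)}">{escape(token.value)}</span>'


html_out = to_html(Token("List<T>", "type", ["deprecated"]))
```

With something like this, a new visual treatment would only need a new class name in the stylesheet rather than a new token type in every parser.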
Issue https://github.com/Azure/azure-sdk-tools/issues/7375 would be solved by the work in this epic
More requests.
Would also be nice - though lower pri, admittedly - if client methods could have not only a "token attribute" that says it's a method, but also a variant (separate token attribute?) that ignores the prefix so we can more easily compare members across languages. For example, most languages use `list` as a prefix for pageables or enumerables in general, while .NET uses `get`. Those will sort very differently. Ignoring the prefix (and maybe that's the option: "[ ] Ignore method prefix"?), we can better compare languages because `listResources` and `GetResources` will sort as just "Resources".
Again, I agree this is low pri, but as you think about the storage mechanic for how tokens and their "attributes" / metadata are collected and stored, perhaps consider whether we can attach different info or multiple instances (like .NET attributes) so it's at least possible/easier later.
Is the essential effect of this that you want the members sorted by the actual names rather than the prefix?
Basically, yeah. Or, rather, by their names sans the prefix. So `listResources`, `list_resources`, `GetResources`, etc., would all lexicographically sort together as, say, "resources" - maybe not by default, but just when comparing languages' consistency.
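A rough sketch of what such a prefix-insensitive sort key could look like, assuming the prefixes to ignore are just `list` and `get` (the only two mentioned in this thread). `METHOD_PREFIXES`, `split_words`, and `sort_key` are hypothetical names, and APIView may well do this differently.

```python
import re

# Assumption: only the two prefixes mentioned in this thread are ignored.
METHOD_PREFIXES = ("list", "get")


def split_words(name: str) -> list:
    """Split camelCase, PascalCase, or snake_case into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", name)
    return [p.lower() for p in parts]


def sort_key(method_name: str, ignore_prefix: bool = True) -> str:
    """Return a lowercase key, optionally dropping a leading verb prefix."""
    words = split_words(method_name)
    # Only drop the prefix when it is a whole word and something follows it,
    # so a name like "listen" is left untouched.
    if ignore_prefix and len(words) > 1 and words[0] in METHOD_PREFIXES:
        words = words[1:]
    return "".join(words)


names = ["listResources", "GetBlobs", "list_resources", "GetResources", "deleteBlob"]
ordered = sorted(names, key=sort_key)
```

Splitting on word boundaries rather than using a plain `startswith` check is a deliberate choice here: it avoids stripping `list` from names that merely begin with those letters.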
QQ: Is there updated documentation on the schema for these changes?
@LarryOsterman I've been working on the Java implementation, by following this doc here
I am going to start working on improving the docs as best as we can, and Dozie will keep adding details to them
We are targeting a Production deployment of the first version of these changes. It will include the .NET parser.
After the deployment, these are issues we need to work on:
APIView relies on the JSON token file, which is currently an array of tokens: the language parser creates one token for each literal, definition, type, annotation, link, warning, method signature, character, new line, and space. One of the main disadvantages of this approach is that APIView does not know where the boundary of a specific context is, or which line is the parent of a given line (for example, which class an API belongs to). It also makes the token file very large, because it must include a token for each type name, space, new line, and so on, most of which exist only to tell APIView how to show a review in the UI. Diffing cannot identify the context of a change either: it is purely a text comparison, which is time-consuming, whereas tree-based tokens would allow a tree-shaker-style algorithm.
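To illustrate why a tree helps with diffing, here is a minimal sketch of a context-aware diff that skips unchanged subtrees by comparing digests. This is not APIView's actual algorithm; the `Node` shape, `node_id`, and the sample trees are assumptions made up for the example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                       # hypothetical stable ID for matching nodes
    text: str                          # the line this node represents
    children: list[Node] = field(default_factory=list)

    def digest(self) -> str:
        """Hash of this node's text plus all descendants (recomputed each call)."""
        h = hashlib.sha256(self.text.encode())
        for child in self.children:
            h.update(child.digest().encode())
        return h.hexdigest()


def diff(old: Node, new: Node, path: tuple = ()):
    """Yield (kind, path) pairs; identical subtrees are skipped without descending.

    Pure reordering of children is not reported in this sketch.
    """
    here = path + (new.node_id,)
    if old.digest() == new.digest():
        return  # whole subtree unchanged
    if old.text != new.text:
        yield ("modified", here)
    old_children = {c.node_id: c for c in old.children}
    new_children = {c.node_id: c for c in new.children}
    for cid, child in new_children.items():
        if cid not in old_children:
            yield ("added", here + (cid,))
        else:
            yield from diff(old_children[cid], child, here)
    for cid in old_children:
        if cid not in new_children:
            yield ("removed", here + (cid,))


old_tree = Node("ns", "namespace Foo", [
    Node("ns.A", "class A", [Node("ns.A.m", "void m()")]),
    Node("ns.B", "class B"),
])
new_tree = Node("ns", "namespace Foo", [
    Node("ns.A", "class A", [Node("ns.A.m", "void m(int x)")]),
    Node("ns.C", "class C"),
])
changes = list(diff(old_tree, new_tree))
```

Each reported change carries the path of its ancestors, which is exactly the context a flat text diff cannot provide.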
Benefits of using this new hierarchical token format:
In the following proposed solution, the language parser needs to create the token file in a more structured, hierarchical format, and the token file won’t contain any information about how it is presented in APIView. The high-level tree structure will be as follows:
High level tree structure
All these tokens will have some common properties (such as definition ID, diagnostic warnings, cross-language ID, and annotations), and some of the token-specific properties are as follows.
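As a rough illustration of the common properties listed above, a hierarchical node might look something like the following. This is only a sketch: the class name `ApiTreeNode`, the field names, and the `Contoso.Widgets` sample are hypothetical and do not reflect the actual schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ApiTreeNode:
    name: str
    kind: str                                   # e.g. "namespace", "class", "method"
    # Common properties named in the proposal; field names are assumptions.
    definition_id: Optional[str] = None
    cross_language_id: Optional[str] = None
    diagnostics: list[str] = field(default_factory=list)
    annotations: list[str] = field(default_factory=list)
    children: list[ApiTreeNode] = field(default_factory=list)


def find_parent(root: ApiTreeNode, definition_id: str) -> Optional[ApiTreeNode]:
    """Return the node whose direct child has the given definition ID, if any."""
    for child in root.children:
        if child.definition_id == definition_id:
            return root
        found = find_parent(child, definition_id)
        if found is not None:
            return found
    return None


root = ApiTreeNode("Contoso.Widgets", "namespace", "Contoso.Widgets", children=[
    ApiTreeNode("WidgetClient", "class", "Contoso.Widgets.WidgetClient", children=[
        ApiTreeNode("GetWidgets", "method", "Contoso.Widgets.WidgetClient.GetWidgets"),
    ]),
])
parent = find_parent(root, "Contoso.Widgets.WidgetClient.GetWidgets")
as_json = json.dumps(asdict(root))
```

Because the parent relationship is explicit in the structure, answering "which class does this API belong to" becomes a simple tree walk instead of a scan over a flat token array.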
Implementation: