apache / lucenenet

Apache Lucene.NET
https://lucenenet.apache.org/
Apache License 2.0

Tool to compare public API surface with Lucene #1022

Open paulirwin opened 1 week ago

paulirwin commented 1 week ago

Is there an existing issue for this?

Task description

We should create a tool that compares the public API surface of Lucene (at a specified version) with that of Lucene.NET. See https://github.com/apache/lucenenet/pull/1018#discussion_r1842137526 for context.

ChatGPT suggested comparing the metadata generated by Docfx and javadoc. Another alternative would be to create a Java tool that exports the public API surface as JSON or XML via reflection, and then a .NET tool that compares that output against Lucene.NET's assemblies using .NET reflection.
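For illustration only, the reflection-based export could start from something like the following Java sketch. It assumes a hypothetical `ApiSurfaceSketch` class and a simplified JSON shape (one object per type, listing the signatures of public methods declared on it); fields, constructors, generics, and nested types are left out, and this is not the format of any existing tool.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ApiSurfaceSketch {
    // Returns sorted signatures of public, non-synthetic methods declared directly
    // on the type, e.g. "close()" or "read(char[],int,int)".
    static List<String> publicMethodSignatures(Class<?> type) {
        List<String> sigs = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers()) || m.isSynthetic()) continue;
            StringBuilder sb = new StringBuilder(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(params[i].getTypeName());
            }
            sigs.add(sb.append(')').toString());
        }
        // Sorting keeps the output stable regardless of reflection order.
        Collections.sort(sigs);
        return sigs;
    }

    // Serializes one type as a small JSON object. Java identifiers and type
    // names contain no characters that need JSON escaping.
    static String toJson(Class<?> type) {
        StringBuilder sb = new StringBuilder("{\"type\":\"")
                .append(type.getName())
                .append("\",\"methods\":[");
        List<String> sigs = publicMethodSignatures(type);
        for (int i = 0; i < sigs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(sigs.get(i)).append('"');
        }
        return sb.append("]}").toString();
    }
}
```

For example, `ApiSurfaceSketch.toJson(java.io.Closeable.class)` yields `{"type":"java.io.Closeable","methods":["close()"]}`. Sorting the signatures keeps the JSON stable between runs, which makes it easier to diff.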

This will require mapping Java naming conventions to .NET conventions, among other challenges. We'd likely need a manual mapping/exclusions file to handle discrepancies. But this will help us confirm that the public API of Lucene.NET matches Lucene's, as well as aid future porting efforts.
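A rough Java sketch of the convention-based part of that mapping. The class name `NamingConventionSketch` and its rules are assumptions drawn from the conventions mentioned in this thread (capitalized method names, an "I" prefix on interfaces, Closeable mapping to IDisposable); anything the conventions can't explain would fall through to the manual mapping/exclusions file.

```java
import java.util.HashMap;
import java.util.Map;

class NamingConventionSketch {
    // Hypothetical table of Java types that map to an existing .NET type
    // instead of following the naming rules below.
    private static final Map<String, String> WELL_KNOWN = new HashMap<>();
    static {
        WELL_KNOWN.put("Closeable", "IDisposable");
    }

    // Java camelCase method names become .NET PascalCase,
    // e.g. incrementToken -> IncrementToken.
    static String dotNetMethodName(String javaName) {
        if (javaName.isEmpty()) return javaName;
        return Character.toUpperCase(javaName.charAt(0)) + javaName.substring(1);
    }

    // Interfaces get an "I" prefix unless a well-known mapping applies first.
    static String dotNetTypeName(String javaSimpleName, boolean isInterface) {
        String known = WELL_KNOWN.get(javaSimpleName);
        if (known != null) return known;
        return isInterface ? "I" + javaSimpleName : javaSimpleName;
    }
}
```

Checking the well-known table before applying the prefix rule is what keeps `Closeable` from turning into a non-existent `ICloseable`.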

paulirwin commented 6 days ago

I wanted to provide an update on this. I've been experimenting with a Java tool called lucene-api-extractor, which will live in the Lucene.NET repo. It downloads the Lucene jars you wish to extract, loads them, reflects over them, and outputs the API surface as JSON. I've also got a new .NET console app (Lucene.Net.ApiCheck) that calls this tool and loads in the JSON. So far, all of that is working. Next up, ApiCheck will load and reflect over the matching .NET assemblies and compare their API surface to what it loaded from Java. This will, of course, be the hardest part.
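A minimal Java sketch of the download-and-reflect step. It assumes the jars are fetched from Maven Central using the standard repository layout, and the names `JarScanSketch`, `mavenCentralJarUrl`, `download`, and `publicClasses` are hypothetical; how lucene-api-extractor actually resolves and loads jars isn't described here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Modifier;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarScanSketch {
    // Standard Maven layout: group path / artifact / version / artifact-version.jar
    static String mavenCentralJarUrl(String groupId, String artifactId, String version) {
        return "https://repo1.maven.org/maven2/" + groupId.replace('.', '/') + "/"
                + artifactId + "/" + version + "/" + artifactId + "-" + version + ".jar";
    }

    // Downloads a jar to the given local path, overwriting any existing file.
    static Path download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Converts a jar entry name to a binary class name, or returns null for
    // non-class entries and module/package descriptors.
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) return null;
        String name = entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
        if (name.endsWith("module-info") || name.endsWith("package-info")) return null;
        return name;
    }

    // Loads every class in a local jar without running static initializers
    // and keeps the ones declared public.
    static List<Class<?>> publicClasses(Path jar) throws IOException {
        List<Class<?>> result = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile());
             URLClassLoader loader = new URLClassLoader(new URL[] { jar.toUri().toURL() })) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String className = toClassName(entries.nextElement().getName());
                if (className == null) continue;
                try {
                    Class<?> c = Class.forName(className, false, loader);
                    if (Modifier.isPublic(c.getModifiers())) result.add(c);
                } catch (ClassNotFoundException | LinkageError e) {
                    // Skip classes whose dependencies (e.g. other jars) are not on this loader.
                }
            }
        }
        return result;
    }
}
```

For example, `mavenCentralJarUrl("org.apache.lucene", "lucene-core", "4.8.0")` gives `https://repo1.maven.org/maven2/org/apache/lucene/lucene-core/4.8.0/lucene-core-4.8.0.jar`. Classes that depend on other jars are simply skipped in this sketch; putting all requested jars on one class loader would avoid losing them.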

My current thinking is that once the diff between Lucene and Lucene.NET is generated, the tool will support saving it as JSON for programmatic/tool analysis, as well as generating an HTML report from that JSON.
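One possible shape for that diff, sketched in Java under the assumption that both sides have already been normalized into comparable member signatures (for example, after Java names have been mapped to their expected .NET names). `ApiDiffSketch` and its field names are hypothetical; the same diff object is rendered both as JSON and as a minimal HTML table.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class ApiDiffSketch {
    // TreeSets keep both lists sorted so the rendered output is deterministic.
    final SortedSet<String> missingInDotNet = new TreeSet<>();
    final SortedSet<String> extraInDotNet = new TreeSet<>();

    // Set difference in both directions over already-normalized signatures.
    static ApiDiffSketch compare(Set<String> javaApi, Set<String> dotNetApi) {
        ApiDiffSketch d = new ApiDiffSketch();
        for (String s : javaApi) if (!dotNetApi.contains(s)) d.missingInDotNet.add(s);
        for (String s : dotNetApi) if (!javaApi.contains(s)) d.extraInDotNet.add(s);
        return d;
    }

    String toJson() {
        return "{\"missingInDotNet\":" + jsonArray(missingInDotNet)
                + ",\"extraInDotNet\":" + jsonArray(extraInDotNet) + "}";
    }

    String toHtml() {
        StringBuilder sb = new StringBuilder("<table><tr><th>Status</th><th>Member</th></tr>");
        for (String s : missingInDotNet) row(sb, "Missing in .NET", s);
        for (String s : extraInDotNet) row(sb, "Extra in .NET", s);
        return sb.append("</table>").toString();
    }

    private static void row(StringBuilder sb, String status, String member) {
        sb.append("<tr><td>").append(status).append("</td><td>")
          .append(escapeHtml(member)).append("</td></tr>");
    }

    // Minimal JSON escaping: backslashes and quotes only (no control characters).
    private static String jsonArray(SortedSet<String> items) {
        StringBuilder sb = new StringBuilder("[");
        for (String s : items) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(s.replace("\\", "\\\\").replace("\"", "\\\"")).append('"');
        }
        return sb.append(']').toString();
    }

    // Escapes generic brackets such as List<T> so they render as text.
    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Both renderings come from the same in-memory diff, so the HTML report cannot drift from the JSON saved alongside it.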

There will be a config file to handle known mapping discrepancies, such as Int32Field vs IntField, along with a justification for each. With enough massaging, this config file will effectively represent the known differences between Lucene and Lucene.NET, and it should be checked into git so it can evolve and be versioned alongside the code. Many discrepancies we can handle via convention, such as starting interfaces with "I," capitalizing method names, IDisposable vs Closeable, etc. It will be interesting to see how the early results look once I get the diff logic working.
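A Java sketch of how such a config file might be loaded and validated. The line-based `javaName | dotNetName | justification` format and the `KnownDifferencesSketch` name are assumptions for illustration, since no format has been chosen in this thread; the point is that an entry without a justification is rejected, so the file stays a reviewable record of intentional differences.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KnownDifferencesSketch {
    static final class Entry {
        final String javaName, dotNetName, justification;

        Entry(String javaName, String dotNetName, String justification) {
            this.javaName = javaName;
            this.dotNetName = dotNetName;
            this.justification = justification;
        }
    }

    // Parses "javaName | dotNetName | justification" lines, keyed by Java name.
    // Blank lines and "#" comments are ignored; a justification may not contain "|".
    static Map<String, Entry> parse(String text) {
        Map<String, Entry> entries = new LinkedHashMap<>();
        String[] lines = text.split("\\r?\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] parts = line.split("\\|", -1);
            if (parts.length != 3 || parts[2].trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + ": expected 'javaName | dotNetName | justification'");
            }
            Entry e = new Entry(parts[0].trim(), parts[1].trim(), parts[2].trim());
            if (entries.put(e.javaName, e) != null) {
                throw new IllegalArgumentException("Line " + (i + 1) + ": duplicate entry for " + e.javaName);
            }
        }
        return entries;
    }
}
```

Convention-based rules such as method capitalization would then apply only to names that have no entry in this file.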

NightOwl888 commented 6 days ago

Thanks for putting this tool together. I am glad to see you picking up the torch and running with it. It will definitely help with the long-term maintenance of the project. We will need an arsenal of automation, and this is a good addition to our war chest.