antlr / antlr5

BSD 3-Clause "New" or "Revised" License

ANTLR v5

Java 21+ License

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.
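
To make the listener idea concrete, here is a minimal hand-written sketch in TypeScript. It is not code generated by ANTLR and does not use any ANTLR runtime API: the names `ParseNode`, `Listener`, and `walk` are hypothetical, and the tree is built by hand to stand in for what a parser would produce for the input `x = 1`. It only illustrates how a tree walker can call back into user code when a phrase of interest is recognized.

```typescript
// Hypothetical parse-tree node: a rule name, optional token text, and children.
// None of these names come from ANTLR; they only illustrate the pattern.
interface ParseNode {
  rule: string;
  text?: string;
  children: ParseNode[];
}

// Hypothetical listener interface: optional callbacks fired on entering
// and on exiting a node.
interface Listener {
  enterRule?(node: ParseNode): void;
  exitRule?(node: ParseNode): void;
}

// Depth-first walk that fires the callbacks as each node is entered and exited.
function walk(node: ParseNode, listener: Listener): void {
  listener.enterRule?.(node);
  for (const child of node.children) {
    walk(child, listener);
  }
  listener.exitRule?.(node);
}

// Hand-built tree for the input "x = 1", standing in for a parser's output.
const tree: ParseNode = {
  rule: "assignment",
  children: [
    { rule: "identifier", text: "x", children: [] },
    { rule: "number", text: "1", children: [] },
  ],
};

// A listener that reacts only to the phrases of interest: identifiers.
const identifiers: string[] = [];
walk(tree, {
  enterRule(node) {
    if (node.rule === "identifier" && node.text !== undefined) {
      identifiers.push(node.text);
    }
  },
});
```

After the walk, `identifiers` holds the text of every identifier node in the tree. A visitor differs mainly in that the user code drives the traversal and can return values, whereas a listener is driven by the walker.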

This is a new version of ANTLR, in an early development stage. If you are looking for a production-ready version, see ANTLR v4.

Dev branch build status

macOS, Windows, Linux (GitHub Actions)

v5 vs v4

ANTLR 4 supports 10 target languages, and each of them requires a dedicated full runtime. With the advent of WebAssembly, there is an opportunity to have just one runtime that will run faster in host languages such as JavaScript or Python. ANTLR 5 is primarily about that: switching to WebAssembly. Various other improvements, currently being specified, will come on top of that.

WebAssembly is still bleeding edge, and the first version of ANTLR 5 will support only TypeScript. As other platforms add support for recent WebAssembly features, such as garbage collection and exception handling, ANTLR 5 will rapidly become available for these platforms.

Repo branch structure

The default branch for this repo is main, which holds the latest stable release and carries tags for the various releases. Development between releases occurs on the dev branch, and all pull requests should be based on that branch. To cut a release, dev is merged back into main and the release state is tagged (e.g., with 5.1-rc1 or 5.1).

Authors and major contributors

Only ANTLR 5 contributors are listed here. ANTLR 5 is largely based on ANTLR 4; see ANTLR 4 for its list of contributors, who are recognized as silent ANTLR 5 authors.

Useful information

You might also find the following pages useful, particularly if you want to mess around with the various target languages.

The Definitive ANTLR 4 Reference

Because work on ANTLR 5 is at a very early stage, there is currently no material about ANTLR 5. However, ANTLR 5 is based on the amazing work done in ANTLR 4 and follows many of the ideas it introduced, so it makes sense to study the existing material on ANTLR 4.

Programmers run into parsing problems all the time. Whether it’s a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language—ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top. This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features.

You can buy the book The Definitive ANTLR 4 Reference on Amazon, or an electronic version at the publisher's site.

You may also find the book's source code useful.

Additional grammars

As of now, there is no collection of grammars for ANTLR 5, but we plan to grow such a collection in grammars-v5, which is currently empty.

Until we get grammars for ANTLR 5, you can take a look at this repository. It is a collection of grammars verified for ANTLR 4, where each root directory name is the all-lowercase name of the language parsed by the grammar: for example, java, cpp, csharp, or c.