VHDL-LS / rust_hdl

Other
329 stars 64 forks source link

Public API for traversing the design hierarchy / AST #219

Closed kraigher closed 1 month ago

kraigher commented 10 months ago

We need to create a stable public API to traverse the design hierarchy and the AST. I am hoping @Schottkyc137 and @skaupper can contribute their ideas and feedback here. My goal is to replace the Search trait and the Visitor trait with this implementation. First we need to gather some requirements than we can discuss the implementation.

Performance requirements

Schottkyc137 commented 10 months ago

Some preliminary thoughts from myself without getting too much into details:

Are any of these out of scope? Did I miss anything? What do you think?

kraigher commented 10 months ago

Regarding scope I am a believer in adding only what is currently needed. We can add more stuff in the future when there is a use case that can drive the design. I find one rarely gets the API right if you add everything up front.

For now I think the goal should be to have a nice API without duplication of Search/Visitor that supports completions, lsp and document generation as the initial scope.

skaupper commented 10 months ago

For navigation I want to add, that getting parents declarations from AST elements is important as well. Which may already be possible with AnyEnt (I have not looked into it too deeply).

Even if it is not immediately required, it would probably be good to ensure that AST elements are traversed in the order they appear in the source code, as that would probably be rather hard to rework after the fact. The current implementation does seem to do that as well.

I may encounter further things when I come around to play with the current implementation.

kraigher commented 10 months ago

For navigation I want to add, that getting parents declarations from AST elements is important as well. Which may already be possible with AnyEnt (I have not looked into it too deeply).

AnyEnt has a parent field. So this is already supported.

Even if it is not immediately required, it would probably be good to ensure that AST elements are traversed in the order they appear in the source code, as that would probably be rather hard to rework after the fact. The current implementation does seem to do that as well.

Yes this has always been a goal so that sorting on srcpos is not necessary. I add it as an explicit requirement above.

kraigher commented 9 months ago

There are two main things I have in mind for the API:

  1. I think the API should offer a simple way to iterate over libraries, design units, statements and declarations. I think the client code needs some control over when the nesting happens when iterating recursively. The reason is I think it is likely that a client code would like to have some context where they keep track of where they are in the AST and this is not easy with a recursive visitor pattern as you have no control over the nesting of calls.

  2. I think we do not need a general public visitor API to visit every small intermediate node in the AST below the statement/declaration level. For example why would we need a visitor function for a qualified expression? What is a client supposed to do with a visiting a qualified expression out of context? I think it is enough to have a way to find all references within an AST subtree and a way to go from SrcPos to a reference.

With these two things in mind I think we could create a type like this:

enum NodeRef<'a> {
   Declaration(&'a ast::Declaration),
   Sequential(&'a ast::Sequential),,
   Concurrent(&'a ast::Concurrent),
}

trait<'a> NodeVisitor<'a>  {
  // Only iterates over immediate children, not recursively
  fn for_each_child(&self, fun: FnMut(NodeRef<'a>));
  // An alternative is to use impl Iterator<Item = NodeRef<'a>> but I think the type inferred by impl Iterator would be complex to create without Box<dyn Iterator> however maybe the heap allocation is not so bad as @Schottkyc137 benchmarked.
}

The API on the top level would be the Root which I think could be created as a thin facade over the internal DesignRoot


impl <'a> Root<'a>  {
   fn libraries(&self) -> impl Iterator<item = Library<'a>> {}
   fn get_library(&self, sym: &Symbol) -> Option<Library<'a>> {}
}

impl <'a> Library<'a>  {
   fn primary_design_units(&self) -> impl Iterator<item = DesignUnit<'a>> {}
   fn get_primary_unit(&self, sym: &Symbol) -> Option<DesignUnit<'a>> {}
}

impl <'a> DesignUnit<'a>  {
   fn secondary_design_units(&self) -> impl Iterator<item = DesignUnit<'a>> {}
   fn get_secondary_unit(&self, sym: &Symbol) -> Option<DesignUnit<'a>> {}
   fn context_clause(&self) -> ContextClause<&'a> {}
}

impl<'a> NodeVisitor<'a> for DesignUnit {}
skaupper commented 9 months ago

What I really like about the current visitor implementation is that simple use cases require simple code. The use case Get all procedure declarations can be achieved by only implementing the visit_subprogram_declaration method of the Visitor trait. But I agree with you that more context/control over how the AST is traversed is probably needed.

Although I do not know how easy it is to use custom iterators, they would allow the user a great amount of control, if needed.

I could imagine uses like these:

let design_root = ...;
// Get all primary units from the design
let primary_units_iter = design_root.get_primary_units(); // impl Iterator<item = DesignUnit>
// Get all declarations from the design
let declarations_iter = design_root.get_declarations(); // impl Iterator<item = Declaration>

By using iterators where possible the user would have access to iterator functions like .map(...) and .filter(...).

The relevant traits could look like this:

trait PrimaryUnitVisitor {
    fn get_primary_units(&self) -> impl Iterator<item = PrimaryUnit>;
}

trait DeclarationVisitor {
    fn get_declarations(&self) -> impl Iterator<item = Declaration>;
}

By implementing them for the right types (DesignRoot, PrimaryRoot and even Iterator<item=PrimaryUnit>) I imagine that even the following might be possible:

// Get all declarations from all primary units (even .filter(...) might be possible between get_primary_units and get_declarations)
let declarations_iter = design_root.get_primary_units().get_declarations(); // impl Iterator<item = Declaration>

Please bear in mind that there may be some roadblocks I did not consider (because of lack of experience), but I think an iterator based API (if possible) would be quite versatile and powerful.

kraigher commented 9 months ago

@skaupper A context free vistor API could always be built as an utility on top of the API I proposed. I do not think we need a visitor function for every litte thing from day one though. I would rather have the use cases drive the design when they occur. Day one is enough with an API for the current use cases.

Regarding methods returning impl Iterator in traits I think it is not possible yet. Maybe works now with generic associated types, have to check. You have to use Box<dyn Iterator> otherwise

skaupper commented 9 months ago

Day one is enough with an API for the current use cases.

That's fine for me. I wasn't proposing a visitor API as it is now, just wanted to emphasize its simplicity in cases where context is not needed. For the initial design it should be sufficient to iterate declarations and statements.

Regarding methods returning impl Iterator in traits I think it is not possible yet

Too bad that is not possible as of right now.

I do like your API suggestion, though. Equivalent to primary_design_units, secondary_design_units etc. shouldn't the following be also possible?

impl <'a> DesignUnit<'a>  {
    fn children(&self) -> impl Iterator<item = NodeRef<'a>> {}
}

If possible, I would prefer it to write design_unit.children().for_each(...) instead of design_unit.for_each_child(...) because the user would still have access to all other Iterator functions (like map and filter) without being that much more verbose. What do you think?

kraigher commented 9 months ago

Maybe the right trade-off in this case is to sacrifice a little bit of performance and use a boxed dyn Iterator:

fn children(&self) -> Box<dyn Iterator<Item=NodeRef<'a>>>

It is at least better than using a Vec I think.

kraigher commented 9 months ago

As it seems the overall design is ok I will create an implementation of my proposal and make an PR so we can discuss more details.

kraigher commented 9 months ago

I made a first draft here: https://github.com/VHDL-LS/rust_hdl/pull/228

kraigher commented 9 months ago

I added a benchmark that can be run with cargo bench on master so we can have something to compare with when we replace the Search trait with a new api.

kraigher commented 9 months ago

I am currently also working on a bottom-up refactor to bring the Searcher and the Visitor closer together. I already fixed that the Searcher does not need &mut on the AST. I use AtomicUsize to clear references instead.

kraigher commented 9 months ago

@Schottkyc137 I did some benchmarking on Search trait vs Visitor. You can se what I did on the benchmark branch.

visit entire ast       197.22 ms        153/153
search entire ast       25.59 ms    1,168/1,171

As you can see the Search trait is almost 8x faster. I will try to rewrite the Visitor to use an iterator instead of a vec and see if it helps.

Schottkyc137 commented 9 months ago

@Schottkyc137 I did some benchmarking on Search trait vs Visitor. You can se what I did on the benchmark branch.

visit entire ast       197.22 ms        153/153
search entire ast       25.59 ms    1,168/1,171

As you can see the Search trait is almost 8x faster. I will try to rewrite the Visitor to use an iterator instead of a vec and see if it helps.

Alright then it's time to ditch the visitor (at least in its current implementation) :D My initial idea was that it's somehow more performant in certain circumstances than the Searcher because it avoids recursion. But the benefit (if there even is any) is likely minuscule compared to the cost of dynamic allocation and also vtable lookup. Can help in any refactoring if there is anything you'd like me to do.

kraigher commented 9 months ago

Can help in any refactoring if there is anything you'd like me to do.

Not at the moment. I will try to modify the Search trait and the FoundDeclaration enum so that it can solve the current use in AutocompletionVisitor and then remove visitor.rs. I it will not be much work and think I can do that without help. After that I very much appreciate you input on the future public API. I would still like to offer some kind of iterator-based API like the one proposed here.

kraigher commented 9 months ago

@Schottkyc137 I added the stuff needed by completion.rs to the Search trait and used it instead there. Then I removed the Visitor trait so we only have one implementation.

skaupper commented 9 months ago

@kraigher Regarding feedback: I hope that I find some time this week to throw together an example to get a feel for the API.

From what I've already tried, it seems a bit problematic if a whole Project is needed when all I want is just to iterate the AST. I will do a more detailed write-up as soon as I got to it.

kraigher commented 9 months ago

@skaupper My idea would be to have a facade for the design hierarchy named Root that would allow iteration over libraries, design units and their declarations and statements. You would still need a Project to analyze the design but you would be able to get a &Root out from the Project. In this way we could move all the AST queries such as find_all_refernces from Project to Root.

skaupper commented 9 months ago

So, I played around with it and looked mostly at two things:

  1. How to get the Project/Root/DesignRoot instance?
  2. How would I like to iterate through libraries and design units.

For 1. I tried to be as concise as possible (If I have a single VHDL file, how do I get the AST for it?), which resulted in the following code:

let config = Config::read_file_path(Path::new("../rust_hdl/vhdl_libraries/vhdl_ls.toml")).unwrap();

let mut project = Project::from_config(config, &mut NullMessages);
project.update_source(&Source::from_latin1_file(Path::new("./vhdl/package.vhd")).unwrap());
project.analyse();

let root = project.to_api_root();

What I do not like is, that analysing the project fails with a panic if std/standard.vhd is not part of the project. For small applications like docu extractors, code formatters etc. which do not necessarily parse whole libraries at once, needing to provide a copy of the standard library everytime you do something is at least annoying.


For 2. I played around with the iterator API.

let lib = root
    .libraries()
    .find(|lib| lib.name() == "work")
    .expect("Did not find library work.");
println!("Library: {}", lib.name());

for primary_unit in lib.primary_units() {
    println!(
        "  Primary: {}",
        match primary_unit.get() {
            AnyPrimaryUnit::Package(p) => p.ident.to_string(),
            AnyPrimaryUnit::Entity(e) => e.ident.to_string(),
            AnyPrimaryUnit::Context(c) => c.ident.to_string(),
            _ => "<not implemented>".to_string(),
        }
    );

    for secondary_unit in primary_unit.secondary_units() {
        println!(
            "    Secondary: {}",
            match secondary_unit.get() {
                AnySecondaryUnit::Architecture(a) => a.ident.to_string(),
                AnySecondaryUnit::PackageBody(p) => p.ident.to_string(),
            }
        );
    }
}

The simple for-loops work just fine, except that convenience functions like .name() are still missing, but with the wrappers around the AST elements, such functions could not be easier to implement!

What does not work is chaining iterator functions in cases like this:

let packages_in_work: Vec<_> = root
    .libraries()
    .filter(|lib| lib.name() == "work")
    .flat_map(Library::primary_units)  // This does not work!
    .filter(|pu| matches!(pu.get(), AnyPrimaryUnit::Package(_)))
    .collect();

// Do something with the filtered units
packages_in_work.iter().for_each(|p| println!("{}", p.name()));

Since Library::primary_units (as well as Primary::secondary_units and HasNodes::children) do not consume the value, these function chains cannot work.


The last example of 2. can be fixed with some rather small changes, like providing consuming variants of Primary::secondary_units etc. and additionally providing a variant of analysis::root::Library::secondary_units which does not take a &Symbol, but either a Symbol or even an usize (the unique symbol ID). Any thoughts on the (rather simple) examples, and the potential fixes?

For 1. I did not try to find a workaround/fix but it seems like some fields of DesignRoot are not populated without the standard library being parsed. I am not familiar with the analysis code. Would you see an elegant to avoid the panics while keeping DesignRoot as functional as it can be without the standard types etc.?

skaupper commented 9 months ago

On a different note: It seems to be a common pattern to expect &Symbols instead of strings as arguments to some query functions (like Root::get_library or Library::get_primary_unit).

What is your preferred way to obtain Symbols, without already having the respective library/unit at hand?

kraigher commented 9 months ago

What I do not like is, that analysing the project fails with a panic if std/standard.vhd is not part of the project. For small applications like docu extractors, code formatters etc. which do not necessarily parse whole libraries at once, needing to provide a copy of the standard library everytime you do something is at least annoying.

Doing analysis without the standard library is not feasible. Without analysis there would be no semantic information such as symbol references from use to declaration. What you want is just doing the parsing, that is still possible to do and you would get an AST without any references set. When only doing parsing there is not much use for the Project at and you might as well use the parser directly. This use case would thus be solved by adding the parser to the public API.

On a different note: It seems to be a common pattern to expect &Symbols instead of strings as arguments to some query functions (like Root::get_library or Library::get_primary_unit).

What is your preferred way to obtain Symbols, without already having the respective library/unit at hand?

Maybe the most user friendly is to just have public API functions take a &str. Otherwise we would need a Public API function to create and intern a symbol into the symbol table.

skaupper commented 9 months ago

What you want is just doing the parsing

You are right about that. Adding the parser to the API is definitely something I would want. Also I think it would not hurt to have a wrapper type (just like Root), so convenience functions, query functions etc. can be implemented, without littering the library internals.

That way we could have the API you proposed for the analyzed AST, while for use cases which do not require the analyse step a simpler and less involved API for the parsed AST would be available. Maybe the analyzed API could event build on the parsed API.

If you want to keep analyzed and parsed APIs, I would be happy to give the parsed API a go (but probably not before christmas), assuming that you mostly care about the analyzed API for the language server etc.

Maybe the most user friendly is to just have public API functions take a &str.

I agree that it's probably best if the users do not need to care about internals like Symbols and can instead use standard primitives where it is sufficient. The wrapper types in the API are perfect places for those kind of functions :)

Schottkyc137 commented 8 months ago

I have been quiet for some time now as I generally agree with the points that @skaupper raises. However, I want to raise one more point concerning scope that I have alluded to earlier.

While I agree with @kraigher that features should only be implemented if the desire arises, implementations should not forbid certain classes of features or make it unreasonable difficult to implement these features. One point in the current implementation that I see is that most resolves around one data structure – the AST. Subsequent passes (i.e., the analysis pass) only enrich that data structure with more information such as a reference to a declared entity. While this is enough for certain applications (analysis, completions, extracting documentation, ...) it might not be enough for future applications such as evaluation or compilation of a VHDL program. Most other compilers solve this with some intermediate representation after each pass. And while these applications might be far-fetched, two points should IMO still be considered: a) The claim is that vhdl_ls contains a general-purpose VHDL language front-end and b) Stuff like evaluation of a design might also be interesting for a language server. Maybe in the future, a design can be evaluated on-the fly and more elaborate errors can be published. Examples include checking for case statement overlap or range checking.

I'm saying all of this to still be careful not to discourage future applications, even if there is no immediate use-case.

skaupper commented 8 months ago

Subsequent passes (i.e., the analysis pass) only enrich that data structure with more information

You are right about that. Which makes me think that it would probably a good idea to make that enrichment more explicit.

Currently the Project struct does a lot of things, like

  1. Handling configs
  2. Gathering sources (which is mostly done via 1. atm)
  3. Parsing said sources
  4. Analysing the parsed sources
  5. Providing an API for working with the analysed VHDL.

Splitting these functionalities in three structs, there could be

  1. A SourceProject, which is unparsed and is only responsible for gathering source files (so, point 1. and 2. from above)
  2. A ParsedProject, which takes the gathered sources and parses them all (point 3. from above). This struct could also provide access for an API working with the raw AST.
  3. A AnalysedProject, which takes the ParsedProject and does the semantic analysis as it is now.

In the future, this structure could be further extended by an ElaboratedProject (or whatever) by consuming the AnalysedProject moving or transforming fields as needed and adding new information as well. That would be what you meant @Schottkyc137, right?

For reference, a small example of how this struct could interact with each other, and how a user would have to use them:

// Prepare sources
let config = Config::read_file_path(Path::new("../rust_hdl/vhdl_libraries/vhdl_ls.toml")).unwrap();
let source_file = Path::new("test.vhd");

// Create SourceProject
let mut source_project = SourceProject::from_config(config); // or SourceProject::new() for an empty project
source_project.add_file(&source_file);

// Create ParsedProject
let parsed_project = source_project.parse();
{
    let parsed_api = parsed_project.api();
    // Do something with the raw AST
}

// Create AnalysedProject
let analysed_project = parsed_project.analyse();
{
    let analysed_api = analysed_project.api();
    // Do something with the analysed AST
}

To conclude, I see multiple benefits by such a restructuring:

  1. Separation of concerns. Project would not be responsible anymore for basically everything.
  2. Clarity. That's kind of a byproduct of 1. If a struct has only one responsibility, it should be easier to grasp what you (as a user) are supposed to do with it.
  3. Flexibility. By choosing a layer of abstraction, which handles all needed use cases, the user does not need to care about constraints of higher layers (i.e. see this comment about the need for the standard library for analysis).
  4. Extendability. If someday someone decides that an ElaboratedProject is needed, extending the proposed structure should be doable as well.

Getting rid of some &mut self functions (i.e. by replacing Project::analyse(&mut self) with ParsedProject::analyse(self) -> AnalysedProject), while not 100% necessary, is also a plus in my book.