TerosTechnology / colibri

https://terostechnology.github.io/colibri

Sphinx builder and autodoc plugin for HDL #244

Open GlenNicholls opened 2 years ago

GlenNicholls commented 2 years ago

@umarcor @LarsAsplund @JimLewis @eine @suzizecat @Paebbels @nfrancque I have been talking a bit with @qarlosalberto about potentially creating a sphinx builder/autodoc plugin for HDL, although I'm more interested in VHDL. I would really like the ability to automatically document VHDL code with autodoc and a VHDL language domain capability. TerosHDL already has autodoc capability for MD/HTML, but it is difficult to manage for large or distributed projects/repos. I am hoping that Sphinx solves this problem and it'd be really nice to eventually have a generic HDL extension in sphinx-contrib that anyone can use with pip and sphinx. Plus, I see a lot of potential for this to be very widely used as HDL documentation is an area where I have not seen a solution that meets most people's needs. I decided to ping y'all as I feel that you might find this valuable for your projects, you might have some good ideas, and I'm also hoping to get some feedback about how you think this should look. Feel free to ping others.

Anyways, if you feel that this discussion belongs on Gitter, the VHDL WG issues, or elsewhere, let me know and I can move it. I am planning to talk with management at my company about sponsoring this effort and Carlos was the first person I thought might be interested in taking this on based on Colibri. I previously attempted to go from Colibri to doxygen/asciidoctor. This proved to be a huge headache for distributed projects and is not going to be easy to maintain. I figured reST and Sphinx seemed like a natural choice as there is an interface to create extensions for other languages, builders, etc. The other reason I want to have a discussion about this is there are already quite a few VHDL autodoc plugins for Asciidoctor and others, but most are dead and I don't think any of them really found the right way to document VHDL or present it in the actual HTML/PDF/etc.

Documenting Source Code

Currently, Colibri and others tend to use some pattern in the comments to differentiate between a comment and documentation. I believe colibri can use any symbol, but it defaults to !, i.e. --!. I think this is a natural direction because VHDL doesn't have any notion of a docstring or similar and I know parsing VHDL is not easy. However, I'm not the biggest fan of that pattern because it's extra typing. If your editor/TerosHDL supports carrying on this comment and symbol with ENTER, then it's not that big of a deal. But, if people don't have that then it's annoying to write and update documentation. Using a different pattern in a multi-line comment makes a little bit more sense to me, but that is only valid for 2008+. I don't think this is viable though.

Instead, I think it should be based on the placement of the comment similar to Python. In python, you can add a docstring by placing the block comment directly underneath the declaration:

def some_func() -> None:
  """This is a docstring you can access with __doc__"""

For VHDL, I think it makes sense to put the documentation after things as well:

entity and_gate is
port (
  a : in std_logic; -- or it could go here
  -- a port documentation

  b : in std_logic;

  -- b port would have no documentation because comment doesn't immediately trail

  z : out std_logic
);
end entity;
-- Documentation for and_gate, ``reST`` is supported and can link to other modules :data:`~lib.pkg.SOME_CONST`.

Or maybe even this, adapting Google's style guide for Python:

entity and_gate is
port (
  a : in std_logic;
  b : in std_logic;
  z : out std_logic
);
end entity;
-- Documentation for and_gate, ``reST`` is supported and can link to other modules :data:`~lib.pkg.SOME_CONST`.
--
-- Generics:
-- ...
-- Ports:
--     a: a's documentation
--     b:
--        b's documentation
--       
--        .. deprecated:: 1.0 b will be removed in 2.0. 

Block comments would also be allowed in the same place. I don't like that the documentation is below the entire entity declaration, though. Maybe it makes sense to adopt Python's notation for classes, where the docstring comes right after the class declaration, or something like this?

entity and_gate is
-- Documentation for and_gate, ``reST`` is supported and can link to other modules :data:`~lib.pkg.SOME_CONST`.
-- Generics:
-- ...
-- Ports:
--     a: a's documentation
--     b:
--        b's documentation
--       
--        .. deprecated:: 1.0 b will be removed in 2.0. 

port (
  a : in std_logic;
  b : in std_logic;

  z : out std_logic
);
end entity;

Personally I think right before the entity makes sense. However, a lot of people put comment headers in their code so for things like signals/constants/etc. this might jack up the documentation but that could also be something that just has to change if they want to use the tool.
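To make the placement rule concrete, here is a minimal sketch in Python of how a tool could attach trailing comments to ports. It is only an illustration under stated assumptions: the function name `extract_port_docs` and both regular expressions are hypothetical, and a line-based regex is far from a real VHDL parser.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: a port line such as "a : in std_logic; -- doc".
PORT_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*:\s*(?P<dir>in|out|inout|buffer)\s+"
    r"(?P<type>[^;\-]+?)\s*;?\s*(?:--\s*(?P<doc>.*))?$",
    re.IGNORECASE,
)
# A line that contains nothing but a comment.
COMMENT_RE = re.compile(r"^\s*--\s?(?P<text>.*)$")


def extract_port_docs(source: str) -> Dict[str, str]:
    """Attach comments that immediately trail a port declaration to that port."""
    docs: Dict[str, str] = {}
    current: Optional[str] = None  # port that may still receive comment lines
    for line in source.splitlines():
        port = PORT_RE.match(line)
        if port:
            current = port.group("name")
            docs[current] = port.group("doc") or ""
            continue
        comment = COMMENT_RE.match(line)
        if comment and current is not None:
            docs[current] = (docs[current] + " " + comment.group("text")).strip()
        else:
            current = None  # a blank line or other code breaks the association
    return docs
```

With the first and_gate example above, `a` collects both of its comments, while `b` and `z` end up with empty documentation because nothing immediately trails them.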

One unknown here is how to provide documentation for a file. I.e. if you have a license at the top of your file, how does the parser know if there is something below it that should be documentation like https://github.com/VUnit/vunit/blob/master/vunit/__init__.py#L7-L9? I don't necessarily think that this is even needed though. People generally put one package/ent per file so it seems like something that doesn't really matter. Even in cases where someone puts multiple entities/packages in the same file, their documentation for the library already has everything so I can't think of any useful reason to have a global file documentation section.

Layout

Once the documentation is generated, I HATE the idea that tables are used for ports/generics. I don't think it's intuitive, I feel that tables make it so you focus too much on port attributes like the type or size instead of what the port is and how to use it. Instead, I think that it should just look like it does for python:

[image: rendered Python documentation example, not preserved]

With VHDL, we care more about types/ranges and whatnot, so for this we should include that information in the parameters/generics list.

As for the documentation itself, I think what the image above shows, applied to an entity, should be it. All you really care about when instancing it is the API and how to use the component; you don't care about the implementation details, and they should be considered private. However, sometimes this is critical to understand, so I think that within the documentation for and_gate, there should be a drop-down to expand the architecture (or architectures) so you can view the internal/private documentation, like info about signals, FSMs, etc. Teros has some neat features, and internal FSMs etc. would be cool to reference in an entity's documentation so they show up when someone scrolls past and_gate in the documentation. Another neat idea would be to display the block hierarchy, e.g. and_gate and how it connects sub-components. This would be super useful for project documentation where you can go into the top level's documentation and see exactly how everything is connected.

Now for subprograms, packages, etc. it'd just expand on the above ideas to determine a documentation convention along with how it should be displayed.

Annotations/Autodoc

So for autodoc with Python in Sphinx, you can do stuff like this for automagically inserting documentation in a reST document:

.. automodule:: noodle
   :members:

How would these automodule, autoclass, ... look for VHDL/HDL? I don't like the idea of specifying files directly; I will talk about how we can handle that later. I'm kind of thinking that we could use autolibrary, which would populate all entities, packages, architectures, etc. in a library:

.. autolibrary:: some_lib
    :members: some_pkg, some_ent, some_ent.some_arch ... <- default if this isn't present is to include everything
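As a rough sketch of the "include everything by default" behaviour, the option handling could look like the Python below. The `autolibrary` directive does not exist yet, so `select_members` and its arguments are purely hypothetical; a real Sphinx directive would receive the option string through its `options` dictionary.

```python
from typing import List, Optional


def select_members(available: List[str], members_option: Optional[str]) -> List[str]:
    """Pick the design units a hypothetical ``autolibrary`` directive documents."""
    if members_option is None or not members_option.strip():
        return list(available)  # no explicit list: document everything
    requested = [name.strip() for name in members_option.split(",") if name.strip()]
    unknown = [name for name in requested if name not in available]
    if unknown:
        # Failing loudly is an assumption; a warning might be friendlier.
        raise ValueError("unknown members: " + ", ".join(unknown))
    return requested
```

Calling `select_members(units, None)` returns every unit in the library, while `select_members(units, "some_pkg, some_ent.some_arch")` narrows the output to those two.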

As for other annotations, I think referencing other components, subprograms, etc. would use the library if it exists, otherwise the name itself if it is not in a library:

:vhdl:ent:`~lib.pkg.ent_name`  <- ~ for only displaying ent_name in docs link
:vhdl:func:`lib.pkg.some_function`  <- could be pure or impure, or maybe func and ifunc
:vhdl:proc:`lib.pkg.some_procedure`
:vhdl:data:`ent.arch.some_signal`
:vhdl:data:`ent.arch.SOME_CONST`
:vhdl:type:`lib.pkg.some_type`  <- could be type or subtype, or maybe type and stype
...
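The `~` prefix would follow the same convention Sphinx already uses for Python cross-references, where only the last component of the target is displayed. A small Python sketch of that rule, with `Reference` and `parse_reference` being hypothetical names rather than any existing API:

```python
from typing import NamedTuple


class Reference(NamedTuple):
    target: str  # fully qualified name used to look up the object
    title: str   # text displayed in the rendered link


def parse_reference(text: str) -> Reference:
    """Split a role target into lookup name and display title."""
    text = text.strip()
    if text.startswith("~"):
        target = text[1:]
        # Show only the last dotted component, e.g. "ent_name".
        return Reference(target, target.rsplit(".", 1)[-1])
    return Reference(text, text)
```

So `parse_reference("~lib.pkg.ent_name")` links to `lib.pkg.ent_name` but displays just `ent_name`.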

Sphinx Building

I am thinking that there would only need to be an HDL extension as the included html builder should still work once it gets the correct information. However, the HDL extension would provide a flag to specify a configuration file. This file would be CSV (or TOML, YAML, or whatever makes sense) that would define all files in the project, their associated library, and anything else needed for building the documentation.
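For illustration, reading such a file could be as simple as the Python sketch below. The column names `library` and `path`, the file paths, and the function `load_filesets` are all assumptions; nothing about the format has been decided.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_filesets(csv_text: str) -> Dict[str, List[str]]:
    """Group source files by library from rows of the form ``library,path``."""
    libraries: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        libraries[row["library"].strip()].append(row["path"].strip())
    return dict(libraries)


# Hypothetical project description.
EXAMPLE = """library,path
some_lib,src/some_pkg.vhd
some_lib,src/some_ent.vhd
other_lib,src/other_ent.vhd
"""
```

`load_filesets(EXAMPLE)` yields one list of files per library, which is the information an `autolibrary`-style directive would need.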

There's probably other stuff I haven't thought about, but this is the gist. Thoughts, suggestions, concerns?

umarcor commented 2 years ago

My very quick reaction (before going through the details of your message) is:

That has been one of @Paebbels' main goals for years. As you can see in Paebbels/pyVHDLParser: Use Cases, he designed the parser the way he did because he wanted to preserve all the original content and produce a Document Object Model (DOM) for documentation purposes. Long story short, in Dec 2020 - Jan 2021, he split vhdl/pyVHDLModel from pyVHDLParser, and then, in a joint effort with Tristan, they created pyGHDL.dom, which is pyVHDLModel + libghdl. In other words, pyGHDL uses GHDL as a frontend (or backend) to fill the abstraction provided by pyVHDLModel. Note the "Consumers" in vhdl/pyVHDLModel: Use Cases.

NOTE: In order to be 100% usable for pyVHDLModel, some enhancements are required in libghdl (GHDL), because by default GHDL ignores, for example, all the comments (which are not needed for simulation or synthesis). Therefore, libghdl needs some "third elaboration mode" which preserves all the details for documentation purposes. Another related enhancement is the resolution of all the symbols. That is all slow work in progress, because pyGHDL.dom is a very exciting use case of GHDL, but not the main one.

pyVHDLModel and pyGHDL.dom are usable since July 2021. A few months ago, I did some proofs of concept to showcase how they can be used. See Open Source Verification Bundle (OSVB): Project:

As the name suggests, OSVB: Documentation generation » Integration with Sphinx » Lists and tables shows Lists and Tables for entities, ports, generics, architectures, etc. generated automatically with Python, using pyGHDL.dom and tabulate. Furthermore, subsection VHDL Domain contains references to "proper" Sphinx extension projects:

I suggest reading Paebbels/sphinxcontrib-vhdldomain#4, which is the "latest" discussion with regard to the syntax and directives to use/create. Then, read/check CESNET/sphinx-vhdl. See also the last commit in https://github.com/umarcor/osvb/commits/sphinx-directives, which is something I drafted together with Patrick this summer, after chatting with @bradleyharden.

On a different but related topic, see subsection Diagrams. Apart from the references to SymbiFlow's sphinxcontrib-hdl-diagrams (using ghdl + yosys), recently Symbolator and pyHDLParser were forked to org hdl:

There is naturally some overlap between pyHDLParser, pyVHDLModel and pyVHDLParser. We would like to plug pyVHDLModel into Symbolator, in order to decouple it from the parser.

Last, but not least, although we are focused on VHDL, there is also pySVModel (see edaa-org/pySVModel#11).

/cc @LukasVik @mithro @proppy

JimLewis commented 2 years ago

@GlenNicholls I would be happier with --! than being stuck with some automation that tries to find the documentation based on some arbitrary metric. For example, some might like their entity/subprogram documentation before the entity/subprogram and not after.

umarcor commented 2 years ago

it'd be really nice to eventually have a generic HDL extension in sphinx-contrib that anyone can use with pip and sphinx.

Distribution is probably the second most complex challenge here. The first one is knowing/learning how to write proper Sphinx extensions, since the Sphinx codebase/architecture is not the most comfortable to work with. Then, since HDLs are such complex languages, it's not easy to have an extension that works through pip only. In practice, the extension will depend on some compiled/shared library. That is true for VHDL and also for System Verilog, because the most complete open source parsers are written in compiled languages in both cases.

Therefore, most projects trade functionality for ease of distribution. VUnit uses a regexp parser in Python, which is known to be limited, but it avoids depending on anything other than Python. In fact, we discussed replacing the parser with rust_hdl in the past, and the main reason not to do so was it being written in Rust, a compiled language. Similarly, TerosHDL uses tree-sitter mainly because it's written in JavaScript, and the current implementation is focused on trying to use as much JS as possible, so that it can be installed as "just a VSCode extension".
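To give an idea of what the regexp approach buys and where it stops, here is a minimal Python sketch that lists the entities and packages declared in a file. It is not VUnit's actual parser; `UNIT_RE` and `find_design_units` are hypothetical names, and the comment stripping is naive (it would also cut a `--` inside a string literal).

```python
import re
from typing import List, Tuple

# Hypothetical pattern for "entity <name> is" / "package <name> is".
UNIT_RE = re.compile(
    r"^\s*(?P<kind>entity|package)\s+(?P<name>\w+)\s+is\b",
    re.IGNORECASE | re.MULTILINE,
)


def find_design_units(source: str) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for entity and package declarations."""
    # Drop comments first so commented-out declarations are ignored.
    code = re.sub(r"--.*", "", source)
    return [(m.group("kind").lower(), m.group("name")) for m in UNIT_RE.finditer(code)]
```

This is enough to build a list of design units with Python alone, but anything that needs types, scopes or resolved symbols quickly goes beyond what patterns like this can do.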

pyVHDLParser would be a very interesting solution for both VUnit and TerosHDL, because it would require Python only. TerosHDL is mainly written in JS, but it uses VUnit for relevant features, hence the dependency on Python is there already. Unfortunately, pyVHDLParser was started more than a decade later than GHDL, and it's not ready-to-use per se. That's why, a year ago, we decided to split pyVHDLModel from pyVHDLParser and plug it to GHDL (libghdl). We wanted to take the best (most complete) piece from inside pyVHDLParser and make it usable with the best open source VHDL parser/analyzer/elaborator (GHDL).

NOTE: GHDL, rust_hdl and pyVHDLParser all evolved naturally to a very similar internal architecture based on tokens and multiple phases. The main differences between them are the language and how many human-hours were devoted to the development.

The challenge is "how to install the parsers to extract data from sources", which is independent from the type of extension we talk about (documentation as text, entity/component block symbols, post-synthesis block diagrams, GUIs such as hwstudio...). For instance, currently SymbiFlow/sphinxcontrib-hdl-diagrams supports Verilog only, because it installs Yosys through WASM-pip (YoWASP) or Conda, and Yosys by default does not support System Verilog or VHDL. There is work in progress to support installing the tools through other mechanisms (SymbiFlow/sphinxcontrib-hdl-diagrams#72), but then we have the same issues we find when distributing tools in general (e.g. SymbiFlow/sphinxcontrib-hdl-diagrams#73).

To me, the dependency on GHDL (and Surelog) is perfectly acceptable from a pragmatic point of view, even though the distribution burden is increased. For my use cases, Yosys is not useful without ghdl-yosys-plugin. I need to be able to simulate the pre-synthesis model and verify the behaviour before I deal with post-synthesis results. For that same reason, I would not personally get a meaningful benefit from replacing pyGHDL.dom with pyVHDLParser, or from having GHDL with mcode on YoWASP. It would streamline some containers, but I would need "a regular GHDL" in my stack anyway (because I need to simulate, cosimulate and test the design I'm going to synthesise and document). Nonetheless, I understand that not all the target users of TerosHDL and VUnit are dependent on GHDL, since some of them might use Siemens', Aldec's, Xilinx's or Intel's tools only.

Plus, I see a lot of potential for this to be very widely used as HDL documentation is an area where I have not seen a solution that meets most people's needs.

HDL tooling has been blocked for years (decades) because parsing and analysing is very hard, and it's impossible to build tooling without a library that understands the language. I believe we are all eager to build lots of tools, but all the projects hit the same wall sooner or later. You will find "code generation" to be common in multiple tooling projects, because that's the natural approach when understanding the language is not possible. Something apparently as simple as "getting the direction and type of a port" gets pretty nasty very easily. Actually, it's interesting to try to understand how pyGHDL.dom resolves that. Hence, we need to support GHDL and Surelog because those are the toolkits that will enable us to provide open source tooling for VHDL and SV on par with "modern" programming languages (Python, Golang, Rust, Scala...).

In parallel, we can and should work on Sphinx directives, because those can be used manually and then plugged to an autodoc. For instance, in GHDL's doc we declare everything manually, because it's an Ada program documented using Sphinx. The same approach might be used for VHDL projects, until the automation details explained below are solved.

This proved to be a huge headache for distributed projects and is not going to be easy to maintain.

This is something we also found when talking about the preliminary implementation of pyEDAA.ProjectModel, first between Patrick and me, and then with @ktbarrett. Terms "Project" and "Design" fall short to describe all the kinds of "things" that we want to work with.

A single "Project" might have multiple independent "Designs", each of them with their shared and unique sources and "Targets" for testing, synthesis and/or implementation. The meaning of "target" varies if we are talking e.g. about simulation entrypoints: OSVVM, CoCoTb and VUnit have very different entrypoints at the moment.

pyEDAA.ProjectModel uses pyVHDLModel and pySVModel internally, so that filesets can be declared once only, and then used either for simulation, for synthesis or for documentation purposes.

So far so good. There is work to be done yet, but it's doable on the Python side. However, as explained, currently the main usable open source VHDL 2008 parser and elaborator is GHDL, and it does not have a distinction between Project or Designs. For GHDL (libghdl) everything is a single "whatever": all the libraries belong to a single design of a single project. Therefore, there is some complexity to deal with when "running" the pyGHDL.dom to "auto-document" the whole content of a ProjectModel instance. That's something we might deal with on the Python side, or GHDL (libghdl) itself might be enhanced to support it.

Another related enhancement in this regard is the dependency/hierarchy resolution as done by VUnit, used by TerosHDL and wanted by FuseSoC (to allow incremental compilation/analysis). The pyEDAA.ProjectModel -> pyVHDLModel -> pyGHDL.dom -> libghdl chain can allow us to use GHDL for resolving the symbols and the hierarchy. That is on the table, but not actively developed at the moment. In order to boost it, we (as a community) need to buy time for Tristan or train other developers who can code Ada (or Rust) and have a strong compiler and hardware design background (such a unicorn).

Without going into details, the distribution of components/modules is also to be taken into account. Some projects rely on git submodules for dependency control, others use specific tools such as FuseSoC or apio, and the industry standard is IPXACT (see pyEDAA.IPXACT and edaa-org.github.io/pyEDAA.IPXACT: References). Intersphinx is a powerful feature for building tightly cross-referenced but distributed documentation sites (e.g. ghdl, ghdl-cosim, edaa-org, hdl/constraints, openFPGALoader, etc.); however, we need to "resolve" dependency mechanisms in order to have it automated.
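For reference, this is roughly what the manual Intersphinx setup looks like in a project's conf.py today. The `intersphinx_mapping` setting and the `sphinx.ext.intersphinx` extension are standard Sphinx; the mapping key and the URL are only an assumed example and would have to match wherever the referenced project actually publishes its docs.

```python
# conf.py (fragment)
extensions = ["sphinx.ext.intersphinx"]

# name -> (base URL of the other documentation site, inventory location;
# None means "fetch objects.inv from the base URL").
intersphinx_mapping = {
    "ghdl": ("https://ghdl.github.io/ghdl", None),  # assumed URL
}
```

Automating this would mean generating the mapping from whatever dependency mechanism the project uses, instead of writing each entry by hand.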

I think this is a natural direction because VHDL doesn't have any notion of a docstring or similar and I know parsing VHDL is not easy.

Maybe this is something to be proposed in the VHDL WG. The PAR is open until 2024? 2025? Now is a good time to discuss whether we want to have some docstring format in the language. It is sensible to have that discussion in parallel with developing a set of Sphinx directives. The comments used by TerosHDL, which seem to be inspired by doxygen, can be a good starting point for the discussion. If it does not suit the WG, it can be done in the Open Source VHDL Group (next to the pyVHDLModel).

Instead, I think it should be based on the placement of the comment similar to Python. In python, you can add a docstring by placing the block comment directly underneath the declaration

That's what I thought as well. However, a few days ago I had to document some dataclasses, and I found the following syntax to be required: https://github.com/hdl/containers/blob/d50f46d67aa4494dfdb4a01abc67c7e49bf9efea/utils/pyHDLC/__init__.py#L70-L81. Note that #: is used for single-line or multi-line documentation of fields.

Or even this maybe, adapting from google's style guide for python:

I prefer to have the documentation as close as possible to what you are documenting. E.g., I don't feel comfortable with param documentation in Python docstrings, even though I do like what you said about how classes are documented through a multiline docstring right at the beginning.

Anyway, I believe this might be thinking too far now. IMHO, we should first deal with what's part of the language(s) already, and have directives to document either manually or automatically. Then, we might better see which is the information that needs to be provided through comments because the language cannot hold it otherwise.

One unknown here is how to provide documentation for a file.

I think that the Python approach is ok: have a docstring for the file. It might be:

--| This is the docstring of a VHDL file.
--|
--| It's multiline!

--: IEEE library
library ieee;
--: Standard context from the IEEE library
context ieee.ieee_std_context;

entity and_gate is
  --| Documentation for and_gate.
  --|
  --| ``reST`` is supported and can link to other modules :data:`~lib.pkg.SOME_CONST`.
  port (
    --: The clock.
    clk : in std_logic; -- this is a regular comment
    --: Enable.
    --:
    --: This can also be multiline!
    -- But this part is not included in the documentation.
    -- It can be multiline as well.
    -- It's just a regular comment after all.
    en : in std_logic;
    z : out std_logic -- this output is undocumented
  );
end entity;

Note that --| is used inside/after whatever is being documented, while --: is used before.
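A quick Python sketch of how these two markers could be collected, just to check that the rule is unambiguous. Everything here is hypothetical (`collect_docs`, the regexes, the `"<file>"` key for the file-level docstring), and it only understands entities and ports:

```python
import re
from typing import Dict, List

BEFORE = re.compile(r"^\s*--:\s?(.*)$")   # documents the next declaration
INSIDE = re.compile(r"^\s*--\|\s?(.*)$")  # documents the enclosing unit
PORT = re.compile(r"^\s*(\w+)\s*:\s*(?:in|out|inout|buffer)\b", re.IGNORECASE)
ENTITY = re.compile(r"^\s*entity\s+(\w+)\s+is\b", re.IGNORECASE)


def collect_docs(source: str) -> Dict[str, str]:
    docs: Dict[str, str] = {}
    pending: List[str] = []  # "--:" lines waiting for a declaration
    unit = "<file>"          # "--|" lines before any entity document the file
    for line in source.splitlines():
        m = BEFORE.match(line)
        if m:
            pending.append(m.group(1))
            continue
        m = INSIDE.match(line)
        if m:
            docs[unit] = (docs.get(unit, "") + "\n" + m.group(1)).strip()
            continue
        stripped = line.strip()
        if not stripped or stripped.startswith("--"):
            continue  # blank lines and regular comments keep the association
        m = ENTITY.match(line)
        if m:
            unit = m.group(1)
        m = PORT.match(line)
        if m and pending:
            docs[m.group(1)] = "\n".join(pending).strip()
        pending = []  # any code line consumes or discards pending "--:" lines
    return docs
```

Run on the example above, `clk` and `en` get their `--:` text, `z` stays undocumented, and the regular `--` comments are ignored.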

there should be a drop-down

I believe this is more related to the theme than to sphinx itself. Most sphinx themes have very little javascript (they are mostly HTML + CSS). So, the challenge is finding some frontend developer who can adapt a theme. Other interesting features in this regard would be pan and zoom viewers of diagrams, because many large and complex representations can be generated from hardware designs.

However, the HDL extension would provide a flag to specify a configuration file. This file would be CSV (or TOML, YAML, or whatever makes sense) that would define all files in the project, their associated library, and anything else needed for building the documentation.

Rather than a declarative configuration file, I would focus on a Python object/class, i.e. on pyEDAA.ProjectModel. That can be read in the conf.py and made available to the extension (i.e. to be used in the directive). pyVHDLModel does already convert VHDL to a Python API. We don't need to reinvent that. We only need to transform one Python API with VHDL semantics to another Python API with Sphinx semantics. So:

[optional] dependency management/download tool
  -> VHDL filesets
    -> [optional] declarative configuration file
    -> ProjectModel (or Python API for the configuration file)
      -> Sphinx configuration/directives
        -> [implicit] GHDL/libghdl/Ada or rust_hdl or pyVHDLParser or tree-sitter-vhdl or... [VHDL semantics]
          -> Unless pyVHDLModel is used, a "parser to Python" conversion tool is required.
          -> Python [VHDL semantics]
            -> Python [Sphinx semantics]

I suggest we focus on the very last step. How to represent VHDL semantics in terms of Sphinx language domain features.

Note: pyVHDLModelUtils converts pyVHDLModel to restructuredtext, not to Sphinx. I.e., it's not an extension but "Python code executed at build time".
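As a toy illustration of that last step, the Python sketch below turns an object with VHDL semantics into reStructuredText. The `Port` and `Entity` dataclasses are hypothetical stand-ins, not the pyVHDLModel API, and the output layout (a title plus a definition list) is just one possible choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    name: str
    direction: str
    subtype: str
    doc: str = ""


@dataclass
class Entity:
    name: str
    doc: str = ""
    ports: List[Port] = field(default_factory=list)


def entity_to_rst(entity: Entity) -> str:
    """Render an entity as a reST section with one definition per port."""
    lines = [entity.name, "=" * len(entity.name), ""]
    if entity.doc:
        lines += [entity.doc, ""]
    for port in entity.ports:
        # reST definition list: "term : classifier" followed by an indented body.
        lines.append(f"``{port.name}`` : {port.direction} {port.subtype}")
        lines.append(f"    {port.doc or '(undocumented)'}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

A proper extension would build docutils nodes instead of text, but the mapping from VHDL objects to documentation structure is the same problem.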

suzizecat commented 2 years ago

Some quick points that come to mind on my end.

As mentioned by @JimLewis, I quite like the approach of Colibri, which allows adding a special character after the comment symbol in order to mark a comment as documentation, like //! in SystemVerilog or --! in VHDL.

We have started using this feature of Colibri at work with nice results, but there are a few points that could be improved. Off the top of my head, these are my pros/cons of the current Colibri way of working:

Support of markdown and Wavedrom

The fact that Colibri works in plain markdown (easy to learn, understand, write) is great. The fact that it supports Wavedrom syntax natively is an absolute gem for documentation. If possible, a documentation system shall use it too. (We've done some insane diagrams with this)

In source documentation

The fact that the documentation is in source "only" is by itself a bit of an issue. While it is nice to have the documentation at hand when reading the sources, it can become bothersome if the full module documentation is in the source, even though it would still be nice to have detailed documentation in the final result.

Hence a question: does Sphinx support documenting an arbitrary element from outside the source file? The issue would be if there is documentation both in the source code and outside. In this situation, the "in code" documentation could be seen as the "brief" documentation unless otherwise specified.

I don't remember exactly, but I believe that either doxygen or sphinx allows this.

In the same vein, I wouldn't like having extensive blocks of documentation between ports. A one-liner at the end of the port declaration would be my go-to.

Ports documentation

Please take into account that some IOs of a module might require extensive documentation. While some trivial ports (the system clock, for example) might be fine with a one-liner, others might be related to timings, protocols and such. While I don't particularly like the fact that Teros puts everything in a table (not very practical when you have a lot of documentation), the fact that I have a table with:

Is a great thing.

This table could act as a summary and link to more extensive documentation, should it exist.

qarlosalberto commented 2 years ago

I'm away, next week I will do a more detailed comment.

@suzizecat about the external documentation. Is it similar to?: https://terostechnology.github.io/terosHDLdoc/documenter/start.html#special-tags

suzizecat commented 2 years ago

@qarlosalberto Yay! (Didn't even know you had those.) However, in Doxygen, you can specify the element you wish to document even when the comment is not near said element, as explained in https://stackoverflow.com/questions/18291017/code-documentation-outside-code-file

Therefore you can have the whole documentation separated from the code (including, for example, documentation for internal signals and such).

umarcor commented 2 years ago

Support of markdown and Wavedrom

There is a wavedrom extension for Sphinx already. Maybe docstrings can have an optional language keyword, similarly to code blocks in markdown or rst, so that users can specify the markup language used in the docstrings.

In source documentation

Markdown was designed for single-file documents. Therefore, there is no standard solution to include other files. However, Sphinx and restructuredtext do support include, toctree, ref, intersphinx, etc. With Sphinx, it is trivial to write the content in the sources, in sibling text/rst files, or mixing both approaches (which is the typical solution in Python projects).

For example, fields "Constraints" and "openFPGALoader" in https://hdl.github.io/constraints/Data/Boards/ are generated automatically from YAML sources located in multiple repositories, but shown together (per board). Another example is http://vunit.github.io/examples.html#vhdl, where all the content (the description of each example) is extracted from the docstrings in run.py files (note all the references to other sections in the docs).

Paebbels commented 2 years ago

The doxygen style of comments (--!) is not a nice solution. When you look at other languages, they are more clever. E.g. in .NET, the double comment character (//) is a comment and a triple comment character (///) is structured documentation.

I don't suggest using Markdown as a documentation language, because it's not powerful enough for most purposes. A language like reStructuredText (reST) is needed. Maybe there should be a "pragma" or global option for how to interpret documentation comments, because Markdown can't be distinguished from reST content. Otherwise, the third character after the double comment character could denote the documentation language used.

It's not necessary to use different comment markers for before and after an object. Other languages have managed to define rules so that the location of a comment determines which object it annotates.


As @umarcor wrote, there are currently 2 approaches to get comments from VHDL files:

  1. pyGHDL.DOM - this needs modifications to GHDL, so comments are preserved and accessible from GHDL's AST. The idea is to preserve tokens and link them to AST nodes. If so, AST nodes can be traversed and then tokens before or after can be found.
  2. pyVHDLParser - this is currently on hold. As Unai wrote, an abstract pyVHDLModel was extracted, so multiple frameworks can use that model as a common interface.

Why is it on hold?
I'm currently working on EDA² and putting all my power into that project to get it flying in the next 3 months. Afterwards, I'll continue either with pyGHDL.DOM if it supports the needed additions or with pyVHDLParser. OTOH, if someone is interested in helping out with pyVHDLParser, I'm open to assisting or switching priorities. The parsing technique used in pyVHDLParser is not easy to understand, but it is very powerful and, judging by first experiments, quite fast.

bradleyharden commented 2 years ago

First, I want to say that everyone here is much more active and invested in the VHDL ecosystem than I am. So take my opinions for whatever you think they are worth.

That being said, I see a lot of what I would consider very preliminary bikeshedding going on in this thread.

All of these things are trivial details that could be decided and changed later.

In my opinion, there are three major pieces that deserve the most attention:

  1. How to extract comments from VHDL code
  2. How to represent VHDL documentation in reStructured Text and Sphinx
  3. How to create links between documented items

There are several potential options for (1). As @umarcor mentioned, distribution of the selected tool could be a challenge. But overall, this area is the most developed of the three and the easiest to get working.

(2) is what I was trying to get at in Paebbels/sphinxcontrib-vhdldomain#4. I would consider this the biggest challenge. This should be your main focus. (3) can't be handled until (2) is handled anyway.

The [most complex challenge] is knowing/learning how to write proper Sphinx extensions, since the Sphinx codebase/architecture is not the most comfortable to work with.

I agree with this 100%. Back in 2019, I spent a few weeks working my way through the Sphinx codebase to understand how Python documentation is generated with an eye toward VHDL. I tried to capture some of that in the linked issue. I came away with a few conclusions:

I would absolutely love a Sphinx plugin for VHDL. But creating it will be no small project. Before discussing anything else, I would strongly suggest gaining a better understanding of Sphinx and reStructured Text. Otherwise, it will be very easy to accidentally design a reST syntax that ends up being difficult to parse and translate into the correct reST nodes.

In my opinion, everything else in this thread can and should take a back seat to the development of a plan for reST and Sphinx.

Edit: I just took a quick look at my old post. I want to highlight this one particular line, because I think it's important for this discussion:

Broadly, I think there is a tension between two different approaches. We could make the reStructuredText easy to write, and put a lot of work into parsing within Sphinx. Or we could make the reStructuredText more complex to ease the burden on Sphinx.

Paebbels commented 2 years ago

According to one of the Sphinx developers, the .NET domain is the most complex domain they did and could serve as a starting point for VHDL. Other domains might be too simple.

But as @bradleyharden said, Sphinx might be the main challenge after somehow getting comments from VHDL files. Even with a regexp approach like in VUnit, we would get quite decent results for comment extraction, but have no VHDL domain in Sphinx.

bradleyharden commented 2 years ago

According to one of the Sphinx developers, the .NET domain is the most complex domain they did and could serve as a starting point for VHDL.

I like this idea overall. My one counter would be that none of the software languages may be a good model for VHDL. I would suggest reading the .NET domain implementation while focusing on the broader Sphinx architecture.

When I went through the Python implementation, there were a lot of parts that wouldn't really work for VHDL entities. I did learn a bit about how Sphinx nodes are created and recursively processed, but I wish I had understood what to focus on before I started.

suzizecat commented 2 years ago

My two cents: if one wants to get the hang of Sphinx for HDL, wouldn't it be simpler to start from the C++ domain declaration and make changes in order to support SV, which is rather close to C++?

This could allow a simpler parser side while helping getting the internal data structure straight, as the internal structure would be closer-ish to VHDL than any other supported (application) language.

bradleyharden commented 2 years ago

That's an idea. I think there are probably lots of ways to go about it. I was thinking about it a little last night, and the questions that came to mind were:

bradleyharden commented 2 years ago

FYI, someone is advertising the CESNET repo on Reddit. Maybe it would make more sense to start from that project and expand?

umarcor commented 2 years ago

FYI, someone is advertising the CESNET repo on Reddit. Maybe it would make more sense to start from that project and expand?

See https://gitter.im/ghdl1/Lobby?at=61336cb85cfd665e5208f098

I just saw that @lepommosaure started following me, and he has a sphinx-vhdl (https://github.com/lepommosaure/sphinx-vhdl) fork of https://github.com/CESNET/sphinx-vhdl (authored by @jakubcabal and @Adrijaned). I'm pinging the three of them here, to see whether they are interested in talking about pyGHDL and https://github.com/Paebbels/sphinxcontrib-vhdldomain/.