Retrieve doxygen C++ comments for all classes/structs/methods and save key/value pairs in text (rest or md)

faresVS commented 3 years ago

I have a C++ library. Some of it is exported as python bindings (using pybind11). I would like the comments of the C++ classes to be used as docstrings for the python bindings. This seems like a common and basic use-case, but I couldn't find any clear and simple solution to do that automatically. If you know of something which could do this automatically, please feel free to share a link.

The workflow I'm currently using is the following:
1) generate Doxygen XML files for the C++ code;
2) use a Python script to parse each of those XML files and retrieve the docstrings corresponding to each C++ class/struct/method. This means:
  a) use xml.etree.ElementTree to parse each XML file;
  b) convert each interesting node of the XML tree into a text string using pypandoc.convert_text(...) (currently I am using Markdown, but reST should be fine too);
  c) store the key/value pairs in a Python dict and save it to a JSON file;
3) in the C++ binding code, load a map<string, string> from this JSON file and use it to automatically fill in the documentation of the Python binding classes and methods.
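Steps (2-a) and (2-c) could look roughly like the following minimal sketch. It assumes the compound files have already been generated. The embedded XML is a hand-written, simplified stand-in for xml/classDummyA.xml (not verbatim Doxygen output), and the helper names (description_text, collect_docstrings) and the "Class::member" key scheme are hypothetical. The text flattening is deliberately naive, which is exactly where step (2-b) gets hard.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a Doxygen compound file (xml/classDummyA.xml).
# Real files contain many more elements (location, listofallmembers, ...).
SAMPLE_COMPOUND_XML = """<doxygen>
  <compounddef id="classDummyA" kind="class">
    <compoundname>DummyA</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="classDummyA_example">
        <name>DummyA</name>
        <briefdescription><para>This is the constructor </para></briefdescription>
        <detaileddescription></detaileddescription>
      </memberdef>
    </sectiondef>
    <briefdescription><para>This class does the A It is very good at doing what it does </para></briefdescription>
    <detaileddescription></detaileddescription>
  </compounddef>
</doxygen>"""


def description_text(node):
    """Naively flatten the brief + detailed description of node (all markup is dropped)."""
    parts = []
    for tag in ("briefdescription", "detaileddescription"):
        child = node.find(tag)  # direct child only
        if child is not None:
            text = " ".join("".join(child.itertext()).split())
            if text:
                parts.append(text)
    return "\n\n".join(parts)


def collect_docstrings(xml_text):
    """Map 'Class' and 'Class::member' keys to their description text."""
    docs = {}
    root = ET.fromstring(xml_text)
    for compound in root.iter("compounddef"):
        if compound.get("kind") not in ("class", "struct"):
            continue
        cname = compound.findtext("compoundname")
        docs[cname] = description_text(compound)
        for member in compound.iter("memberdef"):
            docs[f"{cname}::{member.findtext('name')}"] = description_text(member)
    return docs


if __name__ == "__main__":
    # Step (2-c): the dict serializes directly to the JSON file used in step 3.
    print(json.dumps(collect_docstrings(SAMPLE_COMPOUND_XML), indent=2))
```

Overloaded methods would share a key under this scheme, so a real script would need to disambiguate them, for instance by appending the member's <argsstring>.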

This setup does the job, but it is a bit hairy, especially step (2-b): converting a complex Doxygen XML structure to Markdown or reST is not as straightforward as I initially hoped. I thought I could get away with a simple xml.etree.ElementTree.tostring(node, method="text"), or a pypandoc.convert_text(ET.tostring(node), to="md", format="docbook"). In practice, both tend to break the formatting by adding (or removing) '\n' or '\' characters here and there, so some more laborious recursive parsing of the interesting XML nodes is needed to get decent text formatting out of them. This manual parsing and filtering of the interesting XML nodes (2-b) is the part I would like to get rid of. Since Breathe is meant to eat doxygen xml and spit out some text in rest format, maybe it could be a perfect fit for what I need here. Could you tell me which parts of the Breathe code base, or breathe-apidoc, I should look into? Thanks.
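A minimal reproduction of the formatting problem, using a hand-written, single-line fragment modelled on Doxygen's parameter-list markup: the "text" serialization method simply concatenates every text node, so the parameter names run straight into their descriptions. With real Doxygen output, which puts newlines between many tags, those newlines leak into the result instead.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment shaped like a Doxygen <detaileddescription>, with no
# whitespace between tags so the concatenation is easy to see.
FRAGMENT = (
    "<detaileddescription>"
    "<para>This function takes a "
    '<ref refid="classDummyA" kindref="compound">DummyA</ref>'
    " and does something</para>"
    '<para><parameterlist kind="param">'
    "<parameteritem><parameternamelist><parametername>da</parametername></parameternamelist>"
    "<parameterdescription><para>the thing to be processed</para></parameterdescription></parameteritem>"
    "<parameteritem><parameternamelist><parametername>x</parametername></parameternamelist>"
    "<parameterdescription><para>value along the first axis</para></parameterdescription></parameteritem>"
    "</parameterlist></para>"
    "</detaileddescription>"
)

node = ET.fromstring(FRAGMENT)
# method="text" writes only the text nodes, in document order, with no separators.
flat = ET.tostring(node, encoding="unicode", method="text")
print(repr(flat))
```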

vermeeren commented 3 years ago

@faresVS Have you checked this page https://breathe.readthedocs.io/en/latest/codeguide.html ? It contains details about how Breathe works and also talks about the XML parsing. This might be of use to you.

faresVS commented 3 years ago

Hello @vermeeren

Thank you for the link to the doc. Yes I have read that, and yes it seems Breathe can do what I need. But it is still not very clear how to do it. Maybe we can work out a very simple example.

Let's say I have a very simple project composed of 3 files:

dummy_a.h

#pragma once

/// @brief This class does the A
/// It is very good at doing what it does
class DummyA {
public:
    /// @brief This is the constructor
    /// @param x this value is not used
    DummyA(int x) { }

    /// @brief This does something
    /// This super useful function does something
    /// @param y 
    void DoSomething(int y) {}

};

dummy_b.h

#pragma once

#include <iostream>
#include "dummy_a.h"

/// @brief This class does the functionality B
/// It is completely different from
class DummyB {

public:
    /// @brief This is the constructor
    DummyB() { }

    /// @brief This does something
    ///
    /// This function takes a DummyA and does something
    ///
    /// @param da the thing to be processed
    /// @param x value along the first axis
    /// @param y value along the second axis
    /// @param z value along the third axis
    void DoSomethingWithA(const DummyA & da, float x, float y, float z) {
        std::cout << "DummyB::DoSomethingWithA()" << std::endl;
    }

};

main.cpp

#include <iostream>
#include "dummy_a.h"
#include "dummy_b.h"

int main() {
    std::cout << "BEGIN MAIN" << std::endl;

    DummyA a(42);
    DummyB b;

    b.DoSomethingWithA(a, 1.1f, 2.2f, 3.3f);

    std::cout << "END MAIN" << std::endl;
    return 0;
}

I generate a Doxygen config file by calling doxygen -g, then edit this file to enable XML output: GENERATE_XML = YES

Then I run doxygen and (among other things) get the following directory:

xml
├── classDummyA.xml
├── classDummyB.xml
├── combine.xslt
├── compound.xsd
├── dummy__a_8h.xml
├── dummy__b_8h.xml
├── index.xml
├── index.xsd
└── main_8cpp.xml

If I understand the documentation you sent correctly, I should do some parsing, then some filtering, and then some rendering. Based on the names of the Python files mentioned in the doc (compound.py, compoundsuper.py, index.py and indexsuper.py), I guess the target files to be parsed are index.xsd and compound.xsd (is that correct?)
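On the file question: as far as I know, index.xsd and compound.xsd are XML Schema definitions describing Doxygen's output format, while the content itself lives in index.xml and in one file per compound, named after the compound's refid (classDummyA.xml, classDummyB.xml, ...). The sketch below uses plain ElementTree rather than Breathe's parser to map class names to those files; the sample index is hand-written and its member refids are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt in the shape of Doxygen's xml/index.xml
# (member refids are placeholders; real ones contain generated hashes).
SAMPLE_INDEX_XML = """<doxygenindex>
  <compound refid="classDummyA" kind="class"><name>DummyA</name>
    <member refid="classDummyA_example1" kind="function"><name>DummyA</name></member>
  </compound>
  <compound refid="classDummyB" kind="class"><name>DummyB</name>
    <member refid="classDummyB_example1" kind="function"><name>DoSomethingWithA</name></member>
  </compound>
  <compound refid="dummy__a_8h" kind="file"><name>dummy_a.h</name></compound>
</doxygenindex>"""


def class_files(index_xml_text, kinds=("class", "struct")):
    """Return {compound name: compound XML file name} for the requested kinds."""
    root = ET.fromstring(index_xml_text)
    result = {}
    for compound in root.findall("compound"):
        if compound.get("kind") in kinds:
            # Each compound is described in its own file named after its refid.
            result[compound.findtext("name")] = compound.get("refid") + ".xml"
    return result


if __name__ == "__main__":
    print(class_files(SAMPLE_INDEX_XML))
```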

The doc mentions that the entry points to the parsing code are the parse functions at the bottom of breathe.parser.doxygen.compound and breathe.parser.doxygen.index. I guess this is outdated, and these should be replaced with breathe.parser.compound and breathe.parser.index instead:

import breathe.parser.index
pi = breathe.parser.index.parse("./xml/index.xsd")

which returns a breathe.parser.index.DoxygenTypeSub object

I can also do:

import breathe.parser.compound
pc = breathe.parser.compound.parse("./xml/compound.xsd")

which returns a breathe.parser.compound.DoxygenTypeSub object

Now, what can I do with that ?

In reality, the two XML files which contain the most relevant information for me are classDummyA.xml and classDummyB.xml. I can parse them with breathe.parser.index.parse(...) and breathe.parser.compound.parse(...), but then what? How is this linked with breathe.finder.compound and breathe.renderer?

Stepping back to my initial question: since I can parse all the XML files on my own using ElementTree, I can determine which method of which class I am currently processing. For example, while walking down the XML tree of classDummyB.xml, I know when I reach the node of the method DummyB::DoSomethingWithA(...). I know that this method is described under the XML tags <briefdescription>...</briefdescription> and <detaileddescription>...</detaileddescription>.
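Locating a specific member with ElementTree can be done with its limited XPath support, which accepts [tag='text'] predicates on child text. This is a sketch over a hand-written, simplified stand-in for classDummyB.xml; member_descriptions is a hypothetical helper name, and since find returns only the first match, overloaded members would need an extra check (e.g. on <argsstring>).

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for xml/classDummyB.xml (not verbatim Doxygen output).
SAMPLE_CLASS_XML = """<doxygen>
  <compounddef id="classDummyB" kind="class">
    <compoundname>DummyB</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="classDummyB_example">
        <name>DoSomethingWithA</name>
        <briefdescription><para>This does something. </para></briefdescription>
        <detaileddescription><para>This function takes a DummyA and does something</para></detaileddescription>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>"""


def member_descriptions(xml_text, member_name):
    """Return the (briefdescription, detaileddescription) elements of a member, or None."""
    root = ET.fromstring(xml_text)
    # [name='...'] matches a <memberdef> whose <name> child has exactly this text.
    member = root.find(f".//memberdef[name='{member_name}']")
    if member is None:
        return None
    return member.find("briefdescription"), member.find("detaileddescription")


if __name__ == "__main__":
    brief, detailed = member_descriptions(SAMPLE_CLASS_XML, "DoSomethingWithA")
    print(ET.tostring(brief, encoding="unicode"))
```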

Is it possible to bypass the breathe.parser and breathe.finder parts and jump directly to breathe.renderer? If I could feed a renderer with the string corresponding to the XML subtree of interest, could it return the corresponding formatted text? Maybe a sphinxrenderer would do: https://github.com/michaeljones/breathe/blob/master/breathe/renderer/sphinxrenderer.py

For example, for the method DummyB::DoSomethingWithA(...), I would feed it the following two XML subtrees:

<briefdescription>
    <para>This does something. </para>
</briefdescription>

and

<detaileddescription>
<para>This function takes a <ref refid="classDummyA" kindref="compound">DummyA</ref> and does something</para><para><parameterlist kind="param"><parameteritem>
<parameternamelist>
<parametername>da</parametername>
</parameternamelist>
<parameterdescription>
<para>the thing to be processed </para></parameterdescription>
</parameteritem>
<parameteritem>
<parameternamelist>
<parametername>x</parametername>
</parameternamelist>
<parameterdescription>
<para>value along the first axis </para></parameterdescription>
</parameteritem>
<parameteritem>
<parameternamelist>
<parametername>y</parametername>
</parameternamelist>
<parameterdescription>
<para>value along the second axis </para></parameterdescription>
</parameteritem>
<parameteritem>
<parameternamelist>
<parametername>z</parametername>
</parameternamelist>
<parameterdescription>
<para>value along the third axis </para></parameterdescription>
</parameteritem>
</parameterlist>
</para>        </detaileddescription>

And it would return a correctly formatted text version of the original C++ comment, which was:

    /// @brief This does something
    ///
    /// This function takes a DummyA and does something
    ///
    /// @param da the thing to be processed
    /// @param x value along the first axis
    /// @param y value along the second axis
    /// @param z value along the third axis

Is this possible ?
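Independently of Breathe, the conversion asked about here can be approximated with a small recursive converter, because these two subtrees use only a few tags. The sketch handles <para>, inline elements such as <ref> (reduced to their text) and <parameterlist kind="param"> (rendered as Sphinx-style :param name: fields); anything else, such as a <simplesect kind="return">, is simply flattened into the surrounding paragraph. The function names are hypothetical, and the sample is an excerpt of the XML above with only two of the four parameters.

```python
import xml.etree.ElementTree as ET


def inline_text(node):
    """All text below node, with runs of whitespace collapsed to single spaces."""
    return " ".join("".join(node.itertext()).split())


def render_params(paramlist):
    """Render <parameterlist kind="param"> as ':param name: description' lines."""
    lines = []
    for item in paramlist.findall("parameteritem"):
        names = ", ".join(inline_text(n) for n in item.iter("parametername"))
        desc = item.find("parameterdescription")
        text = inline_text(desc) if desc is not None else ""
        lines.append(f":param {names}: {text}".rstrip())
    return lines


def render_description(node):
    """Convert a <briefdescription>/<detaileddescription> element to reST-style text."""
    paragraphs, params = [], []
    for para in node.findall("para"):  # top-level paragraphs only
        pieces = [para.text or ""]
        for child in para:
            if child.tag == "parameterlist" and child.get("kind") == "param":
                params.extend(render_params(child))
            else:
                # Inline markup such as <ref> is reduced to its text.
                pieces.append("".join(child.itertext()))
            pieces.append(child.tail or "")
        text = " ".join("".join(pieces).split())
        if text:
            paragraphs.append(text)
    if params:
        paragraphs.append("\n".join(params))
    return "\n\n".join(paragraphs)


def docstring(brief, detailed):
    """Join the rendered brief and detailed descriptions with a blank line."""
    parts = (render_description(n) for n in (brief, detailed) if n is not None)
    return "\n\n".join(p for p in parts if p)


BRIEF = """<briefdescription>
    <para>This does something. </para>
</briefdescription>"""

DETAILED = """<detaileddescription>
<para>This function takes a <ref refid="classDummyA" kindref="compound">DummyA</ref> and does something</para><para><parameterlist kind="param"><parameteritem>
<parameternamelist>
<parametername>da</parametername>
</parameternamelist>
<parameterdescription>
<para>the thing to be processed </para></parameterdescription>
</parameteritem>
<parameteritem>
<parameternamelist>
<parametername>x</parametername>
</parameternamelist>
<parameterdescription>
<para>value along the first axis </para></parameterdescription>
</parameteritem>
</parameterlist>
</para></detaileddescription>"""

if __name__ == "__main__":
    print(docstring(ET.fromstring(BRIEF), ET.fromstring(DETAILED)))
```

Emitting :param: fields rather than Markdown keeps the result renderable by Sphinx autodoc if the docstrings are later read back from the compiled module.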

jakobandersen commented 3 years ago

> I have a C++ library. Some of it is exported as python bindings (using pybind11). I would like the comments of the C++ classes to be used as docstrings for the python bindings. This seems like a common and basic use-case, but I couldn't find any clear and simple solution to do that automatically. If you know of something which could do this automatically, please feel free to share a link.

I have almost the exact same situation, but unfortunately no good solution that doesn't rely on manual work and/or haxy scripts.

> Since Breathe is meant to eat doxygen xml and spit out some text in rest format

This is actually not the case. The sphinxrenderer in Breathe directly produces docutils nodes, i.e., the internal representation that Sphinx works on and manipulates, before it is converted into whatever output format.

2bndy5 commented 2 years ago

> unfortunately no good solution that doesn't rely on manual work

Agreed. I had to copy the relevant docs into my pybind11 wrapper definitions' docstrings. Then, I just used autodoc to extract the docs from the installed python binding. This also has the benefit of docs showing up in the REPL's help(wrapper_obj.member) calls.
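One possible middle ground between copying docs by hand and loading a JSON file at runtime (step 3 of the original workflow) is to generate a C++ header of string constants from the extracted dictionary and reference those constants in the pybind11 definitions. This is a hypothetical sketch: the DOC_ naming scheme and the helper names are assumptions, and the escaping only covers backslashes, double quotes and common control characters.

```python
import re


def cpp_string_literal(text):
    """Escape text as an ordinary (non-raw) C++ string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def identifier_for(key):
    """Map 'DummyB::DoSomethingWithA' to DOC_DummyB_DoSomethingWithA (hypothetical scheme).

    Runs of non-alphanumeric characters collapse to one underscore, because
    identifiers containing a double underscore are reserved in C++.
    """
    return "DOC_" + re.sub(r"[^0-9A-Za-z]+", "_", key).strip("_")


def make_header(docs):
    """Render {key: docstring} as a C++ header with one constant per entry."""
    lines = ["#pragma once", "", "// Generated from Doxygen XML; do not edit by hand.", ""]
    for key in sorted(docs):
        lines.append(
            f"static const char *const {identifier_for(key)} = {cpp_string_literal(docs[key])};"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "DummyB": "This class does the functionality B",
        "DummyB::DoSomethingWithA": "This does something.\n\n:param da: the thing to be processed",
    }
    print(make_header(example))
```

The generated constants could then be passed as the doc argument, e.g. py::class_<DummyB>(m, "DummyB", DOC_DummyB) and .def("DoSomethingWithA", &DummyB::DoSomethingWithA, DOC_DummyB_DoSomethingWithA), so they end up in __doc__ and are visible to autodoc and help(). Overloads, operators and constructor/destructor pairs would collide under this naming scheme and need extra disambiguation.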

breathe-doc/breathe issue #601