Closed by EricCousineau-TRI 4 years ago
Per f2f with @jamiesnape, a crude first approach would be to figure out what docstrings are not being used.
Can do this with the existing `documentation_pybind.h` and the existing source files.
Script for producing symbols from `pydrake.all`:
Output of `cat documentation_pybind.h | grep -i 'Symbol: '`:
Per Slack convo: The current code is just an initial rough estimate, comparing a certain subset of symbols in C++ to a certain subset of symbols in Python.
Next step is to do symbol-by-symbol correspondence.
To expand on (what I think) Jamie said with an example, the correspondence script may not even need to know Python symbols; instead, it will just look at a source line in the `*_py.cc` files, e.g.:
https://github.com/RobotLocomotion/drake/blob/c6f0525c9c9745a6e61461a176cd6a26a9582a74/bindings/pydrake/math_py.cc#L52
and note that, in context, `cls_doc.ctor.doc_0args` resolves to `pydrake_doc.drake.math.RigidTransform.ctor.doc_0args`.
This will add one usage counter to the corresponding line in `documentation_pybind.h`:
https://gist.github.com/EricCousineau-TRI/e03783a46ab214d4c99f9ad90acf5a81#file-documentation_pybind_snippet-h-L6
Perhaps the most robust way to do this (even for prototyping) is to hack `mkdoc.py` to emit the symbol names to a YAML file or something, which the secondary script can then load and use for counting symbol occurrences.
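A minimal sketch of that two-step idea (function names are hypothetical, and JSON stands in for the suggested YAML to stay within the stdlib):

```python
import json
import re

def dump_symbols(symbols, path):
    """Step 1: mkdoc.py would be hacked to emit the doc-symbol names it
    generates (e.g. every pydrake_doc...doc* constant) to a file."""
    with open(path, "w") as f:
        json.dump(sorted(symbols), f, indent=2)

def count_usages(symbol_path, binding_sources):
    """Step 2: the secondary script loads the symbols and counts how
    often each one occurs across the *_py.cc source texts."""
    with open(symbol_path) as f:
        symbols = json.load(f)
    counts = {s: 0 for s in symbols}
    for source_text in binding_sources:
        for symbol in symbols:
            # Plain substring counting; real code would first expand
            # aliases like cls_doc before counting.
            counts[symbol] += len(re.findall(re.escape(symbol), source_text))
    return counts
```

Symbols with a count of zero would then be the docstrings not used by any binding.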
Additionally, to be able to resolve docstring alias assignments (e.g. `constexpr auto& cls_doc`) and their usage (e.g. `cls_doc.doc`), I would recommend just scanning line by line: whenever you see an alias assignment, update a table with it; then, when you come across a docstring, expand it using that table.
EDIT: One caveat is that this won't completely cover instantiations of generic templates, e.g. `TypeIndex<Class>` or `Identifier<Class>`.
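The line-by-line alias scan could look roughly like this (the regexes are illustrative only, not the exact patterns Drake's sources would require):

```python
import re

# Matches e.g. `constexpr auto& cls_doc = pydrake_doc.drake.math.RigidTransform;`
ALIAS_RE = re.compile(r"constexpr auto&\s*(\w+)\s*=\s*([\w.]+);")
# Matches dotted docstring references, e.g. `cls_doc.ctor.doc_0args`.
DOC_RE = re.compile(r"\b(\w+)((?:\.\w+)+)")

def resolve_docstrings(lines):
    aliases = {}
    resolved = []
    for line in lines:
        m = ALIAS_RE.search(line)
        if m:
            # Record the alias in the table and move on.
            aliases[m.group(1)] = m.group(2)
            continue
        for name, rest in DOC_RE.findall(line):
            if name in aliases:
                # Expand cls_doc.ctor.doc_0args to its full pydrake_doc path.
                resolved.append(aliases[name] + rest)
    return resolved
```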
I could not get `libclang` to parse the `doc` name inside the `.def` function.
I tried `castXML`, but it could not do it either. Brad King said:
> castxml does not generate any information about function bodies, so it is unlikely to work for your purpose.
I tried `cppast` to no avail; it also cannot parse function bodies. (`cppast` claims to give more information about the AST than `libclang`.) Under "Missing features":
> Support for statements: currently function bodies aren't parsed at all;
I'm now working on modifying Pybind11 to get the mapping using the records it keeps.
> I could not get `libclang` to parse the `doc` name inside the `.def` function.
Can I ask if you've tried super simple text parsing? Per the suggestion above:
> [...] the correspondence script may not even need to know Python symbols [...] I would recommend just scanning line-by-line, and whenever you see an alias assignment [...]
This shouldn't require `libclang` or anything to scan the `*_py.cc` files.
Can you try this first, before delving further into `pybind11` internals?
Also, might it be possible to have a first rough-cut of this (using basic text parsing, not C++ AST / libclang / castXML / cppast / etc.) by this Friday (2019/07/19)?
Let's say you point it at `drake/math`, and we see how much of that is covered by `math_py.cc`, looking at each docstring? (This can be on a prototype branch; nothing has to be a PR / merged to master.)
> Also, might it be possible to have a first rough-cut of this (using basic text parsing, not C++ AST / libclang / castXML / cppast / etc.) by this Friday (2019/07/19)?
Sure.
Thanks! And I realized that my above statement may have had an imprecision there:
I just meant to use basic text parsing for the resolution of the docstring symbol, not for producing the symbols in the first place (already done by `mkdoc`).
Next steps: load the results into `pandas`, then look at stuff (percentages).
We made another attempt using the libclang tokenizer, here. (It has a CSV printed using `pandas`.)
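Computing a coverage percentage from such a CSV might look like this (the `symbol`/`uses` column names are my own invention, not the actual schema; the stdlib `csv` module stands in for `pandas` to keep the sketch self-contained):

```python
import csv
import io

def coverage_percentage(csv_text):
    """Percentage of doc symbols that are used at least once,
    given a CSV with one row per C++ doc symbol."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    covered = sum(1 for r in rows if int(r["uses"]) > 0)
    return 100.0 * covered / len(rows)
```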
Before figuring out how to polish towards `master`, we should define an initial blacklist for coverage that we don't necessarily care about (e.g. `//attic/...`, etc.) and internal API (e.g. `MultibodyTree`).
Per f2f, I will review symbols, and give suggestions on what patterns to start filtering out.
Here are the initial broad sweeping suggestions for symbols to ignore:
- `/attic/`
- `/automotive/`
- `dev` namespace
- `internal` namespace (should go away once we fully remove deprecated MBT bindings)
- `@exclude_from_pydrake_mkdoc` (which shouldn't be incorporated in this analysis, but just in case...)

Can you incorporate these exclusions in your analysis?
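These exclusions could be expressed as a small pattern filter, e.g. (the patterns are my own phrasing of the list above, not an official list):

```python
import re

# Substring/regex patterns for symbols or paths we don't want counted
# in the coverage metric.
EXCLUDE_PATTERNS = [
    r"/attic/",
    r"/automotive/",
    r"\bdev\b",       # dev namespace / dev directories
    r"\binternal\b",  # internal namespace
]

def is_excluded(symbol_or_path):
    return any(re.search(p, symbol_or_path) for p in EXCLUDE_PATTERNS)

def filter_symbols(items):
    return [x for x in items if not is_excluded(x)]
```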
For filtering `attic` symbols, we had to compare the header files in the source tree under the directory `attic/` with those in the install tree, because `mkdoc.py` seems to be using files in the install tree. We do so by matching the names of the files and then matching the contents (in case multiple files in the install tree have the same name as one in the source tree). The `attic` bindings are easy; we just don't parse `bindings/pydrake/attic`.

`attic/foo/bar/baz.h` should always be `foo/bar/baz.h` in the install tree. No need to compare file contents, if you use the package + basename instead of just the basename.
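The suggested mapping can be sketched in a few lines (helper name is hypothetical): strip the leading `attic/` and the remaining package-qualified path identifies the install-tree file, with no content comparison needed.

```python
def install_path(source_path):
    """Map a source-tree header path to its install-tree path:
    attic/foo/bar/baz.h -> foo/bar/baz.h; other paths are unchanged."""
    prefix = "attic/"
    if source_path.startswith(prefix):
        return source_path[len(prefix):]
    return source_path
```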
Per f2f, seems like current coverage is at 18% with this measurement step. However, this is only filtering `attic`; it does not filter `internal`, `dev`, `automotive`, or `maliput`.
This filters `internal`, `dev`, `automotive`, and `maliput`.
%age = 23.75
I think the next best step is to try more reasonable carvings of the current metrics, and a little more refining of the filters.
Can you try setting it up for doing queries based on types (if possible), like:
Additional things that would be nice to report:
Additionally, can you add the following directories to your ignore list?
- `common/proto`
- `common/yaml`
- `examples/` (we do have some bindings, but I'm not too concerned with these bindings at the moment)
After this, we should then define file markers / a file manifest that states whether a given file is geared towards pure C++ usage (e.g. `nice_type_name.h`, `hash.h`, `eigen_types.h`). However, that could possibly be filed as a follow-up issue.
EDIT: Er, but to answer when to start polishing - let's try to update the filter and metric carving first.
> Additionally, can you add the following directories to your ignore list?
Here %age = 25.60
Can I ask if you have an update for the additional coverage metrics?
I'm now using XPath to get the answers to the coverage questions: https://github.com/m-chaturvedi/drake/compare/master...m-chaturvedi:meeting_branch_1
Will be adding type-wise coverage as well. Edit: Added type-wise coverage.
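For illustration, a type-wise (per-kind) coverage query can be done with the stdlib's limited XPath support; the XML schema below is entirely hypothetical, not the one the actual tooling generates:

```python
import xml.etree.ElementTree as ET

# Hypothetical symbol dump: one <symbol> element per C++ doc symbol,
# tagged with its kind and whether a binding uses it.
EXAMPLE = """
<symbols>
  <symbol kind="class" documented="1"/>
  <symbol kind="class" documented="0"/>
  <symbol kind="function" documented="1"/>
  <symbol kind="function" documented="1"/>
</symbols>
"""

def coverage_by_kind(xml_text, kind):
    root = ET.fromstring(xml_text)
    # ElementTree supports simple attribute predicates like this.
    nodes = root.findall(f".//symbol[@kind='{kind}']")
    covered = [n for n in nodes if n.get("documented") == "1"]
    return 100.0 * len(covered) / len(nodes)
```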
Will you be able to have a rough-cut of ~~type-wise and~~ directory-wise coverage by this upcoming Tuesday EOD?
EDIT: Just saw edit for type-wise.
Sure.
Updated link after granting permissions.
Looks good! Two minor requests:
- For each directory (e.g. `drake/`, `drake/common/`, `drake/systems/primitives/`), can you also show overall directory coverage?

Thanks! Looks much better.
Some minor follow-up nits:
- `baobab` (Disk Usage Analyzer) on Ubuntu (relates #9650)

We can file these as follow-up issues or TODOs or whatever, just as long as the code that lands makes these things easy to achieve in the future (which hopefully it does?).
I think overall, this is almost ready to start polishing and land on master. Before we do that: @sammy-tri Can I ask if there are any other coverage aspects you think we might want to cover?
\cc @jwnimmer-tri @RussTedrake
Updated: https://docs.google.com/spreadsheets/d/1wlhPxnYbJ1YSUIXZUaasagBf8Nbt8QbiTnPht0J6K8k/edit?usp=sharing
I used the following dictionary for making those classes: https://gist.github.com/m-chaturvedi/2016db62767119af431b04a9adc45667#file-parse_pybind_xml-py-L32-L41
Minor readability request: When you add links in, rather than inlining them into entire statements, can you present them after a colon? e.g.
> Updated: <link>
> Using the following dictionary for those classes: <link>
Will check this out later today, thanks!
Sure, thanks! "I used the following dictionary for making those classes" was even factually incorrect; the link didn't follow the text :)
Just took a peek; looks generally good. A few more minor notes:
You may want to update your sorting based on file vs. directory; at present, you're just sorting lexically, which means you can have an ordering like:
```
a/b/c.txt
a/b/d/e.txt  <-- Directory in the middle.
a/b/f.txt
```
Probably using a simple sorting heuristic would fix that, I think?
Other than that, perhaps we can get a first version set up to PR against master?
Ah, but first, could you also do the breakdown on symbol coverage, esp. with nesting (e.g. in namespaces and in classes, etc.) - not just files?
> You may want to update your sorting based on file vs. directory:
I did that because then there would be too many `DirCoverage` values (1st column), which wouldn't make too much sense.
> Ah, but first, could you also do the breakdown on symbol coverage, esp. with nesting (e.g. in namespaces and in classes, etc.) - not just files?
Sorry had forgotten to update: https://docs.google.com/spreadsheets/d/1MQuzuluffw7CJ5tivJOmOKfE88bI9clTkZX7e9gZ8H0/edit?usp=sharing
> Other than that, perhaps we can get a first version set up to PR against master?
OK.
> I did that because then there would be too many `DirCoverage` values (1st column), which wouldn't make too much sense.
Per f2f, the resolution is that files should be clustered by their immediate directory (not that directories come first in the list, as you'd see in a file explorer).
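One way to sketch that clustering is a sort key of (immediate directory, filename), so files sharing a parent directory stay together instead of being interleaved with subdirectories (helper names are hypothetical):

```python
import os

def cluster_key(path):
    # Primary key: the immediate parent directory; secondary: filename.
    head, tail = os.path.split(path)
    return (head, tail)

def cluster_sort(paths):
    return sorted(paths, key=cluster_key)
```

On the earlier example, this yields `a/b/c.txt`, `a/b/f.txt`, then `a/b/d/e.txt`, keeping the `a/b/` files clustered.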
Target Date for initial PR is Friday, 2019-09-13.
Hey @EricCousineau-TRI! I want to know what the expected workflow for getting the plots / CSV files would be.
That is, would we be generating the coverage CSVs when running the `generate_pybind_documentation_header` command? (Currently, I generate an XML file needed for the remaining analysis using it.)
Or perhaps I should add another Bazel command that needs to be run for generating the CSV files, and then yet another one for opening a Jupyter notebook that shows them? (This new command would need to access the `.cc` files inside the `bindings/pydrake` directories.)
Sorry for the late response here; the next step is to get the script that outputs a CSV onto master via your WIP PR.
After that has landed on master, we can work on Jupyter notebook output / whatever else we use for presenting the results.
Mmanu has split up his PRs into simpler pieces, still reviewing but should land soon.
Since #12040 has merged, going to close this. If we need a public report (e.g. via Jenkins / CDash), then we can file a separate issue.
Per prior discussion: https://github.com/RobotLocomotion/drake/issues/7889#issuecomment-499510945
(I mis-interpreted this from before) Per @ggould-tri's suggestion (I think), we should have some way to show what bindings are missing from the public Drake API.
Can reassign once this is closer to being worked on.
\cc @jwnimmer-tri