Proposal: Migrate comment documentation to nodedef doc attributes for std libraries

kwokcb commented 3 months ago

Proposal

This is an off-shoot of a discussion about PugiXML custom changes with @jstone-lucasfilm and @ld-kerley The impetus being to remove the custom comment and line spacing custom changes to the library.

The main reason to have this is to allow for documentation of node definitions using XML comments. The proposal here is to move the comments into the nodedef doc meta-data tag which makes it publicly visible to integrations. (@ashwinbhat , for glTF PBR spec ratification this will at least start to have docs on nodes which can be enhanced with things like boundary conditions).

As an additional change, if nodegraph XML comments are to be preserved then the doc tag would be needed there.

Test

Here is some sample code which could work for nodedefs and the result. It's mostly a heuristic to try and guess what comments go with which nodedefs. The comment immeiate before a nodedef(s) is the doc string.

def transferCommentsToNodeDefs(libFile):

    readOptions = mx.XmlReadOptions()
    readOptions.readComments = True
    readOptions.readNewlines = True    
    readOptions.upgradeVersion = False

    outputDoc = mx.createDocument()
    mx.readFromXmlFile(outputDoc, libFile,  mx.FileSearchPath(), readOptions)        

    # Extract out comments and nodedefs
    currentComment = []
    children = outputDoc.getChildren()
    for child in children:
        if child.getCategory() == 'comment':
            docstring = child.getAttribute('doc')
            if len(docstring) > 0:
                docstring = re.sub(r'\s+', ' ', docstring.replace('\n', ' ').lstrip())
                docstring = docstring.strip()
            # Repace end .. with .
            if docstring.endswith('..'):
                docstring = docstring[:-1]
            currentComment.append(['comment', docstring, child.getName() ])
        elif child.getCategory() == 'nodedef':
            currentComment.append(['nodedef', child.getName()])

    strippedComments = []
    # Heuristic to find comments for nodedefs:
    # 1. Accumulate nodedefs until a comment is found
    # 2. Add an association between the comment and nodedefs
    # 3. Skip if a comment is found immediately before a comment
    # 4. Keep track of comments to remove
    hitComment = False
    nodedefList = []
    removeComments = []
    for i in range(len(currentComment)-1, -1, -1):
        if not hitComment and currentComment[i][0] == 'comment':
            if len(nodedefList) > 0:
                # Keep track of comments to remove.
                # Add [ nodedef, comment ] pair 
                removeComments.append(currentComment[i][2])
                for nodedef in nodedefList:
                    strippedComments.append([nodedef, currentComment[i][1]])
                nodedefList.clear();
            hitComment = True
        elif currentComment[i][0] == 'nodedef':
            nodedefList.append(currentComment[i][1])
            hitComment = False

    # Apply comments to nodedefs:
    # 1. Find nodedefs
    # 2. Add new comments to existing comments
    print('nodedefs with new comments')
    for i in range(len(strippedComments)):
        print(strippedComments[i])

        nodedef = outputDoc.getChild(strippedComments[i][0])
        if nodedef is None:
            print('Cannot find nodedef:', strippedComments[i][0])
            continue
        currentDoc = nodedef.getAttribute('doc')
        newDoc = strippedComments[i][1]
        if len(currentDoc) > 0:
            newDoc = newDoc + " " + currentDoc
        nodedef.setAttribute('doc',  newDoc)

    # Remove comments
    for i in range(len(removeComments)):
        outputDoc.removeChild(removeComments[i])

    return outputDoc

Snippets from result on stdlib_defs.mtlx

  <nodedef name="ND_sign_float" node="sign" nodegroup="math" doc="Node: <sign>. Sign of each input channel: -1, 0 or +1">
    <input name="in" type="float" value="0.0" />
    <output name="out" type="float" defaultinput="in" />
  </nodedef>

  <nodedef name="ND_clamp_float" node="clamp" nodegroup="math" doc="Node: <clamp>. Clamp incoming value to a specified range of values.">
    <input name="in" type="float" value="0.0" />
    <input name="low" type="float" value="0.0" />
    <input name="high" type="float" value="1.0" />
    <output name="out" type="float" defaultinput="in" />
  </nodedef>

   <nodedef name="ND_range_color3" node="range" nodegroup="adjustment" doc="Node: <range> Supplemental Node. Remap a value from one range of float/color/vector values to another, optionally applying a gamma correction in the middle, and optionally clamping output values.">
    <input name="in" type="color3" value="0.0, 0.0, 0.0" />
    <input name="inlow" type="color3" value="0.0, 0.0, 0.0" />
    <input name="inhigh" type="color3" value="1.0, 1.0, 1.0" />
    <input name="gamma" type="color3" value="1.0, 1.0, 1.0" />
    <input name="outlow" type="color3" value="0.0, 0.0, 0.0" />
    <input name="outhigh" type="color3" value="1.0, 1.0, 1.0" />
    <input name="doclamp" type="boolean" value="false" />
    <output name="out" type="color3" defaultinput="in" />
  </nodedef>

ld-kerley commented 3 months ago

I think it's a great idea to formalize the documentation, and move it away from comments in the code, and in to the actual data. Another win in the fight to make MaterialX a data driven library :). I think it perhaps also opens future doors to auto generated node library documentation in an easier way.

Another possible syntax to consider (not weighing any heavy opinion in either direction yet) would be to add a <doc> child tag to the <nodedef>

ie.

  <nodedef name="ND_sign_float" node="sign" nodegroup="math">
    <doc>Node: <sign>. Sign of each input channel: -1, 0 or +1</doc>
    <input name="in" type="float" value="0.0" />
    <output name="out" type="float" defaultinput="in" />
  </nodedef>

  <nodedef name="ND_clamp_float" node="clamp" nodegroup="math">
    <doc>Node: <clamp>. Clamp incoming value to a specified range of values.</doc>
    <input name="in" type="float" value="0.0" />
    <input name="low" type="float" value="0.0" />
    <input name="high" type="float" value="1.0" />
    <output name="out" type="float" defaultinput="in" />
  </nodedef>

   <nodedef name="ND_range_color3" node="range" nodegroup="adjustment" >
     <doc>Node: <range> Supplemental Node. Remap a value from one range of float/color/vector values to another, optionally applying a gamma correction in the middle, and optionally clamping output values.</doc>
    <input name="in" type="color3" value="0.0, 0.0, 0.0" />
    <input name="inlow" type="color3" value="0.0, 0.0, 0.0" />
    <input name="inhigh" type="color3" value="1.0, 1.0, 1.0" />
    <input name="gamma" type="color3" value="1.0, 1.0, 1.0" />
    <input name="outlow" type="color3" value="0.0, 0.0, 0.0" />
    <input name="outhigh" type="color3" value="1.0, 1.0, 1.0" />
    <input name="doclamp" type="boolean" value="false" />
    <output name="out" type="color3" defaultinput="in" />
  </nodedef>

Which might perhaps allow for multiple line documentation strings? I'm not sure if new-line characters are allowed inside of XML attributes....

I also don't know if it might allow for a slightly more optimized XML reader - if we knew we didn't need the documentation (ie. when we're doing shader gen only) we could skip the <doc> elements.

Or perhaps if we don't want XML text elements in the document, we could add

<doc helptext="Node: <range> Supplemental Node. Remap a value from one range of float/color/vector values to another, optionally applying a gamma correction in the middle, and optionally clamping output values."/>

still giving us the advantage of filtering out the read of <doc> if we want to avoid that, and also leaving the possibility of adding more attributes to the <doc> tag, like perhaps thumbnail="<path_to_image>" or others.

kwokcb commented 3 months ago

Currently only the doc attribute on nodes or inputs are used though a <doc> element does seem inline with earlier discussion of separating docs from structure. If there are separate <doc> then indeed we could add in more info. This is what I did when transforming the defs into JSON for my web editor and included things like USD, OpenPBR thumbnails and other stuff like ui color coding. At this point If we go this far, I'd just separate out docs from the nodedef and use a "reference" to docs instead embedding it. I had something like this going since I need to add in other "asset" meta-data like authoring info etc.

AFAIK it should be assumed there are no special characters such as newlines, tabs etc. The W3C XML/HTML spec says you should not put any special characters including < and > into the document as validators will fail / reject it as invalid XML. This is currently the case when you try it out with these.

I'd rather formalize formatting than keep newlines, tabs, etc, For instance if a mimetype was supported then you could (if allowed syntax in XML) add "standardized" formatting -- say using Markdown. You could even sneak in diagrams, formulas etc :).

Anyways, just "pie in the sky" stuff.

As a first step if moving comments into doc attributes is acceptable I can put up a PR for stdlib.

AcademySoftwareFoundation / MaterialX

Proposal: Migrate comment documentation to nodedef doc attributes for std libraries #1950

Proposal

Test