PixarAnimationStudios / OpenUSD-proposals

Share and collaborate on proposals for the advancement of USD
106 stars 27 forks source link

Bi-Directional Transcoding of Invalid Identifiers #37

Closed miguelh-nvidia closed 7 months ago

miguelh-nvidia commented 8 months ago

Description of Proposal

TfMakeValidIdentifier, was used in OpenUSD to convert any identifier into a valid identifier. However, it creates a non-bidirectional relationship, for example, something like カーテンウォール would be transformed into ________________.

The objective of this proposal is to provide an alternative to TfMakeValidIdentifier that can take any identifier (potentially with invalid characters) and transform it into a OpenUSD valid identifier.

Contributing

marktucker commented 7 months ago

SideFX has been using a customized version of punycode to encode/decode between USD attribute names and Houdini parameter names for a very long time, and it has been quite a successful strategy. I also proposed something like this as a possible alternative to adding true UTF-8 support to USD. So I'm very much onboard with this overall proposal, and can vouch for its usefulness.

Just a few specific comments about the suggested APIs... In many cases there seems to be a typo where you are using "Boostring" instead of "Bootstring". But I would actually suggest eliminate the word "Bootstring" (or "Boostring") from the APIs completely. If this is going to be the Sdf standard for encoding/decoding identifiers, I shouldn't have to know or care that it's using an algorithm called "Bootstring".

I'm also not sure I see the value in returning a std::optional. Wouldn't it be just as easy to return std::string() to indicate an error case? When I saw the API returning std::optional my first thought (and fear, because I think it would be a bad idea) was that the "optional"-ness was going to be used to indicate whether or not the input string had to be modified. Returning and empty std::string would also make these APIs more consistent with the signatures and behavior of TfMakeValidIdentifier.

nvmkuruc commented 7 months ago

Thanks for the thoughts, Mark! It's helpful to know you've had success using Bootstring encoding!

But I would actually suggest eliminate the word "Bootstring" from the APIs completely. If this is going to be the Sdf standard for encoding/decoding identifiers, I shouldn't have to know or care that it's using an algorithm called "Bootstring".

I've been an advocate for giving the encoding algorithm a distinguishing name. My thinking is that there are still probably needs for "make valid identifier functions" that aren't bidirectional. For example, replacing symbols and whitespace with _ may be preferable for some users and contexts, and they're willing to give up bidirectionality. My other thought is that as people start to see tn__ appear in paths and logs, it might be helpful for users to have some shorthand for describing them-- "That's a bootstring encoded identifier."

I don't think we're attached to SdfBootstring{Encode,Decode} though if there's better suggestions.

nvmkuruc commented 7 months ago

Just wanted to note the potential need for something like transcoding in Hydra when mapping primvars to GLSL.

meshula commented 7 months ago

Personally, I am leaning to not requiring the name of the algorithm, maybe you got my brain off on the wrong footing with the Freudian slip-ish "boostring". We'll never finish de-boosting!!

  1. it isn't exactly a bootsring so let's not call it that in the first place. It's a custom thing. The documentation you write should be sufficient for someone to understand, if they care, that the algorithm is largely similar to bootstring. But that is an implementation detail, not something of foundational importance.
  2. I don't agree that we should preserve an option for non-reversibility. It's complexity with only hypothetical use cases, and the hypothetical use case comes down to a hypothetical user wishing for "perfect" identifiers at the expense of reversibility.
meshula commented 7 months ago

PS if we are down to bikeshedding the name, shall we merge this PR in Draft state?

meshula commented 7 months ago

@miguelh-nvidia Are you able to sign the commits and force push? If no, I'll merge as is. Otherwise I'd prefer to attempt to satisfy the automation bot.

miguelh-nvidia commented 7 months ago

@meshula ready!, thanks.