Add operator name canonicalization

GeorgeR227 commented 4 months ago

This is meant to address #34 by improving support for Unicode and ASCII operator names. The idea is to let the user choose from a range of supported operator names while the underlying codebase works only with the canon name.

An example would be to have type inference rules carry a canon name, instead of an array of supported names, and have inference rules simply check that their name matches the converted user name.

These canon names should be carefully chosen to be easy to work with and parse. Some rules are included but are liable to change.
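
A minimal sketch of the idea, not the code in this PR: `CANON_OP_NAMES`, `canon_name`, `InferenceRule`, and `matches` are hypothetical names, and the alias entries are made up for illustration. It assumes every supported spelling maps to one ASCII canon name and that unregistered names pass through unchanged.

```julia
# Hypothetical alias table: each supported spelling maps to one ASCII canon name.
# The entries are illustrative and are not the names chosen in this PR.
const CANON_OP_NAMES = Dict{Symbol,Symbol}(
    :d₀ => :d_0,
    :d0 => :d_0,
    :⋆₁ => :star_1,
    :Δ  => :lapl,
)

# Names without a registered alias (e.g. user-defined operators) are returned unchanged.
canon_name(name::Symbol) = get(CANON_OP_NAMES, name, name)

# A rule carries a single canon name instead of an array of supported names.
struct InferenceRule
    src_type::Symbol
    tgt_type::Symbol
    op_name::Symbol
end

# A rule applies when its canon name equals the canonicalized user-written name.
matches(rule::InferenceRule, user_name::Symbol) = rule.op_name == canon_name(user_name)

rule = InferenceRule(:Form0, :Form1, :d_0)
matches(rule, :d₀)  # Unicode spelling
matches(rule, :d0)  # ASCII alias
```

With this shape, adding a new spelling only touches the alias table; the rules themselves stay unchanged.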

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.96%. Comparing base (12edc3a) to head (df5d2c5). Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| src/deca/deca_op_names.jl | 75.00% | 4 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #54      +/-   ##
==========================================
+ Coverage   84.68%   85.96%   +1.28%
==========================================
  Files          12       14       +2
  Lines         764      905     +141
==========================================
+ Hits          647      778     +131
- Misses        117      127      +10
```


lukem12345 commented 4 months ago

This PR just uses the global names Dict for the type inference and function overloading, but it should be used throughout this codebase and Decapodes.jl wherever explicit symbols appear. Can you spot check the rest of this codebase for such updates? Where it is too tedious (e.g. unicode!), we can make an issue to refactor.

GeorgeR227 commented 4 months ago

I've skimmed through the rest of the code, and the only other function in DiagrammaticEquations that should use canon names appears to be the find_chains code. There, we can just check that the canon name of an operator matches the canon name of an operator in a black/whitelist.

Everything else seems to just edit names or copy raw names over, and so doesn't need this.
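
A rough sketch of that check, under the assumption that both the operator and the list entries are canonicalized before comparison. `CHAIN_ALIASES`, `chain_canon`, and `is_listed` are hypothetical names, the alias entries are illustrative, and this is not the actual find_chains implementation.

```julia
# Hypothetical alias table; the entries are illustrative only.
CHAIN_ALIASES = Dict{Symbol,Symbol}(:∂ₜ => :dt, :Δ => :lapl)

# Unregistered names pass through unchanged.
chain_canon(name::Symbol) = get(CHAIN_ALIASES, name, name)

# True when `op` matches any entry of `list`, comparing canon names on both sides,
# so the list may be written in Unicode, ASCII, or a mix.
function is_listed(op::Symbol, list::Vector{Symbol})
    canon_list = Set(chain_canon.(list))
    return chain_canon(op) in canon_list
end

blacklist = [:∂ₜ]
is_listed(:dt, blacklist)  # ASCII spelling matches a Unicode list entry
```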

lukem12345 commented 4 months ago

https://github.com/AlgebraicJulia/Decapodes.jl/pull/142 is a spiritual predecessor of this feature. The main distinctions are that the old PR "canonicalized" on Unicode while this PR canonicalizes on ASCII, and that the old PR added a subroutine that edits the ACSet to use the canonicalized names.

lukem12345 commented 4 months ago

We note that incident(decapode, :L_1, :op1) needs special treatment if it is to be inter-operable with canonical representations. Of course, features such as incident(decapode, :L, :op1) (the intention being to get all Lie derivatives, typed or untyped, Unicode or ASCII) require more engineering anyway.

lukem12345 commented 4 months ago

As discussed off-line, we should:

[ ] Have a way of getting the Unicode representation of an operator. (Currently partially handled by the Dict from Decapodes.jl PR 142)
[x] Have a way of getting the ASCII representation of an operator. Currently handled by this PR.
[ ] We want to get the set of all aliases (for e.g. using incident in a more pain-free way). This information is stored implicitly now.
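
One possible way to make the alias sets explicit is to invert the alias table. The sketch below is an assumption, not part of this PR: `ALIAS_TO_CANON`, `aliases_of`, and `find_ops` are hypothetical names, the entries are illustrative, and a plain vector of names stands in for the `:op1` column, so no ACSet `incident` call is made.

```julia
# Hypothetical alias table; the entries are illustrative only.
ALIAS_TO_CANON = Dict{Symbol,Symbol}(:ℒ₁ => :L_1, :L₁ => :L_1, :ℒ₀ => :L_0)

# Collect every spelling that canonicalizes to `canon`, including `canon` itself.
function aliases_of(canon::Symbol)
    found = Set{Symbol}([canon])
    for (alias, c) in ALIAS_TO_CANON
        c == canon && push!(found, alias)
    end
    return found
end

# Stand-in for a lookup like `incident(decapode, :L_1, :op1)`:
# return the indices whose stored name is any alias of the canon name.
find_ops(op_names::Vector{Symbol}, canon::Symbol) =
    findall(in(aliases_of(canon)), op_names)

op1_names = [:ℒ₁, :d₀, :L_1, :ℒ₀]
find_ops(op1_names, :L_1)
```

Getting all Lie derivatives at once (the `:L` case) would still need an extra grouping on top of this, since `:L_0` and `:L_1` are distinct canon names here.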

AlgebraicJulia / DiagrammaticEquations.jl

Add operator name canonicalization #54
