Open ahelwer opened 1 month ago
Possible bugs include if your translation code looks at actual string representation (for example,
"\\E"
instead ofTLAplusParserConstants.EXISTS
) which was the case in SANY itself.
Oh, we have a bunch of those :)
instead of
TLAplusParserConstants.EXISTS
Do you know how stable this table of constants going to be? It's autogenerated. I assume that the names will not change most likely?
@ahelwer do I understand it right that the unicode operators have their own names such as op_infix_neq_uc
? I wonder where to get the actual mapping from without guessing.
Do you know how stable this table of constants going to be? It's autogenerated. I assume that the names will not change most likely?
It has been stable for about 10 years as far as I know, no plans to change it in the future.
do I understand it right that the unicode operators have their own names such as op_infix_neq_uc
No, they do not. They are all wrapped up into the same TLAplusParserConstants
token IDs.
If you want a quick shortcut you could also just add all the different text-based unicode variants, defined here: https://github.com/tlaplus/tlaplus-standard/blob/main/unicode/tla-unicode.csv
Although this might make it more work if any of these mappings are ever changed.
I find TLAplusParserConstants
quite confusing. It contains an array called tokenImage
:
I wrote a simple Python script to do lookups in this table (see below). Surprisingly, the table contains codes both for /=
and #
. I guess, one of them is getting normalized somewhere in SANY?
The Python script find-token.py
:
#!/usr/bin/env python
import sys
pattern = sys.argv[1]
all_tokens = ["<EOF>", "<BEGIN_MODULE>", "\"--->\"", "<_BM1>", "<token of kind 4>", "<CASE0>", "<CASE1>", "<CASE1b>", "<CASE1c>", "<CASE2>", "<CASE2b>", "<CASE2c>", "<CASE3>", "<CASE6>", "<CASE6b>", "<CASE6c>", "<CASEN>", "<LETTER>", "<DIGIT>", "<NUMBER>", "<_BM2>", "<token of kind 21>", "\" \"", "\"\\t\"", "\"\\n\"", "\"\\r\"", "<token of kind 26>", "\"\\\\*\"", "<token of kind 28>", "\"*)\"", "\"*)\"", "<token of kind 31>", "<token of kind 32>", "<token of kind 33>", "<_BM0>", "<SEPARATOR>", "<END_MODULE>", "<ACTION>", "\"ASSUME\"", "\"[]ASSUME\"", "<ASSUMPTION>", "\"CASE\"", "\"CHOOSE\"", "<CONSTANT>", "\"ELSE\"", "\"EXCEPT\"", "<EXISTS>", "\"EXTENDS\"", "<FORALL>", "\"IF\"", "\"INSTANCE\"", "\"LET\"", "\"IN\"", "\"LOCAL\"", "\"MODULE\"", "\"NEW\"", "\"OTHER\"", "<PROPOSITION>", "\"SF_\"", "\"\\\\EE\"", "\"\\\\AA\"", "\"THEN\"", "\"BY\"", "\"ONLY\"", "\"DEFINE\"", "<DF>", "\"THEOREM\"", "\"USE\"", "\"HIDE\"", "\"HAVE\"", "\"OBVIOUS\"", "\"OMITTED\"", "\"LAMBDA\"", "\"TAKE\"", "\"PROOF\"", "\"PROVE\"", "\"[]PROVE\"", "\"QED\"", "\"RECURSIVE\"", "\"STATE\"", "<TEMPORAL>", "\"PICK\"", "\"WITNESS\"", "\"SUFFICES\"", "<VARIABLE>", "\"WF_\"", "\"WITH\"", "\",\"", "\":\"", "\"::\"", "\".\"", "\"_\"", "\"==\"", "\"(\"", "\")\"", "\"-|-\"", "\"[\"", "\"]_\"", "\"]\"", "\"{|\"", "\"|}\"", "\"{\"", "\"}\"", "\"<<\"", "\">>_\"", "\">>\"", "\"!\"", "\"->\"", "\"<-\"", "\"|->\"", "<NUMBER_LITERAL>", "<STRING_LITERAL>", "<BAND>", "<BOR>", "\"\\'\"", "\"^+\"", "\"^*\"", "\"^#\"", "\"-.\"", "\"\\\\lnot\"", "\"\\\\neg\"", "\"~\"", "\"[]\"", "\"<>\"", "\"ENABLED\"", "\"UNCHANGED\"", "\"SUBSET\"", "\"UNION\"", "\"DOMAIN\"", "\"//\"", "\"/\\\\\"", "\"/=\"", "\"/\"", "\"\\\\/\"", "\"\\\\approx\"", "\"\\\\asymp\"", "\"\\\\bigcirc\"", "\"\\\\bullet\"", "\"\\\\cap\"", "\"\\\\cdot\"", "\"\\\\circ\"", "\"\\\\cong\"", "\"\\\\cup\"", "\"\\\\div\"", "\"\\\\doteq\"", "\"\\\\equiv\"", "\"\\\\geq\"", "\"\\\\gg\"", "\"\\\\in\"", "\"\\\\intersect\"", "\"\\\\union\"", "\"\\\\land\"", "\"\\\\leq\"", "\"\\\\ll\"", "\"\\\\lor\"", "\"\\\\o\"", "\"\\\\odot\"", "\"\\\\ominus\"", "\"\\\\oplus\"", "\"\\\\oslash\"", "\"\\\\otimes\"", "\"\\\\prec\"", "\"\\\\preceq\"", "\"\\\\propto\"", "\"\\\\sim\"", "\"\\\\simeq\"", "\"\\\\sqcap\"", "\"\\\\sqcup\"", "\"\\\\sqsubset\"", "\"\\\\sqsupset\"", "\"\\\\sqsubseteq\"", "\"\\\\sqsupseteq\"", "\"\\\\star\"", "\"\\\\subset\"", "\"\\\\subseteq\"", "\"\\\\succ\"", "\"\\\\succeq\"", "\"\\\\supset\"", "\"\\\\supseteq\"", "\"\\\\uplus\"", "\"\\\\wr\"", "\"\\\\\"", "\"~>\"", "\"=>\"", "\"=<\"", "\"=|\"", "\"=\"", "\"##\"", "\"#\"", "\"^^\"", "\"^\"", "\"--\"", "\"-|\"", "\"-+->\"", "\"-\"", "\"**\"", "\"*\"", "\"++\"", "\"+\"", "\"<=>\"", "\"<:\"", "\"<=\"", "\"<\"", "\">=\"", "\">\"", "\"...\"", "\"..\"", "\"||\"", "\"|\"", "\"|-\"", "\"|=\"", "\"&&\"", "\"&\"", "\"$$\"", "\"$\"", "\"??\"", "\"%%\"", "\"%\"", "\"@@\"", "\"!!\"", "\":>\"", "\":=\"", "\"::=\"", "\"(+)\"", "\"(-)\"", "\"(.)\"", "\"(/)\"", "\"(\\\\X)\"", "\"\\\\notin\"", "\"\\\\times\"", "\"\\\\X\"", "<IDENTIFIER>", "<ProofStepLexeme>", "<ProofImplicitStepLexeme>", "<ProofStepDotLexeme>", "<BareLevelLexeme>", "<UnnumberedStepLexeme>"]
for (position, token) in enumerate(all_tokens):
if pattern in token:
print("Position %d: token '%s'" % (position, token))
@ahelwer shall we use the version of tla2tools 1.8.0? On GitHub it appears like new commits are being added to the release that was introduced 4 years ago.
If you want a quick shortcut you could also just add all the different text-based unicode variants, defined here: https://github.com/tlaplus/tlaplus-standard/blob/main/unicode/tla-unicode.csv
Although this might make it more work if any of these mappings are ever changed.
This looks like the easiest path! Thanks!
Ugh, the latest package published on oss.sonatype.org
goes back to 2020: https://oss.sonatype.org/content/repositories/snapshots/org/lamport/tla2tools/
There is no package on maven. So it seems there is no automatic way for us to consume a newer version of SANY via sbt
or mvn
. @lemmy any ideas how to proceed?
I don't have time to resurrect publishing to maven central or Github's equivalent -- PR welcome. Can't you grab the tla2tools.jar from https://github.com/tlaplus/tlaplus/releases/tag/v1.8.0?
@konnov does Apalache pull in SANY using the maven package?
@konnov does Apalache pull in SANY using the maven package?
Apalache uses sbt, which by default pulls a package from maven. We are also using sonatype as a secondary repo:
https://github.com/apalache-mc/apalache/blob/169d142ad79c9f3553eabb949d78ec01f2116f6f/build.sbt#L26
I am not sure whether Github provides their own maven-compatible repo.
Tricky thing is that I see maven has the ability to use sha1 as build version (see https://maven.apache.org/maven-ci-friendly.html) which we would need to use to distinguish the continually-updated 1.8.0 pre-release, but I can't find any details on whether various public maven repositories support that feature.
Tricky thing is that I see maven has the ability to use sha1 as build version (see https://maven.apache.org/maven-ci-friendly.html) which we would need to use to distinguish the continually-updated 1.8.0 pre-release, but I can't find any details on whether various public maven repositories support that feature.
I think this practice of updating a release file without increasing the patch number does not work well with open source software engineering practices we see on Github, Linux, etc. You could add the build number or the current date to distinguish between the builds. If you keep overwriting releases, it does not work with maven (since it caches the packages), nix, you name it. Some projects produce nightly builds and then garbage collect them, e.g., Foundry.
Well, we're operating in the Java packaging ecosystem here which I'm unfamiliar with - personally I'm a big fan of content-addressed versioning like with nix. I agree there are drawbacks to the SNAPSHOT way of doing things but it does map well to how we are currently just uploading tla2tools.jar builds to the 1.8.0 release and overwriting the current ones. What do you think we should do, @konnov?
I do think this is a problem we should solve because I like the idea of other projects being able to pull in the tla2tools.jar as a dependency. Currently there's Apalache and also tlaplus-formatter by @FedericoPonzi.
I understand it's not cool, but the tla2tools.jar file is only 4MB in size (could be trimmed). Since it's a stable dependency for Apalache (with SANY being rarely updated), the simplest solution might be to commit the jar directly to the Apalache Git repository.
Well, we're operating in the Java packaging ecosystem here which I'm unfamiliar with - personally I'm a big fan of content-addressed versioning like with nix. I agree there are drawbacks to the SNAPSHOT way of doing things but it does map well to how we are currently just uploading tla2tools.jar builds to the 1.8.0 release and overwriting the current ones. What do you think we should do, @konnov?
I do think this is a problem we should solve because I like the idea of other projects being able to pull in the tla2tools.jar as a dependency. Currently there's Apalache and also tlaplus-formatter by @FedericoPonzi.
I would also be happy, if there was a simple way to pull tla2tools.jar
as a Java dependency. I have not been working with pure Java for a long time, but afaik maven dependencies are still something that all build systems in Java understand. In this case, it would be great, if the TLA+ project could publish these jars. Actually, you don't have to do it every night. For instance, we cut releases on request with GitHub actions, see prepare-release.yml. In this case, it would be something like publishing a maven package on demand (the good thing is that you can keep sensitive data in the GitHub secrets). My main issue with contributing there is that the tlaplus
project is too big. The last time I tried, I could not even compile it in the command line.
Alternatively, we could do something like z3-turnkey.
This would be even more work but in the future I think it would be nice to split out the SANY and PlusCal parsers into separate packages. They are self-contained with minimal dependencies, and projects consuming the tools as a dependency will probably be using them alone instead of pulling in the rest of the interpreter and model-checking code.
Painful, but tla2tools.jar will from now on be published at https://oss.sonatype.org/content/repositories/snapshots/org/lamport/tla2tools/1.8.0-SNAPSHOT/
Thanks Markus! Wrangling the github CI is never a fun day.
I recommend using the new snapshot dependency from Sonatype until the version of SANY with Unicode support is fully integrated into Apalache. Once this integration is complete, we can manually tag the TLA+ Git repository and create a permanent release on Sonatype.
Thanks a lot @lemmy!
tag the TLA+ Git repository and create a permanent release on Sonatype
Does this mean cutting a final release for 1.8.0 then doing a version bump for snapshot versions? If Unicode is going to be part of that release perhaps we should consider adding ℕ, ℤ, and ℝ to the standard modules. I will create an issue to consider the best way to do that.
No, I didn't mean to suggest a full-scale tla2tools.jar and Toolbox release. What I had in mind was more like a monthly or quarterly code snapshot (not -SNAPSHOT) that would be permanently stored in the Maven repository.
Now that SANY supports parsing Unicode TLA+ specs, it should be fairly straightforward to support Unicode here by updating to a newer version of SANY. Possible bugs include if your translation code looks at actual string representation (for example,
"\\E"
instead ofTLAplusParserConstants.EXISTS
) which was the case in SANY itself.