digama0 / mm0

Metamath Zero specification language

mm0 for kotlin #78

Lakedaemon opened this issue 3 years ago

Lakedaemon commented 3 years ago

Hello,

In the Peano examples: is it normal that term `al` has type `wff ()` in mmu, `(term al ((x nat) (ph wff (x))) (wff ()))`, and type `wff (x)` in mm0, `term al {x: nat} (p: wff x): wff;`?

Is a proof checker supposed to let that go or not? And the variable names are also different (`p` != `ph`); is it supposed to let that go?

In that case, how does the proof checker decide that an mmu directive corresponds to an mm0 statement?

I get that the mmu file is the source of truth, that binders are actually used in the order they have in the mmu file, and that the mm0 file is just supposed to reflect how variables are used in the formula while presenting it to the human reader in an eye-pleasing manner (for example `(x y z: nat) (ph: wff)` instead of `(x: nat) (ph: wff) (z: nat) (y: nat)`), and that order is not that important in the mm0 file.

All that because the computer will do the dirty work of proof building behind the scenes, and it knows what it needs to do it.

But shouldn't the names be the same? (For terms, that might not be that important; maybe that is a tolerance for them.)

Lakedaemon commented 3 years ago

Yes, this is precisely my point. If the example you cite is allowed to exist, I'll cry!

> A consequence of the two pass approach is that notations may be used before they are defined in the file

This means that when mm0/mmu/mmb becomes popular (because if you and/or I succeed, it will), there are going to be mm0 files flying around written by people who use 2-pass proof checkers, and those mm0 files will break your proof checker and mine, because we both implement 1-pass proof checkers :)

And they will say that we suck because our proof checkers do not even work, though theirs do :D

Maybe I do not understand what you say (my brain misbehaves sometimes). To be clear, I also want developers to have the choice of implementing 1- or 2-pass proof checkers (freedom!!!). But I want neither of them to break on files that the spec says are okay.

Lakedaemon commented 3 years ago

Maybe I do not understand what you wrote, because you say "usage before definition is not okay" (yay, I want this too), but I understand the sentence "a consequence of the two pass approach is that notations may be used before they are defined in the file" as "usage before definition is allowed" (and I'm crying really hard now :( )

Lakedaemon commented 3 years ago

Also, I do not remember reading in the spec that "garbage in a formula makes the proof-checking process fail even if the dynamic parser returned a tree".

If it is not too presumptuous, I would love for that to be written in the spec.

digama0 commented 3 years ago

> > A consequence of the two pass approach is that notations may be used before they are defined in the file

> This means that when mm0/mmu/mmb becomes popular (because if you and/or I succeed, it will), there are going to be mm0 files flying around written by people who use 2-pass proof checkers, and those mm0 files will break your proof checker and mine, because we both implement 1-pass proof checkers :)

That's true. Conversely, if they want to write mm0 files that are broadly checkable, they need to follow the strictest guidelines, which means they can't use conditionally supported features. Anything that is accepted by some checkers and not others should be viewed with suspicion. But it's not always practical to require verifiers to precisely reject everything, and in this case it doesn't matter much. You should just reject such files and move on.

By the way, the reason you might want the 2 pass style is if you have used a parser generator to construct the math parser. There are a lot of parser generators that expect the grammar up front and don't allow the parser to be dynamically extended. So if you first get all the notations and turn them into a BNF description and hand them to yacc or something, you will get a parser that parses all math in the file equally. Given a parsed expression it is then difficult to tell whether you have used a notation from "the future", although if you use a term from the future this is more obvious.
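To make that concrete, here is a minimal Kotlin sketch (hypothetical types, not taken from any existing mm0 verifier) of the extra check a 2-pass implementation would need: after parsing with the whole-file grammar, walk each parse tree and reject any notation declared later than the statement that uses it.

```kotlin
// Hypothetical types: a notation tagged with the position of its declaration,
// and a parse tree produced by a pregenerated (whole-file) grammar.
data class Notation(val name: String, val declaredAt: Int)

sealed class ParseTree
data class App(val notation: Notation, val args: List<ParseTree>) : ParseTree()
data class Var(val name: String) : ParseTree()

// Reject any statement whose tree uses a notation from "the future",
// i.e. one declared after the statement's own position in the file.
fun usesFutureNotation(tree: ParseTree, stmtAt: Int): Boolean = when (tree) {
    is Var -> false
    is App -> tree.notation.declaredAt > stmtAt ||
              tree.args.any { usesFutureNotation(it, stmtAt) }
}
```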

> Also, I do not remember reading in the spec that "garbage in a formula makes the proof-checking process fail even if the dynamic parser returned a tree".

> If it is not too presumptuous, I would love for that to be written in the spec.

The expression does not parse. The dynamic parser is required to parse the entire string. There is nothing in the spec that says you can add additional text after the end: the `expr(p)` nonterminal is being matched against the content of `$ stuff... $` (excluding the `$` delimiters), not against a prefix of the `stuff...`.
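A 1-pass verifier can enforce this with a single end-of-input check after the expression is parsed. A minimal Kotlin sketch, where `Parser` and the `parseExpr` callback are hypothetical stand-ins for the dynamic parser:

```kotlin
// Hypothetical cursor over the math string: the content between the $
// delimiters, with the delimiters themselves already stripped.
class Parser(val input: String) {
    var pos = 0
    fun skipWhitespace() {
        while (pos < input.length && (input[pos] == ' ' || input[pos] == '\n')) pos++
    }
}

// expr(p) must match the entire content, not a prefix of it, so anything
// left over after the expression parses is an error.
fun <T> parseWholeFormula(mathString: String, parseExpr: (Parser) -> T): T {
    val p = Parser(mathString)
    val tree = parseExpr(p)
    p.skipWhitespace()
    require(p.pos == p.input.length) { "garbage after expression at offset ${p.pos}" }
    return tree
}
```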

Lakedaemon commented 3 years ago

> > A consequence of the two pass approach is that notations may be used before they are defined in the file
> >
> > This means that when mm0/mmu/mmb becomes popular (because if you and/or I succeed, it will), there are going to be mm0 files flying around written by people who use 2-pass proof checkers, and those mm0 files will break your proof checker and mine, because we both implement 1-pass proof checkers :)

> That's true. Conversely, if they want to write mm0 files that are broadly checkable, they need to follow the strictest guidelines, which means they can't use conditionally supported features. Anything that is accepted by some checkers and not others should be viewed with suspicion. But it's not always practical to require verifiers to precisely reject everything, and in this case it doesn't matter much. You should just reject such files and move on.

I see. It bugs me less now, thanks to your explanation.

> By the way, the reason you might want the 2 pass style is if you have used a parser generator to construct the math parser. There are a lot of parser generators that expect the grammar up front and don't allow the parser to be dynamically extended. So if you first get all the notations and turn them into a BNF description and hand them to yacc or something, you will get a parser that parses all math in the file equally. Given a parsed expression it is then difficult to tell whether you have used a notation from "the future", although if you use a term from the future this is more obvious.

Having generated a parser for Metamath, I see what you mean. But throwing away the parser generator and embracing the dynamic parser is one of the 2 main selling points of mm0. Who would want to do that?

They might as well use the slow, cumbersome, and unmaintainable parser I generated for Metamath like 2 years ago. It is doable, but it means throwing away all the benefits (and the simplicity) of mm0.

Because when I had the Metamath parser, I had like 100 grammar rules :/ Throwing away mm and embracing mm0 was one of the best decisions I ever took (though I have still not regained the level of math I achieved with Metamath... but soon).

Also, having people write strict mm0 because they hope to have universally useful files does not achieve the same results as making people write strict mm0 because it is not valid otherwise (yet I respect your decision; I understand now, thanks to your explanations).

I just do not see the benefit in having this tolerance, and I see lots of pain later (with strict mm0, you can do everything that non-strict mm0 can, and more).

But I will abide by any decision you take (also, I already have a solution: take their weird files, rewrite them in strict mode, done... it gives me more work, but if I can patch set.mm, I can shuffle around notations and simple notations on one leg, with a blindfold, arms tied, while singing).

But please consider history, and why there is a strict mode for HTML :)

> > Also, I do not remember reading in the spec that "garbage in a formula makes the proof-checking process fail even if the dynamic parser returned a tree". If it is not too presumptuous, I would love for that to be written in the spec.

> The expression does not parse. The dynamic parser is required to parse the entire string.

YES! I remember reading that. Good!

> There is nothing in the spec that says you can add additional text after the end: the `expr(p)` nonterminal is being matched against the content of `$ stuff... $` (excluding the `$` delimiters), not against a prefix of the `stuff...`.

digama0 commented 3 years ago

> But throwing away the parser generator and embracing the dynamic parser is one of the 2 main selling points of mm0. Who would want to do that?

Well you pretty much have to write the dynamic parser by hand, because it doesn't exactly fit most parser architectures. It's not huge but it's sometimes nice to pull a parser off the shelf instead of writing your own.

Actually, metamath is broadly the same as regards dynamic parsing. You can preprocess the file to construct a CFG if you like, or you can build the parser dynamically. In metamath you definitely can't use a syntax before the syntax axiom is introduced, so you have the same issues (although the term constructor and the syntax are the same so there isn't any danger of accidentally using notations from the future without also using term constructors from the future).

> I just do not see the benefit in having this tolerance, and I see lots of pain later (with strict mm0, you can do everything that non-strict mm0 can, and more).

Yes, but if the definition of "strict mm0" is itself difficult to check then that can mean a meaningless performance penalty. For example, there are a few things in the MMB files that are not checked for validity, because you can still verify the proof even if the data in the file is slightly off. The verifier is not supposed to be a validator, odd as that may sound. It's supposed to verify the well formedness and the truth of the theorems in the file. If the theorems are stated in some unusual way that the verifier is nevertheless able to make sense of, that should still be okay.

The most important thing that should be in fail tests are false proofs. If the proof doesn't follow the rules, then it needs to be rejected. That would be the negation of "loose mm0". But there is a gap between loose mm0 and strict mm0 in order to give some flexibility for verifiers to implement things as conveniently and efficiently as possible. There will be "strict mode mm0" validators, but I do not want to mandate that all conforming mm0 verifiers are such.

By the way, another example of a gap I'm considering is alternative whitespace and unicode identifiers. Sometimes, for implementation simplicity, you want to use an is_whitespace function from the standard library that thinks that \r and \t should be whitespace, and similarly perhaps you want to have French theorem names or something. In these cases "strict mm0" says no, but a few foreign names don't make the file unintelligible as a collection of theorems, so if the verifier is willing to deal with it then there isn't any need to require that the verifier reject.
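As an illustration of that gap, the two predicates differ like this in Kotlin (assuming strict mm0 whitespace is just space and newline; check mm0.md for the authoritative list):

```kotlin
// Strict mm0: only the characters the spec names count as whitespace
// (assumed here to be space and newline).
fun isStrictWhitespace(c: Char) = c == ' ' || c == '\n'

// Lenient verifier: the standard library's broader notion, which also
// accepts \t, \r, and various unicode space characters.
fun isLenientWhitespace(c: Char) = c.isWhitespace()

fun main() {
    println(isStrictWhitespace('\t'))   // false: a strict checker rejects tabs
    println(isLenientWhitespace('\t'))  // true: a lenient checker lets them through
}
```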

Lakedaemon commented 3 years ago

Wow. It is possible to build a dynamic parser with Metamath? (You say it, so it must be true, wow.) You must mean Metamath the formalism, with the few rules.

Yet a Metamath dynamic parser with set.mm... (yuck, it gives me goose bumps)

Also, thanks for the insights (you are quite the theorist... I'm amazed at the depth of thought you put into your creation).

I was considering unicode identifiers too (for operators: sum, arrows...). It would make reading textual mm0 slightly cuter, but it would open a whole new can of worms, and degrade performance. Also, graphical rendering is going to be done by TeX/MathJax on my side, so unicode wouldn't bring anything to the table (and unicode is hard to input).

Unicode makes sense for human languages, though. Yet there is still the possibility of mapping unicode ids to ascii ones, so it may be done by another layer on top of an ascii-only mm0-powered engine. So, for now, I'm sticking with good old ascii :/ (Though I'm all for utf-8/unicode/multi-language stuff; I even wrote a multilingual Japanese dictionary supporting 50 languages, so that says something... Who would have thought that I would write ascii-only software... that is so 80s...)

False proofs... I'm not sure I'll be good at building those. 28+ years of maths makes me pretty much only able to write proofs (force of habit). Maybe I should ask my students for help :)

Lakedaemon commented 3 years ago

The "tests I'm writing" right now look more like Unit tests, making sure the developpers make their job implementing things on top of your specs.

But if those tests ensure that verifiers are implemented correctly, maybe the strength of your formalism will ensure that it is not possible to write a false proof (I'm naive and an optimist, maybe).

digama0 commented 3 years ago

> False proofs... I'm not sure I'll be good at building those. 28+ years of maths makes me pretty much only able to write proofs (force of habit). Maybe I should ask my students for help :)

In theory, "code coverage tests" should help here. That is, every time the program has to check something and fail otherwise, it should be possible to construct an input that hits that check. There are 88 uses of `ENSURE(...)` in `mm0-c/verifier.c`, so that might give you a place to start (or the analogous checks in your verifier).
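One way to organize that is a harness that feeds every deliberately broken input to the verifier and insists on rejection, ideally with one input per ENSURE-style check. A Kotlin sketch; the `verify` signature and the `.mm0`/`.mmu` file pairing are assumptions about the test suite, not part of the spec:

```kotlin
import java.io.File

// Run every broken .mm0/.mmu pair in a directory through the verifier and
// require that each one is rejected (by returning false or by throwing).
fun runFailTests(dir: String, verify: (mm0: String, mmu: String) -> Boolean) {
    File(dir).listFiles { f -> f.extension == "mm0" }?.forEach { mm0 ->
        val mmu = File(mm0.path.removeSuffix(".mm0") + ".mmu")
        val accepted = try {
            verify(mm0.readText(), mmu.readText())
        } catch (e: Exception) {
            false // throwing counts as a rejection
        }
        check(!accepted) { "fail test ${mm0.name} was wrongly accepted" }
    }
}
```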

Lakedaemon commented 3 years ago

I think that I got the "false proof" stuff.

false proof test = design a test so that if a proof checker doesn't respect an aspect of the mm0/mmu spec, then it is possible to prove something false

Basically, such a test would prove that your requirements are necessary (on the theoretical side). But such exploits are much more difficult to create than basic unit tests that make sure that a single requirement is met, so I'm not sure that it would add a layer of security/accuracy over a proof checker.

Writing tests makes me look again, and harder, at the different specs, which is a very good thing for my software (making it go from the "somehow working" to the "mostly working" state).

I discovered that mmu can have line comments! This is nice for tests, because I'll be able to explain to test consumers why they should fail or pass (with quotations from the spec).

I'll be implementing line comment support in my mmu parser and then PR-ing my first tests.
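A minimal sketch of that comment support, assuming (I have not double-checked the mmu grammar) that a line comment starts with `--` and runs to the end of the line, as in mm0 files:

```kotlin
// Strip line comments before tokenizing. This naive version does not worry
// about "--" occurring inside another token; a real lexer would skip
// comments as part of its whitespace handling instead.
fun stripLineComments(source: String): String =
    source.lineSequence()
        .map { line ->
            val i = line.indexOf("--")
            if (i >= 0) line.substring(0, i) else line
        }
        .joinToString("\n")
```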

I also sanitized the names of the tests to allow easy navigation, like "ids cannot ...", "ids cannot ...", "formulas cannot ...".

The folders can be used to test different parts of the mm0/mmu toolchain (mm0 parser, mmu parser, or proof checker at different stages: matching, registering, dynamic parsing, proof checking).

digama0 commented 3 years ago

> Basically, such a test would prove that your requirements are necessary (on the theoretical side). But such exploits are much more difficult to create than basic unit tests that make sure that a single requirement is met, so I'm not sure that it would add a layer of security/accuracy over a proof checker.

Yes that's right. You don't have to create an actual "exploit"; exploit tests are generally best written against a specific verifier with a bug in it, to demonstrate that the bug is in fact a soundness hole. For general testing it's simpler to just check all the primitives that can potentially be used in an exploit for general robustness. It's not perfect, but that's just a limitation of testing.

By the way, it's also okay to have multiple tests in a single file. Basic parsing tests should be short and focused, but for more high-level tests like secondary parsing or binder-order stuff, I think it's fine if you have 4 to 10 individual tests in one file. The whole file fails as one, so you probably don't want to put too much in the file, but I think it's organizationally easier not to have thousands of (pairs of) two-line files.