Open ghost opened 8 years ago
ArchieML catches my heart: http://archieml.org/
Much better than YAML, but still too complex: too many rules to keep in mind while using it. This raises the cognitive load to an unacceptable level. Perhaps that is okay for users who write everything in this format, but programmers use plenty of languages simultaneously, so the format of their config/settings/db files must hit a very low bound of syntactic complexity and require near-zero time to jump back into from far-standing languages.
As for me, I am pretty happy with TreeDef (renamed to ObjDef by now) at the moment; I devised a bidirectional (not isomorphic) mapping between ObjDef and JSON to make tooling easier and wrote the corresponding parsers/converters. I will release them after some period of heavy production use.
ObjDef BNF grammar:
entry
    = type value
    | type contents
    | type id contents
contents = "{" entry* "}"
type = adjective+
id = string
value = string | float | integer | bool
bool = "true" | "false"
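To make the grammar concrete, here is a minimal recursive-descent parser for this BNF in Python. It is a sketch under my own assumptions: the tokenizer, the identifier rule, and the dict node shape are invented for illustration and are not ObjDef's actual implementation; triple strings and comments are left out.

```python
import re

# Hypothetical tokenizer: double-quoted strings, braces, and bare atoms.
TOKEN = re.compile(r'"[^"]*"|\{|\}|[^\s{}]+')
IDENT = re.compile(r'[A-Za-z_][\w-]*')

def scalar(tok):
    """value = string | float | integer | bool"""
    if tok.startswith('"'):
        return tok[1:-1]
    if tok in ('true', 'false'):
        return tok == 'true'
    try:
        return int(tok)
    except ValueError:
        return float(tok)

def parse(text):
    entries, _ = parse_entries(TOKEN.findall(text), 0)
    return entries

def parse_entries(toks, pos):
    """entry* until '}' or end of input; braces are consumed by the caller."""
    out = []
    while pos < len(toks) and toks[pos] != '}':
        node, pos = parse_entry(toks, pos)
        out.append(node)
    return out, pos

def parse_entry(toks, pos):
    # type = adjective+ : one or more bare identifiers
    adjs = []
    while pos < len(toks) and IDENT.fullmatch(toks[pos]) \
            and toks[pos] not in ('true', 'false'):
        adjs.append(toks[pos])
        pos += 1
    ident = value = contents = None
    if toks[pos] == '{':                                   # type contents
        contents, pos = parse_entries(toks, pos + 1)
        pos += 1                                           # skip '}'
    elif toks[pos].startswith('"') and pos + 1 < len(toks) \
            and toks[pos + 1] == '{':                      # type id contents
        ident = toks[pos][1:-1]
        contents, pos = parse_entries(toks, pos + 2)
        pos += 1
    else:                                                  # type value
        value = scalar(toks[pos])
        pos += 1
    return {'type': adjs, 'id': ident, 'value': value, 'contents': contents}, pos
```

For example, `parse('rule "entry" { min 1 }')` yields one node with type `['rule']`, id `entry`, and a single child carrying the value `1`.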
ObjDef parser (full source code):
entry =
    IDENT@t ->
        det_parser_bug_workaround{t}@tag
        adjectives@adjectives
        (
            '{' -> contents@@contents [quadruple(tag,adjectives,none,some(contents))]
            _ -> value@value
                (
                    '{' -> contents@@contents [quadruple(tag,adjectives,some(value),some(contents))]
                    _ -> [quadruple(tag,adjectives,some(value),none)]
                )
        )
det_parser_bug_workaround{t} = [sym(t)]
adjectives = adjectives_loop@@aa [mktuple(aa)]
adjectives_loop =
    IDENT@t -> [sym(t)] adjectives_loop
    _ -> []
contents =
    '}' -> []
    _ -> entry contents
value =
    FLOAT@v -> [float(v)]
    INT@v -> [int(v)]
    (DOUBLE_STRING | TRIPLE_STRING)@v -> [str(v)]
    TRUE -> [true]
    FALSE -> [false]
parser "objdef" {
    stage "objdef/parser/staged/lexer.stage" {
        import "det/common/skip_until_newline.det"
        filter "det/common/line_comments_token.det"
        import "det/common/string_char.det"
        import "det/common/double_string.det"
        filter "det/common/any_double_string_token.det"
        import "det/common/single_string.det"
        filter "det/common/any_single_string_token.det"
        ;;import "det/common/triple_double.det"
        ;;filter "det/common/triple_double_token.det"
        ;;import "det/common/triple_single.det"
        ;;filter "det/common/triple_single_token.det"
        filter "det/common/bool_token.det"
        filter "det/common/digit_token.det"
        parser "det/common/copy_token.det"
    }
    stage "objdef/parser/staged/hacky.stage" {
        filter "det/common/numeric_token.det"
        filter "det/common/whitespace_token.det"
        parser "det/common/copy_token.det"
    }
    stage "objdef/parser/staged/syntax.stage" {
        parser "objdef/parser/syntax.det"
    }
}
(Interestingly enough, it still lacks standard ordinary lists and maps; perhaps I'll add them later, but I'm not sure how.)
This is how the ObjDef grammar looks if you dump it in ObjDef format:
rule "entry" {
    alt {
        seq {
            ref "type"
            ref "value"
        }
        seq {
            ref "type"
            ref "contents"
        }
        seq {
            ref "type"
            ref "id"
            ref "contents"
        }
    }
}
rule "contents" {
    seq {
        lit "{"
        repeat {
            ref "entry"
        }
        lit "}"
    }
}
rule "type" {
    repeat {
        ref "adjective"
        min 1
    }
}
rule "id" {
    ref "string"
}
rule "value" {
    alt {
        ref "string"
        ref "float"
        ref "integer"
        ref "bool"
    }
}
rule "bool" {
    alt { lit "true" lit "false" }
}
Nevertheless, thanks for posting the link!
I have learnt a lot from your links (especially thanks for the \u{deadbeef} thing; I somehow managed to completely miss the trend).
Good point about the transition among different formats. I guess ArchieML is the best combination of YAML and INI/TOML so far.
Indeed, ObjDef lacks a standard syntax for arrays and maps. But since it works best with a schema, you can just let the transformer decide by the type name, e.g.
array {
    push "a"
    push "b"
}
map {
    set "a" 1
    set "b" 2
}
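A sketch of that idea in Python: the tag name selects the container, and the transformer interprets the children. The node shape here (a tag plus a list of (op, args) children) is an illustrative stand-in, not ObjDef's real representation.

```python
# Hypothetical node shape: ('push', ['a']) for array entries,
# ('set', ['a', 1]) for map entries.
def transform(tag, children):
    if tag == 'array':
        return [args[0] for op, args in children if op == 'push']
    if tag == 'map':
        return {args[0]: args[1] for op, args in children if op == 'set'}
    raise ValueError(f'no container registered for tag {tag!r}')

xs = transform('array', [('push', ['a']), ('push', ['b'])])    # ['a', 'b']
ys = transform('map', [('set', ['a', 1]), ('set', ['b', 2])])  # {'a': 1, 'b': 2}
```

The parser stays oblivious to containers; only the schema-driven transformer knows that `array` and `map` are special.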
However, it is not bad to have a concise syntax (["a" "b"], {a: 1, b: 2}). You just need to provide custom containers for them.
I think exactly the same. There is just no urge to add them right now. When I meet circumstances in which avoiding lists/maps can no longer be prolonged, I'll make the final decision.
I have learnt a lot from your links (especially thanks for the \u{deadbeef} thing; I somehow managed to completely miss the trend).
:-) Now I think \xXX is still the best for bytes, and \u{XXX,XXXXX,XXXX,...} for Unicode. \uXXXX and \uXXXXXXXX are just bad.
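A quick Python sketch of the comma-separated \u{...} form; this is my reading of the proposal, and the grouping syntax is not from any standard:

```python
import re

# Expand \u{XX,XXXX,...} escapes: each comma-separated hex group is one
# code point, so several characters fit in a single escape.
ESCAPE = re.compile(r'\\u\{([0-9A-Fa-f]+(?:,[0-9A-Fa-f]+)*)\}')

def expand(s):
    return ESCAPE.sub(
        lambda m: ''.join(chr(int(cp, 16)) for cp in m.group(1).split(',')),
        s)

print(expand(r'\u{48,65,6C,6C,6F}'))  # Hello
```

Note how the delimited form also removes the ambiguity of `\uXXXX` followed by a literal hex digit.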
Agreed! =)
A good use of semicolons in https://github.com/CESNET/libyang may interest you; and my example:
array { 1; 2; 3; 4 }
map { a 1; b 2; c 3; d 4 }
book {
    title "The Poor Jack";
    description "Long long ago..."
                "There was a poor actor called Jack."
                "Guess what happened on him?";
    price 9.99
}
it is just LISt Processing :-P
Another one, SimpleDeclarativeLanguage: https://sdlang.org/
Similar to XML, but it can associate multiple values with a tag. E.g.
// [tagname] [values] [attributes] { [children] }
tag val1 val2 val3 attr1=sth1 attr2=sth2 {
    child { ... }
    child { ... }
}
and anonymous tags (default name: content):
matrix {
    0 0 1 //= content 0 0 1 {}
    0 1 0
    1 0 0
}
Thanks! Interesting...
SimpleDeclarativeLanguage: the arbitrary semicolon is an interesting design decision, though.
What do you think about lambdas and include directives (in the context of configuration files)? They may help to cope with patterns and bring in DRY.
I picked that idea up from the Nix expressions design: https://medium.com/@MrJamesFisher/nix-by-example-a0063a1a4c55
It seems like overkill the first moment you think about it, but hey: configuration files are formulas written in formal languages, just as ordinary source code is, only with "declarative" semantics. (Declarative language semantics is a definition of a function from the syntax domain to the value domain, while ordinary programming language semantics is a definition of a function from the syntax domain to a function from the value domain (input) to the value domain (output); not as distant as it may seem.)
Also, configuration files bring in all the same mess (dependency management, deployment, versioning, using-from-external-project, ...) as ordinary source code does. So they may benefit from type systems, module systems, and generic dependency handling too. While type systems for configuration files are common, include directives and module systems are not (yet?).
So every configuration expression is just a program accepting one argument of the singleton type "Unit".
Interestingly enough, if you take into account the considerations from the "12-factor app" thing, you end up with the statement "a configuration expression is a program with an argument env :: String -> String".
Moreover, if you scrutinize the difference between "configuration" and "settings", you notice that a configuration file is just the (highest) top-level source code composing all the (more or less) generic (library) parts together, edited at customization/maintenance time (not runtime), so it is also a static resource.
An interesting consequence of that point of view is that one may not only employ typechecking to verify that a new configuration is correct, but also perform a (significantly time-consuming) abstract-interpretation-like post-compilation step to optimize the program for this very specific configured use case.
Also note that since a Nix-expr-like configuration language is not Turing-complete, it may be verified 100% correct even without type signatures, and all safety theorems auto-proved (because the Rice theorem and the halting problem do not hold for such a weak language).
Style 1:
pc1(i) = ... # Program Component #1
pc2(i) = ... # Program Component #2
parse_cf(fp) = readfile(fp) == "variant1"
main(i) = if parse_cf("filepath.cfg") then pc1(i) else pc2(i)
Style 2:
pc1(i) = ... # Program Component #1
pc2(i) = ... # Program Component #2
parse_cf(fp) = if readfile(fp) == "variant1" then pc1 else pc2
main(i) = parse_cf("filepath.cfg")(i)
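The two styles above can be sketched in runnable Python; the component bodies and the config reader are invented placeholders:

```python
def pc1(i): return i + 1        # Program Component #1 (placeholder body)
def pc2(i): return i * 2        # Program Component #2 (placeholder body)

def parse_cf(cfg_text):         # stands in for readfile(fp) == "variant1"
    return cfg_text.strip() == "variant1"

# Style 1: re-check the configuration on every call.
def main_style1(i, cfg_text):
    return pc1(i) if parse_cf(cfg_text) else pc2(i)

# Style 2: resolve the configuration to a component once, up front.
def select(cfg_text):
    return pc1 if parse_cf(cfg_text) else pc2

main_style2 = select("variant1")  # the branch is gone from the hot path
```

Style 2 is where the "configuration as top-level program composition" view pays off: after selection, the specialized program no longer contains the configuration branch at all.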
Only the application can tell what the appropriate methods are to include files/texts. We don't even need a standard syntax for it. E.g. (YAML):
# include/inherit/…whatever
template: [user encrypted]
# patch/override
user:
  name: Bob
  encryption:
    key: bob.key
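The patch/override half of that idea can be sketched as a recursive merge; this is a generic sketch, not tied to any YAML library:

```python
def deep_merge(base, patch):
    """Return base with patch laid over it: dicts merge recursively,
    everything else in patch wins outright."""
    out = dict(base)
    for key, val in patch.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], val)
        else:
            out[key] = val
    return out

# Invented template/patch mirroring the YAML example above.
template = {'user': {'name': 'anonymous', 'role': 'guest'}, 'log': 'stderr'}
patch = {'user': {'name': 'Bob', 'encryption': {'key': 'bob.key'}}}
merged = deep_merge(template, patch)
```

Untouched template fields (`role`, `log` here) survive; only the patched paths change.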
Lambdas or functions are convenient too. But they create a watershed between "data" and "script/program".
To keep the parser simple and easy to implement, without creating another Turing-complete language, I would prefer simple transformers (implementation-defined):
duration1 = 1000 # milliseconds
duration2 = (10).seconds # not directly parsing string "10 seconds"
to make it declarative, like the "recursive sets" in Nix:
# key = [variable | value] [.[transformer] | [binary-op] [variable | value]]...
duration2 = (1).seconds + (duration * 2) # lazy substitution, instead of yaml-like reference
and without user-defined lambdas like func1 = .seconds + (10).minutes.
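In Python the transformer idea might look like this; it is a sketch, and Qty plus the millisecond convention are invented for illustration:

```python
class Qty(int):
    # Implementation-defined transformers: each property converts the bare
    # number into the application's canonical unit (milliseconds here).
    @property
    def seconds(self):
        return int(self) * 1_000
    @property
    def minutes(self):
        return int(self) * 60_000

duration1 = 1000             # plain value, already in milliseconds
duration2 = Qty(10).seconds  # declarative "(10).seconds", no string parsing
total = Qty(1).seconds + duration1 * 2  # substitution instead of references
```

The point is that `(10).seconds` is ordinary data plus a fixed transformer table, so the host never has to parse free-form strings like "10 seconds", and no user-defined functions ever enter the picture.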
| Also note that since Nix-expr-like configuration language is not Turing-complete it may be verified 100% correct even without type signatures with all safety theorems autoproved (because Rice Theorem and Halting Problem do not hold for such a weak language)
Really not Turing-complete? Nix looks like a weaker Tcl language. It is overkill for data-centric configuration.
Style 1:
pc1(i) = ... # Program Component #1
pc2(i) = ... # Program Component #2
parse_cf(fp) = readfile(fp) == "variant1"
main(i) = if parse_cf("filepath.cfg") then pc1(i) else pc2(i)
Style 2:
pc1(i) = ... # Program Component #1
pc2(i) = ... # Program Component #2
parse_cf(fp) = if readfile(fp) == "variant1" then pc1 else pc2
main(i) = parse_cf("filepath.cfg")(i)
It seems we are talking about the same thing. But user-defined functions still concern me. I need to think more about it.
Basically, I doubt whether it is worth the effort to create another scripting language for advanced configuration. Everything like that may become just another very specific DSL.
| Really not Turing-complete? Nix looks like a weaker Tcl language.
I plan to ban recursion and recursive let definitions; that results in a language that is surely not Turing-complete.
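The ban is easy to enforce syntactically. Here is a sketch of the check in Python, over an invented binding representation (a name plus the set of names its body references); outer-scope names would have to be pre-seeded into the defined set:

```python
def check_let(bindings):
    """Reject recursive or forward references in a let-block.
    bindings: ordered list of (name, referenced_names) pairs."""
    defined = set()
    for name, refs in bindings:
        illegal = set(refs) - defined  # self-references land here too
        if illegal:
            raise ValueError(f'{name} refers to banned names: {sorted(illegal)}')
        defined.add(name)
    return True

check_let([('a', set()), ('b', {'a'})])  # fine: b only looks backwards
# check_let([('f', {'f'})])              # would raise: recursion is banned
```

With every binding only referring backwards, evaluation is a single pass and trivially terminates, which is what makes the 100%-verification claim plausible.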
| It is overkill for data-centric configuration.

Yes! Plenty of apps need no more than bare old INI files. But we have a continuum of configuration file use cases, and at the opposite end we see quite convoluted configurations; look at build files, for example. So, as far as I can guess, we would all profit from choosing a weak enough language that still covers all the cases while maintaining simplicity for the simplest uses.
| Everything like that may become just another very specific DSL.
It is a language for record manipulation for generic configuration purposes, so in one sense it is very narrow; but in another sense, as a configuration language, it is very broadly applicable. So it is not as straightforwardly "very specific" as it seems.
| Only the application can tell what is the appropriate methods to include files/texts
We can lift file inclusion to the generic configuration-language level and track dependencies/loading automatically in the library.
Look: you've written programs A and B. They need their own configuration files. I'm building an enormous software complex with one instance of A and two instances of B included. Inclusion/lambdas would be overkill for you, but a must-have for me in this case.
| Basically, I doubt whether it is worth the effort to create another scripting language for advanced configuration. Everything like that may become just another very specific DSL.
I have a bunch of tools that make the creation of a simple DSL a breeze. No effort at all.
All the effort is in a decent design.
| But user-defined functions are still concerning me
After banning any recursion and while loops, user-defined functions become safe.
(Foreach loops are OK.)
| I would prefer simple transformers (implementation-defined)
The thing that bothers me is that in SOME cases you end up with a convoluted, intangible mess of transformers. Why not cover those cases too, if we can do it without making the language clumsier?
My first shot at a grammar:
expr1 = lambda | expr2
lambda = pat ":" stmt+
stmt = if_stmt | let_stmt | expr1
if_stmt = "if" expr "then" stmt+ "else" stmt+
let_stmt = pat "=" expr
expr2 = infix | let_expr | expr3
infix = expr2 infix_name expr2
let_expr = "let" stmt+ "in" expr2
expr3 = app | expr4
app = expr3 expr4
expr4 = getfield_expr | expr5
getfield_expr = expr4 "." name
expr5 = scalar | name | "(" expr1 ")" | list_expr | record_expr
record_expr = "{" (setfield_expr ";")* setfield_expr? "}"
setfield_expr = name ("=" expr1)?
list_expr = "[" (expr1 ";")* expr1? "]"
pat = name | at_pat | record_pat | list_pat
at_pat = name "@" pat
record_pat = "{" (field_pat ",")* field_pat? "}"
field_pat = name ("=" pat)? | name "?" expr1
list_pat = "[" (pat ",")* pat? "]"
(I suppose the working codename for the lang is "Top", as in "the most top-level layer of the app".)
(There is a problem with lambda priority; I'm thinking about it.)
For example, a configuration language like this is powerful enough to describe a feed-forward convolutional neural network architecture. In more ordinary configuration languages (like ObjDef/TreeDef) that looks like a total mess. Utterly unreadable.
(Side note: I'm not advocating for anything in particular; I just want to follow the Truth, so if I (while doing research) see the trail of the Truth make a tight turn, so do I.)
Never mind. And I want to know the balance point, even if it leads to two designs: one minimal and one more advanced.
Your use case is broader than mine; I just need some time to digest the information. :-P
I just want to get a "Final Solution of the Configuration Files Question" =)
I hate it when problems keep reiterating after being solved once. We will never be able to make a step forward if we spend all our time on reiterated problems.
Your grammar above is full of recursive definitions; I don't quite get it. Is it mimicking Nix?
This is subset of Nix Expr language except:
Grammar without stmt stuff:
expr1 = lambda | expr2
lambda = pat ":" expr1
expr2 = let_expr | expr3
let_expr = "let" (pat "=" expr1 ";")* (pat "=" expr1)? "in" expr2
expr3 = infix | expr4
infix = expr3 infix_name expr3
expr4 = app | expr5
app = expr4 expr5
expr5 = getfield_expr | expr6
getfield_expr = expr5 "." name
expr6 = scalar | name | "(" expr1 ")" | list_expr | record_expr
record_expr = "{" (setfield_expr ";")* setfield_expr? "}"
setfield_expr = name ("=" expr1)?
list_expr = "[" (expr1 ";")* expr1? "]"
pat = name | at_pat | record_pat | list_pat
at_pat = name "@" pat
record_pat = "{" (field_pat ",")* field_pat? "}"
field_pat = name ("=" pat)? | name "?" expr1
list_pat = "[" (pat ",")* pat? "]"
The stmt thing was for writing
if f1 then
a = b
instead of
let a = if f1 then b else a in
I finally think this feature must be dropped.
and perhaps I'll change
setfield_expr = name ("=" expr1)?
to
setfield_expr = (name ".")* name ("=" expr1)?
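The dotted setfield desugars naturally into nested record updates. A Python sketch of that desugaring (the names here are invented):

```python
def set_path(record, dotted, value):
    """Apply an "a.b.c = value" setfield to a nested dict record."""
    *head, last = dotted.split('.')
    for name in head:
        record = record.setdefault(name, {})  # create intermediate records
    record[last] = value

cfg = {}
set_path(cfg, 'server.http.port', 8080)
# cfg == {'server': {'http': {'port': 8080}}}
```

This keeps the grammar change small while letting deep records be written flat.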
A useful link: https://github.com/jwiegley/hnix.git
Update: import expr, with expr, and index expr added.
expr1 = lambda | let_expr | expr2
lambda = pat ":" expr1
let_expr = "let" (pat "=" expr1 ";")* (pat "=" expr1)? "in" expr2
expr2 = with_expr | expr3
with_expr = "with" expr3 ";" expr2
expr3 = infix | expr4
infix = expr3 infix_name expr3
expr4 = app | expr5
app = expr4 expr5
expr5 = getfield_expr | index_expr | expr6
getfield_expr = expr5 "." name
index_expr = expr5 "." "[" expr1 "]"
expr6 = scalar | name | "(" expr1 ")" | list_expr | record_expr | import_expr
record_expr = "{" (setfield_expr ";")* setfield_expr? "}"
setfield_expr = (name ".")* name ("=" expr1)?
list_expr = "[" (expr1 ";")* expr1? "]"
import_expr = "<" path ">"
pat = name | at_pat | record_pat | list_pat
at_pat = name "@" pat
record_pat = "{" (field_pat ",")* field_pat? "}"
field_pat = name ("=" pat)? | name "?" expr1
list_pat = "[" (pat ",")* pat? "]"
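To get a feel for the core, here is a toy evaluator for the lambda/let/application fragment of the grammar above, written in Python over a nested-tuple AST. The AST encoding is mine; with, import, infix, and non-trivial patterns are omitted:

```python
def ev(expr, env):
    kind = expr[0]
    if kind == 'scalar':                 # ('scalar', 42)
        return expr[1]
    if kind == 'name':                   # ('name', 'x')
        return env[expr[1]]
    if kind == 'lambda':                 # lambda = pat ":" expr1 (name pattern only)
        _, param, body = expr
        return lambda arg: ev(body, {**env, param: arg})
    if kind == 'app':                    # app = expr4 expr5
        return ev(expr[1], env)(ev(expr[2], env))
    if kind == 'let':                    # non-recursive: bound expr sees the old env
        _, name, bound, body = expr
        return ev(body, {**env, name: ev(bound, env)})
    raise ValueError(f'unknown node {kind!r}')

# let x = 2 in (y: y) x
ast = ('let', 'x', ('scalar', 2),
       ('app', ('lambda', 'y', ('name', 'y')), ('name', 'x')))
```

Note the let case evaluates the bound expression in the old environment, which is exactly the "no recursive let" restriction discussed earlier.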
I translated some ObjDef/TreeDef configurations into Top form to get a closer look-and-feel: https://gist.github.com/vagoff/958c54567b9b150059ca2ed185848ea5
{stage,import,filter,parser}: # import entry constructors
That's indeed a good improvement. The interface is clear and at the forefront now.
| I don't like the syntax of Nix-lang
Any particular points/features, or is it an inexplicable overall feeling? Do we have a possibility to make it more beautiful?
| spaghetti
Yes, this is a serious drawback :/
| Why not JS?
Because JS is Turing-complete, we have no guarantees with it. Nix (Top) is very weak, so it is possible not only to guarantee 100% safety, but also to infer upper and lower bounds on the number of execution "steps".
| just looks like JavaScript spaghetti.
Something to add:
The amount of code and the cognitive burden/ease are vastly different notions. I learnt that the hard way, through Haskell experience and reading the source code of the Ur/Web compiler. Very large code may sometimes be MUCH more accessible than some convoluted tiny masterpiece built on top of high theoretical notions. It is also easier to debug, maintain, and extend.
Especially the pattern-matching part. The syntax being too compact makes abuse of advanced features easier and reduces readability. Regarding "pure": some imperative instructions will become a horrible cascade of let ... in ..., even if there are lambdas (but there are no pre-declarations of functions).
So people should know what should not be done solely with those features. As in the Nix project, the actual build jobs are (mostly?) delegated to the POSIX shell and makefiles. I don't think this language can take the place of make/cmake/premake/...
A good example of styling is MacPorts: it uses Tcl (everything is a string ;-) in an imperative (and impure) style, but without much fancy stuff. With good styling + auxiliary functions (in an imperative style) it is possible to make Nix/Top look nice.
https://github.com/go-ini/ini
though I'm not dealing with config files these days… :stuck_out_tongue_closed_eyes: