dobkeratops / compiler

C/C++ subset resyntaxed like Rust, plus tagged-union/pattern-matching, UFCS, inference; LLVM. Example:
https://github.com/dobkeratops/compiler/blob/master/example.rs
GNU General Public License v3.0

Syntax idea #2

Open ozra opened 9 years ago

ozra commented 9 years ago

Sorry if I'm intruding on this space with something possibly off-topic, but...

I thought I'd get your input on an idea: I was thinking of another way of typing:

do-stuff(par_a = int, par_b = double 0.7) -> int =
    foo = double par_a * par_b
    bar = par_a * 2
    mut baz = int
    baz = bar / 3
    round a * b

ID "=" ( TYPE EXPR | TYPE | EXPR )

Do you see anything that could conflict with such a syntax? It seems to me to be very intuitive. Think about objs of classes..

C++ stylee

SomeClass foo = SomeClass(47, "Stuff");

vs.

foo = SomeClass 47, "Stuff"

This is basically C++ with the auto keyword implicitly inserted (except that it would of course work in classes too), plus LiveScript call syntax assumed (ID ID, ID ==> ID(ID, ID), if you get what I mean).
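To make the three right-hand-side shapes of the proposed grammar concrete, here is a minimal Rust sketch that classifies the tokens after "=". The names (Decl, classify_rhs) and the pre-split token list are assumptions made for illustration, not part of either project. It shows that the classifier needs a set of known type names, which is where a conflict could arise.

```rust
use std::collections::HashSet;

/// The three right-hand-side shapes of ID "=" ( TYPE EXPR | TYPE | EXPR ).
#[derive(Debug, PartialEq)]
enum Decl {
    TypeAndValue, // foo = double par_a * par_b
    TypeOnly,     // baz = int
    ValueOnly,    // bar = par_a * 2
}

/// Classify the tokens to the right of "=".
/// `types` is the (assumed) set of known type names: without it,
/// `int` would be indistinguishable from any other identifier.
fn classify_rhs(rhs: &[&str], types: &HashSet<&str>) -> Decl {
    match rhs.split_first() {
        Some((first, rest)) if types.contains(first) => {
            if rest.is_empty() {
                Decl::TypeOnly
            } else {
                Decl::TypeAndValue
            }
        }
        _ => Decl::ValueOnly,
    }
}
```

In this sketch the decision depends entirely on whether the first token is a known type, so user-defined types would have to be registered before any declaration that uses them is parsed.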

Thoughts?

dobkeratops commented 9 years ago

Any ideas are very welcome

Side note: I've now implemented trailing "=" for single-expression functions, e.g. fn lerp(a,b,f)=(b-a)*f+a, and added a Haskell-like trailing where clause, expr where { ...more.. }, which extends that further. The single-expression case cuts down on braces, of course.
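For comparison, a sketch of the same lerp in standard Rust, which requires the braces and explicit parameter types that the trailing "=" form avoids. The choice of f32 is an assumption; the original leaves the types to inference.

```rust
/// Linear interpolation: returns `a` when f = 0 and `b` when f = 1.
fn lerp(a: f32, b: f32, f: f32) -> f32 {
    (b - a) * f + a
}
```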

> ID "=" ( TYPE EXPR | TYPE | EXPR )

OK, the problem I can see is that I'm currently parsing all of these: argname:type=default_expr, argname:type, and argname=default_expr. I have one parser routine and AST entity shared between 'argument definitions' and 'struct field' definitions, which both work the same way.

EDIT: ahh, I just saw that you've suggested an alternate way of writing default args, par_b=double 0.7. Interesting.

I personally think default expressions are incredibly useful, and would want to keep this handy syntax for them. Rust doesn't have them yet. It's been a big point of debate in their community, and many are opposed to defaults.

I haven't implemented the use of default expressions yet, but I definitely want them, along with 'named' parameter invocation just like Python and Scala (which does a similar job to mix-fix function names). If I implement currying, I would simply make defaults terminate the ability to curry. (E.g. I think you'd want to curry maths functions, and defaults make no sense for them, whereas GUI constructors would probably benefit far more from defaults, and you'd just write a manual lambda instead of currying them.)
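Since Rust has neither default arguments nor named parameters, the usual workaround is an argument struct with a Default impl and struct-update syntax. The sketch below shows that pattern for a GUI-style constructor; ButtonArgs, its fields, and describe_button are hypothetical names chosen for illustration.

```rust
/// Hypothetical GUI-style constructor arguments; the fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct ButtonArgs {
    label: String,
    width: u32,
    height: u32,
    enabled: bool,
}

impl Default for ButtonArgs {
    /// The "default expressions" for every parameter live here.
    fn default() -> Self {
        ButtonArgs {
            label: String::new(),
            width: 80,
            height: 24,
            enabled: true,
        }
    }
}

/// Stand-in for a constructor that takes named parameters with defaults.
fn describe_button(args: ButtonArgs) -> String {
    format!(
        "{} {}x{} enabled={}",
        args.label, args.width, args.height, args.enabled
    )
}
```

A call site then names only the fields it overrides, for example describe_button(ButtonArgs { label: "OK".to_string(), width: 120, ..Default::default() }), which is roughly the job that argname=default_expr plus named invocation would do directly.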

Another side note: I had a temporary syntax, varname=:type, which just creates a variable of 'type'. I needed this as a hack for declaring uninitialised variables. It's still in there, but I've since added a Rust-like "let", e.g. let varname:type.

The =: operator might have some uses elsewhere, so I'll keep it around.

My language currently uses Rust's idea that types are in a different parsing context to expressions, and ':' is usually what introduces the type context. This also allows for very easy syntax highlighting: a regex can guess where types appear more easily.

But the main benefit is the type-parameter brackets. <T> (plus [T] in my case) are always unambiguously type parameters inside the type context, and unambiguously "less/greater" (or the subscript operator) in the expression context. This is one aspect of Rust I'm really keen on, after all those years of being stuck with hugely ambiguous C++ syntax: cases like a<b>c(), where you can't produce the AST until you have the symbol table.

Rust currently uses :: in the expression context to re-introduce type parameters, but rarely needs it because its inference is so good (i.e. you usually only need to write them in the function signature).
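A short Rust sketch of the point above: the ::<T> form appears only where nothing else pins the type down, and disappears when the signature already does. The function names are made up for this example.

```rust
/// The target type is named with `::<i32>` in expression context.
fn parse_explicit(s: &str) -> i32 {
    s.parse::<i32>().unwrap_or(0)
}

/// Same call, but the return type in the signature lets inference pick it.
fn parse_inferred(s: &str) -> i32 {
    s.parse().unwrap_or(0)
}

/// An empty vector whose element type is spelled out in expression context.
fn empty_bytes() -> Vec<u8> {
    Vec::<u8>::new()
}
```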

They also plan to allow :T within expressions as 'type annotations' anywhere. They had considered using it to replace the 'as T' coercion operator; and for the few times when they need to write ::, a trailing :... might get rid of it and be more consistent.

> SomeClass foo = SomeClass(47, "Stuff"); vs. foo = SomeClass 47, "Stuff"

If the 'function call' operator is there, expr(...), it should always be possible to figure out that the preceding identifier is a type name; I'll see how it goes. Currently I don't actually have constructors, but I do have ident{...} for initialising a struct. It's possible it might just work already: if I make a function that has the same name (returning the type...), it'll pick the function. Inference will cut down on repetition...

fn Foo(x,y)=Foo{ field1=......, field2=....... }

OK, it still has to repeat the name once, but the first time it's a function and the second time it's a type. This'll work really nicely with 'where' notation; I'm glad I added that :)

Where I might diverge from Rust: Rust uses separate namespaces for functions, types, etc. I think I'd like one namespace, but just allow 'using the same identifier as a function or a type' to make constructors.

Perhaps if I eventually get a 'significant whitespace mode', it'll cut down on the general noise of commas and ":"s. That's definitely something I'm open to; I think it can be implemented purely in the lexer, and the information would be useful for warnings too. There are already some places where omitting ";" works 'accidentally'. Perhaps I can add an abstraction to the lexer, "expect_comma()" / "expect_semicolon()" or 'expect_param_separator()' / 'expect_statement_separator()', which just toggles off in 'significant whitespace' mode, and the parser won't know any different.

Maybe it will already "just work" if commas are optionally omitted. Then you'd write

do-stuff(par_a:int par_b:double=0.7 )-> int =...

You pay the ":" and "=" but spare the ",".
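The lexer abstraction described above could look roughly like the following Rust sketch. This is not the project's actual lexer: Tok, Lexer, and the single significant_whitespace flag are assumptions, and only expect_param_separator is taken from the names floated in the comment. The parser always asks for a separator, and the lexer decides whether one really has to be there.

```rust
/// Minimal token type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Comma,
}

/// Hypothetical lexer front end with a mode switch.
struct Lexer {
    toks: Vec<Tok>,
    pos: usize,
    significant_whitespace: bool,
}

impl Lexer {
    fn new(toks: Vec<Tok>, significant_whitespace: bool) -> Self {
        Lexer { toks, pos: 0, significant_whitespace }
    }

    /// Consume a comma if one is present. If it is absent, succeed only
    /// in significant-whitespace mode.
    fn expect_param_separator(&mut self) -> Result<(), String> {
        if self.toks.get(self.pos) == Some(&Tok::Comma) {
            self.pos += 1;
            Ok(())
        } else if self.significant_whitespace {
            Ok(())
        } else {
            Err(format!("expected ',' at token {}", self.pos))
        }
    }

    /// Parse `a, b, c` (or `a b c` in whitespace mode) into parameter names.
    /// The parsing code is identical in both modes.
    fn parse_params(&mut self) -> Result<Vec<String>, String> {
        let mut names = Vec::new();
        while let Some(Tok::Ident(name)) = self.toks.get(self.pos).cloned() {
            names.push(name);
            self.pos += 1;
            if self.pos < self.toks.len() {
                self.expect_param_separator()?;
            }
        }
        Ok(names)
    }
}
```

With the flag off, the token stream a b is rejected; with it on, the same parse_params call accepts it, so the parser "won't know any different".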

ozra commented 9 years ago

> If I implement currying, I would simply make defaults terminate the ability to curry. (e.g.: i think you'd want to curry maths functions, and defaults make no sense for them, whereas GUI constructors would

Great idea! Both "theoretically consistent" and very practically oriented, the way I like it. You hit the nail on the head with the "math vs GUI" case. This is why I'd love a 'functionally oriented, OO-capable imperative language': some parts of coding are like math (functional goodies), but many times it's more like engineering (imperative goodies).

> my language currently uses Rust's idea that types are in a different parsing context to expressions, and ':' is usually what introduces the type context. This also allows for very easy syntax highlighting... a regex can guess where types appear more easily.

I do realize that. When thinking about syntax, my priority and perspective is always human readability, clarity of intent, and consistency, never 'easily parsable grammar for the compiler', haha. My view is that the developer should have as powerful and clear a language as possible, and the compiler (aka the computer) does all the heavy lifting. But in any case, I think you have a point about separating types and expressions anyway.

My idea was that saying a = int, "a is an int variable", is logical, and a = 5, "a is 5", of course, and a = int 5, "a is the int 5", would also be logical and more specific. So there are three forms: an uninitialised variable defined only by type; a value-defined "variable" (mutable or constant) with an auto-deduced type; and a "variable" (again mutable or constant) defined by both value and type explicitly. And as you noted, the syntax would be the same for default arguments, mirroring variable/constant declaration. Still, as you say, it may cause template ambiguity.

> do-stuff(par_a:int par_b:double=0.7 )-> int =... you pay the ":" and "=" but spare the ","

The commas are one of the things I'm actually OK with. I find them clarifying, and they should only be omitted if replaced with a newline.

> Where I might diverge from rust: Rust uses separate namespaces for functions,types,..; I think I'd like one namespace, but just allow 'using the same identifier as a function or a type' to make constructors.

This is an interesting aspect. I've thought about it some, but haven't really arrived at a final conclusion. I think your idea of having them in the same namespace could make reading code a bit clearer. However, it would clash with your constructor idea, no?

dobkeratops commented 9 years ago

Since this post, my function argument parsing has changed a little. I've implemented Rust-style pattern matching, so the argument is now in a "pattern" context, which is different again: e.g. X..Y defines a range test, and A|B separates options. It produces a different set of AST nodes to expressions. Maybe I can get rid of that and just make it an expression, read by the pattern? But (i) not every operator makes sense in expressions, and (ii) it has been convenient to set up different nodes which know how to compile themselves differently.
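As a reference point, this is how the pattern context behaves in Rust itself: inside match arms a range is a test and | separates alternatives, while the same symbols in expression context build a Range value and a bitwise OR. Note that stable Rust spells inclusive range patterns lo..=hi rather than X..Y. The function names are illustrative.

```rust
/// Pattern context: ranges are tests, `|` separates alternatives.
fn classify_byte(c: u8) -> &'static str {
    match c {
        b'0'..=b'9' => "digit",
        b'a'..=b'z' | b'A'..=b'Z' => "letter",
        b' ' | b'\t' | b'\n' => "whitespace",
        _ => "other",
    }
}

/// Expression context: `..` builds a Range value and `|` is bitwise OR.
fn same_symbols_as_expressions() -> (std::ops::Range<u8>, u8) {
    (1..4, 1 | 2)
}
```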

(You've got a fork of my source? I'd recommend updating if you are curious since the whole project structure has changed a lot, I've tried to separate things out more.)

I'm beginning to think Rust's 'alternate contexts' are a bit of a double-edged sword, as I would like to be able to parse a whole tree just from the operators having consistent meanings, then figure out "ah, this is a type, interpret this tree this way...".

But the Template Parameter issue remains.

Also, actually having Rust's matching now makes it seem much more feasible that this could grow into a 'superset of Rust' as well. Everything I want is theoretically possible as a merge of both languages. Two sticking points are macros* and its 'module system'. I don't have any namespacing yet, except just crudely sticking methods inside structs. Rust emphasises namespacing whilst ditching overloading: they put every file in its own namespace by default, whereas for the way I'm used to working, with overloading and some C++ methods, that's overkill.

On the typeparam front, I've started to experiment with a little lookahead. I realise now that semicolons and braces help C++ deal with it. This lookahead is purely in the lexer, where I've added brace counting (and started to add a preceding-whitespace count). So when it hits a "<", the rule is: "scan for the next ">" on the same line; if there's no ';' or bracket-level change, interpret it as a typeparam."

This would still be wrong here: foo(a<b, c>d). It would falsely think it's a type-param; you will have to parenthesise it as (a<b),c>d to disambiguate. I'd accept that, to get rid of the one ugly piece of Rust syntax: ::<T> in the expression context.
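The lookahead rule can be sketched in a few lines of Rust. This is one reading of the rule, not the project's lexer: "bracket-level change" is interpreted here as "a closing bracket drops below the depth at the '<', or the '>' is found at a different depth", and the function name is hypothetical.

```rust
/// Decide whether the '<' at byte index `lt` opens a type-parameter list.
/// Scans right on the same line for '>'; gives up on ';', end of line,
/// or a bracket that closes below the starting depth.
fn looks_like_typeparam(line: &str, lt: usize) -> bool {
    let bytes = line.as_bytes();
    if bytes.get(lt) != Some(&b'<') {
        return false;
    }
    let mut depth: i32 = 0;
    for &c in &bytes[lt + 1..] {
        match c {
            b';' | b'\n' => return false,
            b'(' | b'[' | b'{' => depth += 1,
            b')' | b']' | b'}' => {
                depth -= 1;
                if depth < 0 {
                    return false; // left the bracket group the '<' was in
                }
            }
            b'>' if depth == 0 => return true,
            _ => {}
        }
    }
    false
}
```

Under this reading, foo(a<b, c>d) is still misclassified as a type-parameter list, exactly as noted above, while the parenthesised foo((a<b), c>d) is treated as two comparisons.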

I read that C++ can be parsed with 'GLR' parsing, which means maintaining multiple parallel parser states when ambiguity arises, building alternate possible trees and joining them in a DAG. Fascinating, but I can't start something like that; I'd divide my time too thin.

What I might try is just hard-coding popular Rust macros in order to compile Rust source, e.g. they use vec![...] for a vector constructor, format!(...) for strings, assert!() etc., and just make it skip macro definitions.

dobkeratops commented 9 years ago

> This is an interesting aspect, I've thought some about it, but haven't really arrived at a final conclusion. I think your idea of having them in the same could make reading code a bit clearer. However, it would clash with your constructor idea, no?

So currently Rust allows you to define fn foo() and struct foo{}, and they're two different things; the usages foo(args) and foo{initialisers..} are unambiguous. But their naming convention is to make constructors 'in the namespace of the struct', e.g. foo::new().
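A small Rust sketch of that coexistence: a braced struct occupies only the type namespace, so a function with the same name can live in the value namespace (a tuple struct would not allow this, since it already defines a constructor function). The struct and its fields are made up for the example, and the allow attribute only silences the naming-convention lint.

```rust
/// A braced struct lives only in the type namespace...
#[derive(Debug, PartialEq)]
struct Foo {
    x: i32,
    y: i32,
}

/// ...so a function of the same name can coexist in the value namespace
/// and act like a C++-style constructor.
#[allow(non_snake_case)]
fn Foo(x: i32, y: i32) -> Foo {
    Foo { x, y }
}

impl Foo {
    /// The conventional Rust spelling of the same constructor.
    fn new(x: i32, y: i32) -> Self {
        Foo { x, y }
    }
}
```

Here Foo(1, 2) calls the function, Foo { x: 1, y: 2 } is the struct initialiser, and Foo::new(1, 2) is the form Rust's conventions prefer.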

Perhaps I should indeed keep structs and functions in separate namespaces, and just support a 'C++-style operator new' as well. Currently what my code does is: new StructName{....} will allocate the struct, then call the struct initialiser, whereas StructName{..} just creates it on the stack.

I haven't implemented C++-style constructor functions yet, e.g. new StructName(args), but I definitely want to. What it can do now is 'fn make_foo()=new Foo{...}', but that's not as versatile as C++, where you can choose to initialise on the stack or the heap.
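For contrast, a Rust sketch of the two options: a constructor that returns by value and leaves the choice of storage to the caller, next to one that always heap-allocates, like the make_foo form above. Particle and both function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct Particle {
    pos: f32,
    vel: f32,
}

/// Returns by value; the caller decides between stack and heap.
fn make_particle(pos: f32) -> Particle {
    Particle { pos, vel: 0.0 }
}

/// Roughly `fn make_foo()=new Foo{...}`: the heap allocation is baked in.
fn make_particle_boxed(pos: f32) -> Box<Particle> {
    Box::new(make_particle(pos))
}
```

With the by-value constructor, let p = make_particle(1.0) stays on the stack and Box::new(make_particle(1.0)) moves the same value to the heap, which is the versatility the boxed-only version lacks.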

My code is structured to allow 'new' to treat its argument differently, e.g. so I will be able to handle new [ , , , ..] to allocate a dynamic array with initialisers too.

So... basically I just have to implement both.

ozra commented 9 years ago

One bonus of sharing a namespace for types and other symbols is that more can be deduced from the identifiers if one wants to cut down on what the syntax requires, for things like the foo(a<b, c>d) case. However, it does require joint lexing-parsing etc. And it can most likely be deduced in almost any case anyway, so perhaps it's a moot point.

It's stack always atm? Heap alloc is not in at all?

ozra commented 9 years ago

Oh, about the macros: I think you're on the right track opting for language constructs over macros. Not saying that macros shouldn't be available in the long run (they do have their uses), but still, I think you're making a wise choice sparing that time and just taking care of the common cases from Rust as special ops.

ozra commented 9 years ago

> (You've got a fork of my source? I'd recommend updating if you are curious since the whole project structure has changed a lot, I've tried to separate things out more.)

Ah, yes, I'm unfortunately in over my head with my current projects (and chopping the wood for the stove, life stuff...). When I get a bit of air to breathe, I'll catch up again. :)

dobkeratops commented 9 years ago

> It's stack always atm? Heap alloc is not in at all?

Heap alloc is there: 'let x=new Struct{...}' does that, whereas 'let x=Struct{..}' is stack, as you'd expect. What I can't yet do is write a C++-style constructor called for either. That's a pretty serious omission for C++ compatibility, I know. It's only available for the inbuilt 'struct-initializer form' with positional or named fields.

The framework is in place to generalise it, though. There's a method for anything to be compiled differently in an 'operator new' slot (a struct-initializer, a function call, an array initialiser...) and a 'hidden parameter pass' (which has other uses, e.g. how 'match arms' compile). And of course one can write a function that allocates and initialises a struct and returns it, and that 'does the job of a constructor'.

I still have to think about how to handle the difference between C++ new and Rust 'box'. The Rust community intends to let 'box' do the same job as 'emplace_back', which is really interesting ('placement new on steroids', perhaps). box returns a smart pointer by default. Maybe it could be done like 'new<X,T>', where X defaults to raw_ptr and Rust 'box' is just like 'new<unique_ptr,T>', to create a superset of what C++ and Rust do.
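The 'new<X,T>' idea can be approximated in today's Rust with a generic function over the owning pointer type. This is only a sketch of the concept: new_in is a hypothetical name, it leans on the standard From<T> impls for Box, Rc and Arc, and it constructs the value first and then moves it, so it does not give true in-place construction.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical `new<P, T>`: build a T, then move it into whichever
/// owner P the caller names. Works for any P that implements From<T>,
/// which includes Box<T>, Rc<T>, Arc<T>, and T itself.
fn new_in<P: From<T>, T>(value: T) -> P {
    P::from(value)
}
```

For example, new_in::<Box<i32>, i32>(5) yields a uniquely owned heap value, new_in::<Rc<i32>, i32>(5) a reference-counted one, and new_in::<i32, i32>(5) just returns the value itself, so the pointer kind becomes a type parameter of the allocation.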

> However it does require joint lexing-parsing etc.

Yeah, this is something I view as a C++ disaster, because you have to have the whole context to make sense of a fragment of code, and the include order matters. With [] brackets for typeparams it might be easier, because at least the shape of the AST is known. But I'd be happy to use the parser hack I mentioned and just require those corner cases to be disambiguated with parens. I don't think I'll attempt 'GLR' parsing, where it keeps multiple potential AST shapes around.

I wish C++ could gain some keywords to disambiguate fragments, to help out-of-order or partial parsing. E.g. just like you can forward-declare a struct, imagine if you could forward-declare that something is a template (so it knows params follow it) or a typename (just allow the typename keyword at global scope?).

dobkeratops commented 9 years ago

> It's stack always atm? Heap alloc is not in at all?

OK, now it supports operator new with a constructor as well. EDIT: and constructors for stack variables too, invoked via let x=Foo(args..). It just makes wrapper functions. Getting there... (both constructors and the struct-initializer form are supported now).

I don't have an equivalent of the C++ syntax Foo x(args); I'm not sure how to retrofit that, and it can probably be avoided. It's clearer that it happens 'in place', of course, but it should be safe to rely on compiler optimisation these days. It's also a throwback to not having auto (you didn't want to write Foo x = Foo(args) in old C++).