dobkeratops / compiler

C/C++ subset resyntaxed like Rust, + tagged-union/pattern-matching, UFCS, inference; LLVM. Example:
https://github.com/dobkeratops/compiler/blob/master/example.rs
GNU General Public License v3.0
71 stars, 7 forks

Loads of things (information overload ;) #1

Open ozra opened 9 years ago

ozra commented 9 years ago

Your ideas about the language awoke my curiosity about a few things:

As you know, I'm intrigued by the SugarCpp project, syntactically. I've been in the web/Node.js business for a couple of years now, coding LiveScript. I've also attended a bunch of meetups with one of the founders of Erlang, one of the first external (outside of Ericsson LM AB) Erlang users, now a devoted Haskell coder, where a lot of the focus has been on Haskell. And since a few months back I've been into C++ coding. It's basically like writing pseudocode, changing a few characters here and there, and you're good to go. Well, that's some background, which might hint at my perspective.

I really like your ideas about two-way type inference (not having to template manually for trivial cases), UFCS, perhaps 'both ways' (using a method as a function), and the matching capabilities. The for..else is such a clever idea, so natural for so many (imperative) cases.
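For readers who haven't met for..else: assuming it follows Python's semantics (the thread doesn't spell them out), the else branch runs only when the loop finishes without hitting break. Here's a small C++ sketch of the flag-based workaround it would replace; find_or_append is a hypothetical example function, not part of the project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of x in v, appending x first if it is missing.
// The `found` flag is exactly what a for..else construct would make unnecessary.
std::size_t find_or_append(std::vector<int>& v, int x) {
    bool found = false;
    std::size_t idx = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == x) {
            idx = i;
            found = true;
            break;          // leaving via break skips the "else" part
        }
    }
    if (!found) {           // the "else" part: the loop ran to completion
        v.push_back(x);
        idx = v.size() - 1;
    }
    return idx;
}
```

With for..else, the `found` variable and the trailing `if (!found)` would collapse into an else block attached directly to the loop.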

Significant whitespace + voluntary curlies

When it comes to syntax, I believe in keeping an open mind and trying to identify, as scientifically as possible, the clearest, most readable syntax for the code that is most commonly written, to get the most stable results.

Now, this is all ideas and wishful thinking. In any event, I'm really happy to see your project! New languages are always cool! :-)

It saddens me that I don't have the time required to implement a language from scratch in my current life situation, but I'd be happy to collaborate and contribute on one, and I'd be happy to start using one in practical projects soon.

ozra commented 9 years ago

Oh, and yes, Haskell has the notion of "layout": you can write it either with significant whitespace or with curlies (and semicolons)!

dobkeratops commented 9 years ago

"...but I'd be happy to collaborate and contribute on one"

First and foremost, I'd welcome any collaborators; I realise one person making a language for one user is highly uneconomical. (This is why I've gone with this hybrid 'Rust' idea, hoping I can benefit from Rust tooling, even just its syntax highlighting appearing in editors.)

I always bounce because I'm reluctant to commit to C++'s header complexity; traditionally I also always preferred rolling my own simplified vector rather than using std::vector, but I've gone with std:: now.

"Well, that's some background, which might hint to my perspectives."

My own background is game development, where C++ rules. I know relatively little about web languages etc., but have always been fascinated by things like Lisp, Haskell and Erlang. It's a shame C++ took so long to get lambdas. This is basically "the language I wish I had". Language choices without GC are so limited. Rust focuses too much on safety, although I can see it's probably superior for most users. Game development needs performance & productivity.

Runtime data structures are simple, reworked for minimal allocation/pointer chasing, and conditioned by offline tools. Tools don't need to be so tuned, but it is useful to move code back & forth. I've tried to explain this to the Rust community and they don't really get it: how one can need C/C++-like performance AND productivity in the same package, moving between different extremes in different places. For performance-oriented code, I still sometimes like the purity of raw pointers over iterators. Most of my own programming time is spent debugging by writing visualisations of intermediate states; there's a lot of code that doesn't need to be fast but does need to be in the same place.

Would you consider significant whitespace? / "newline indented to same level as expression start line" = expression termination

YES. At the very least I want to get the logic in there to give errors when braces & indentation don't match, given all the time one spends 'hunting the missing brace'. Then "brace insertion" will be a simple step. I think the same parser engine should be able to handle both. For the minute my brain freaks out a little without {..}, but conceptually I prefer significant whitespace. I might start treating braces and template-parameter brackets as abstract tokens, e.g. BLOCK_BEGIN/END and TYPARAM_BEGIN/END, so they can be switched. (There's a similar template ambiguity to consider: I've already gone down the Scala [T] route, and I'll need to make that switchable for Rust-user appeal.)
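A minimal sketch of such a brace/indentation check, assuming the rule is: a '}' on a later line than its '{' must be on a line indented the same as the line holding that '{'. The function name brace_indent_mismatches is hypothetical, tabs count as one column, and braces inside strings or comments are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count leading spaces/tabs; a tab is treated as one column for simplicity.
std::size_t indent_of(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && (line[n] == ' ' || line[n] == '\t')) ++n;
    return n;
}

// Returns 1-based line numbers of suspicious braces:
//  - a '}' whose line is indented differently from the line of its '{'
//    (a '}' on the same line as its '{', e.g. "{ x }", is not checked),
//  - a '}' with no matching '{',
//  - a '{' that is never closed (reported after everything else).
// Braces inside strings or comments are NOT handled by this sketch.
std::vector<std::size_t> brace_indent_mismatches(const std::vector<std::string>& lines) {
    struct Open { std::size_t line; std::size_t indent; };
    std::vector<Open> stack;
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string& s = lines[i];
        for (char c : s) {
            if (c == '{') {
                stack.push_back({i + 1, indent_of(s)});
            } else if (c == '}') {
                if (stack.empty()) {
                    bad.push_back(i + 1);
                    continue;
                }
                Open opener = stack.back();
                stack.pop_back();
                if (opener.line != i + 1 && indent_of(s) != opener.indent) {
                    bad.push_back(i + 1);
                }
            }
        }
    }
    for (const Open& o : stack) bad.push_back(o.line);
    return bad;
}
```

When an inner '}' is missing, the first flagged line is where braces and indentation first disagree, which is often close to the actual mistake; the same bookkeeping is also what a "brace insertion" pass would need.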

One thing I've come to realise, though, is that { }, ; and even C++'s extraneous if () parentheses all help to disambiguate template brackets from less-than/greater-than, but the hack I've tried there also requires type parameters to be on the same line.

I have other priorities right now - but I am thinking about it.

"...or always return last, and specifically write void at the end instead of ;."

I'll make sure I parse a return keyword. Currently my parser basically 'inserts void' (you can see that in the AST dump), so writing 'void' might already work by accident. Perhaps in "significant whitespace mode" a mandatory "return" would be a great compromise.

C++11 as assembly language

I'm torn on this. I definitely want a C backend, and I'll say I like the simplicity of C vs C++ as a target too, but I can see from SugarCpp that you can leverage "the full power of C++11" and not have to replicate everything. I agree that C or C++ is a nice universal base. I personally hope the C language lives on, with C++ replaced by alternative OOP & metaprogramming schemes. Or perhaps a C++ compiler could get an alternative, modernised front end to the same AST & 'middle'.

"When users know that it 'basically is C++', but better, they'll adopt it more easily, because they can back out by simply generating C++ and continuing in that."

I agree that compiling to readable C++, just like SugarCpp, would be a great thing to have, inspiring confidence in people (like myself) who are reluctant to try anything that isn't C++.

I already worked through some complexity in generating LLVM 'phi nodes' (half broken, so I've shelved it), and I've tried to abstract the way the code generator passes values around: a 'CgValue' that can be a register, a literal/variable, or a reference. I'm adding slightly higher-level entities to the 'CodeGen' interface, i.e. a "C-like for" is the looping "primitive" (make anything else by omitting parts), so it should be possible to slot in an alternate back end; I've got both 'getelement(index)' and 'getelement(name)' abstractions in there. There's still some residual mess where my AST nodes' "::compile()" methods spit out LLVM text; it's very fiddly and error-prone, but I'm getting there cleaning it up.
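To make the CgValue idea concrete, here's a sketch of how such a value type might look as a C++17 std::variant. Reg, Lit, Ref and Emitter are hypothetical names, not the project's actual classes, and the emitted text only approximates LLVM IR (it uses the older typed-pointer load syntax). The point is that only a reference needs an extra instruction before it can be used as an operand, so the rest of the generator never has to care which kind it holds.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical code-generator value kinds (illustrative names only).
struct Reg { std::string name; };  // value already in a register, e.g. "%t0"
struct Lit { std::string text; };  // literal constant, e.g. "1"
struct Ref { std::string addr; };  // address of a variable; must be loaded before use
using CgValue = std::variant<Reg, Lit, Ref>;

// Collects instruction text and hands out fresh register names.
struct Emitter {
    std::vector<std::string> code;
    int next = 0;

    std::string fresh() { return "%t" + std::to_string(next++); }

    // Turns any CgValue into an operand string. Only Ref needs an extra
    // instruction (a load); registers and literals pass straight through.
    std::string operand(const CgValue& v, const std::string& ty) {
        return std::visit([&](const auto& x) -> std::string {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, Reg>) {
                return x.name;
            } else if constexpr (std::is_same_v<T, Lit>) {
                return x.text;
            } else {
                std::string r = fresh();
                code.push_back(r + " = load " + ty + ", " + ty + "* " + x.addr);
                return r;
            }
        }, v);
    }

    // Emits an integer add and returns its result as a new register value.
    CgValue add(const CgValue& a, const CgValue& b, const std::string& ty) {
        std::string lhs = operand(a, ty);
        std::string rhs = operand(b, ty);
        std::string r = fresh();
        code.push_back(r + " = add " + ty + " " + lhs + ", " + rhs);
        return Reg{r};
    }
};
```

For example, adding a variable reference %x to the literal 1 produces a load into %t0 followed by "%t1 = add i32 %t0, 1". A C or C++ back end could implement the same operand()/add() interface and emit "x + 1" instead.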

Re the universality of C vs LLVM: from my background in game development (similar to embedded), I've experienced many platforms with custom hardware from the early days, and of course have seen how C was their ubiquitous choice. For example, when the Xbox 360 first came out, its SIMD extensions were initially accessible through C intrinsics, but not through C++ classes. Of course, in 2014 people take a C++ compiler for granted.

dobkeratops commented 9 years ago

"...compatibility with existing high-performance libraries in the de facto fastest systems language (C++). This way you can use headers directly, you can benefit from link-time optimizations, and even..."

My name mangling might be buggy, but right now if you 'extern' something it just generates a C++ symbol, so hopefully it can link to overloaded C++ functions. Translating headers etc. is more complex, but it's on my roadmap: I want to self-host by translating my own source. I'm experimenting with a simplified "<T>" parser hack.

I can't really ditch C++ - I'm still way faster at writing C++ than any other language - years of familiarity and mature tools.

I'm not sure when I'll go through with all this. I'm hoping someone in the Rust community can, in parallel, write a raw C++-to-Rust transpiler; then I could merely ask them to relax their rules, and it would give me the transpiler for my 'hybrid' language for free. (There are already various binding generators.) I can also try to write in a subset of C++ that does translate to Rust.

I don't have "clang" inside this project yet.

I don't think I'll get much of this done soon. I have more work on basics to do (e.g. there's no module system yet), and to get interest from the Rust community I think I need a "match/enum" implementation, despite the fact that what I have now is suitable for my needs: the improved templates can do what boost::variant tries to do, more elegantly.
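For comparison, this is roughly what a tagged union plus 'match' looks like in standard C++17 with std::variant and std::visit (the standard-library counterpart of boost::variant). The overloaded helper is a common idiom rather than a library type, and Shape/area are illustrative; a built-in match/enum would replace this boilerplate.

```cpp
#include <cassert>
#include <variant>

// Common C++17 idiom (not a standard type): inherit each lambda's operator()
// so std::visit can choose one per alternative, like the arms of a match.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

struct Circle { double r; };
struct Rect { double w, h; };
using Shape = std::variant<Circle, Rect>;  // the tagged union

// One lambda per alternative; std::visit dispatches on the stored tag.
double area(const Shape& s) {
    return std::visit(overloaded{
        [](const Circle& c) { return 3.14159 * c.r * c.r; },
        [](const Rect& r) { return r.w * r.h; }
    }, s);
}
```

If an alternative is added to Shape without a matching lambda, the std::visit call fails to compile, which gives some of the exhaustiveness checking a real match would provide.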

"...compatibility with existing high-performance libraries in the de facto fastest systems language (C++). This way you can use headers directly, you can benefit from link-time optimizations, and even..."

Right, this is another reason why I'm generally not sticking with Rust: you can't represent interfaces to existing C++ libraries 1:1, since Rust doesn't have arbitrary overloading. They declare it a misfeature. I can see their logic, but I'm firmly in the camp that sees a "special parameter" as more of a misfeature (because it forces upfront choices and creates hell when you want to refactor). Rust basically only selects functions based on the first parameter (even for compile-time polymorphism), similar to Go; although they've recently added 'multi-parameter traits', these still require boilerplate. They also severely restrict where 'methods' can be added ("your type or your library"): you can't add helper methods to library types without first defining another interface.
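To illustrate the C++ side of this: overload resolution considers every argument, so free functions can be selected by the type of their second parameter, and new overloads can be added anywhere without a receiver type owning them. Vec3 and scale are hypothetical examples, and Rust's own rules are not shown here.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Free functions, overloaded on any parameter position. The first two share a
// float first parameter, so the choice between them is made by the second one.
float scale(float s, float v) { return s * v; }
Vec3 scale(float s, const Vec3& v) { return {s * v.x, s * v.y, s * v.z}; }
// Either argument order can be offered; this one just forwards.
Vec3 scale(const Vec3& v, float s) { return scale(s, v); }
```

Because none of these is a method, a helper like this can be added next to any library type, which is the flexibility the comment above is arguing for.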

I've got 'methods' in this language for compatibility, but if C++ didn't exist I'd just have free functions, UFCS, arbitrary overloading (and optional gathering into a vtable) and be done with it.

Having said that, I would like to introduce a special "partial name mangle", e.g.

    struct Foo { fn "C" bar() }   // will compile to: extern "C" Foo_bar(Foo*);
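A sketch, in C++, of what the proposed "partial name mangle" might correspond to. The void return type and the body are assumptions (the comment only gives the signature Foo_bar(Foo*)); the method wrapper shows how method-call syntax could forward to the plain, unmangled C symbol.

```cpp
#include <cassert>

struct Foo;
// Unmangled C-linkage symbol "Foo_bar", which C code (or another
// language's C FFI) can link against, using Foo only through a pointer.
extern "C" void Foo_bar(Foo* self);

struct Foo {
    int value;
    // Method syntax forwards to the C symbol, so C++ callers can write foo.bar().
    void bar() { Foo_bar(this); }
};

// Hypothetical body: the original only gives the signature.
extern "C" void Foo_bar(Foo* self) { self->value += 1; }
```

Both f.bar() and Foo_bar(&f) then call the same function, while the exported symbol name stays predictable for C-only linkers.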

I'll suggest that to the Rust community too: Rust can only link with C functions directly, and I think this would make extern "C" libraries more pleasant to use.

dobkeratops commented 9 years ago

"and even compile time cross unit optimizations based on inlining and templating from a huge existing base."

I'll have to check up on the details, but I think the LLVM infrastructure does support whole-program inlining. C++ templates via C++ of course require C++, so you're right, this is where compiling to C++ would be a win. Whilst I'm aiming for a similar feature set to C++ templates, I'm sure I'll have some omissions and subtle differences that defeat exact translation of existing C++ source.

ozra commented 9 years ago

"I'll have to check up on details but I think the LLVM infrastructure does support whole program inlining."

Ah yes, it should. However, I've recompiled the whole LLVM suite over and over in different ways, downloaded patches, spent hours and finally gave up, so unfortunately I haven't experienced it in practice. Still, as far as I understand it, it should be in place. At the moment I compile with clang/LLVM during development, and with gcc for release.