Champii / Rock

Native language with Rust & LLVM
GNU General Public License v3.0

[ENHANCE] Print and Println distinction #122

Closed oraqlle closed 2 years ago

oraqlle commented 2 years ago

It might be good to have both a .print! method and a .println! method on the Show trait, with .print! printing just the text and .println! printing it followed by a newline, for more customizability.

Champii commented 2 years ago

I am totally in for that ! However, I really like the semantics of calling .print! and getting a newline. Maybe we should think about a proper name for the function that doesn't add one ? :)

Champii commented 2 years ago

I was thinking about .print_! for that, it reminds me of the cursor that stays at the end of the line in a shell, but it feels a little strange

oraqlle commented 2 years ago

Maybe a whole different name under the printer trait, like .display! or .show!, which just takes a raw string and maybe even an end-of-string character, like Python's print(str, end='\n'), and displays the string as is.

Champii commented 2 years ago

The default argument is a nice idea ! I was wondering if they (the default arguments) were worth implementing, because they add a lot of work to the method resolution mechanism, particularly in conjunction with trait resolution. They would also conflict with a hypothetical currying implementation for methods, so this is worth discussing.

We could define a method like .print_eol str, '\n' that would be called from the print!. It could be used to print without the newline if not wanted: .print_eol str, '\0'. This could be a temporary solution until we find something better ? (that makes me think, I have not implemented the \0 escaped char)

Or as you said, just a dedicated method .display! sounds nice, but I'm afraid that the naming between print, display and show would be a little confusing.

oraqlle commented 2 years ago

Maybe the ability to set default arguments isn't worth it if you want currying. However, rather than default arguments, you could have a method as part of the Show trait called .putstr!. This just takes a string and prints it, just a raw string print, but you could redefine .print! in terms of .putstr! and have it append a newline to the end of the string. That way, .putstr! is the general method but .print! is guaranteed to have a newline at the end. This also opens the opportunity for a Format trait that can do all the replacements and stuff and just returns a string, which can be chained with .print! or .putstr!.

eg.

"{1}, {0}".format!(1, 2).print!

# Outputs: 2, 1

Here, the string is the format string: it takes the arguments to substitute and returns the formatted string, which .print! then displays with a newline.
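To make the idea concrete, here is a minimal Rust sketch of the putstr/print split and of positional formatting. It is only an illustration: the function names putstr, print and format_positional are assumptions borrowed from this discussion, not part of Rock's stdlib, and the `{N}` placeholder handling is deliberately simplistic.

```rust
use std::io::{self, Write};

/// Writes `text` exactly as given, with no newline added.
fn putstr(out: &mut dyn Write, text: &str) -> io::Result<()> {
    out.write_all(text.as_bytes())
}

/// Defined in terms of `putstr`: the same text followed by a newline.
fn print(out: &mut dyn Write, text: &str) -> io::Result<()> {
    putstr(out, text)?;
    putstr(out, "\n")
}

/// Replaces each `{N}` in `template` with `args[N]`.
/// Anything that is not a valid placeholder is copied verbatim.
fn format_positional(template: &str, args: &[&str]) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        // A placeholder is `{`, an index, then `}`.
        let index = after
            .find('}')
            .and_then(|close| after[..close].parse::<usize>().ok().map(|i| (i, close)));
        match index {
            Some((i, close)) if i < args.len() => {
                out.push_str(args[i]);
                rest = &after[close + 1..];
            }
            _ => {
                // Not a placeholder: keep the brace and continue scanning.
                out.push('{');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() -> io::Result<()> {
    let text = format_positional("{1}, {0}", &["1", "2"]);
    print(&mut io::stdout(), &text) // writes "2, 1" and a newline
}
```

Taking the output stream as a parameter is a choice made for this sketch so the behaviour can be checked against an in-memory buffer; the methods discussed here would write to standard output directly.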

Champii commented 2 years ago

Oh my god I love this ! This is simple, elegant and clear. With the current syntax you could write it that way:

"{1}, {0}".format 1, 2 .print!

Also, I think that the Show trait and its show method should be renamed to follow the Rust standard library: a ToString trait with a .to_string! method.

oraqlle commented 2 years ago

I'm glad you like it. And not necessarily. I'm assuming that Show gets its name from Haskell, so Show could be the same as Rust's Display trait. It could define the print, show and putstr methods along with others. ToString is more general than Show, as structs and other forms of data can have all sorts of different implementations of .to_string! depending on how the user wants it to work. Show should just describe what happens if they use .print! or .putstr! on the data directly; if the implementation just calls the data's .to_string!, that's up to the user.

TL;DR The trait Show, Print, Output or Display (whatever you settle on) should describe what it means to be a printable/displayable type ie. use the associated methods. ToString just turns something into a string. Format should describe data that can be formatted, thus also implements some derivation of ToString (I think).

Also, on another note, I wonder if a .format! method, if it were to exist, would look better as:

1, 2 .format! "{1}, {0}" .print!

That way, if you implement different arguments say a struct, the .format! method takes the struct by the dot operator and takes an additional string (always a string) which sets the format parameters.

Champii commented 2 years ago

You're assuming right about the Show trait name. I liked the distinction Rust made between the Display and Debug traits, which both produce strings but for different purposes. I think this is a good feature and this may be what we want to implement for Rock. Show can be seen as the actual Display, so it would not be hard to just rename it to make more sense. I think however that we should keep the distinction between Display and Print, as the first produces a string from anything, and the second takes that string to actually display it. (We can see here that the name Display collides with the idea of what we are actually doing, so we might have to choose a better name.) The Print trait will be completely generic and automatically generated for any type that implements Show (or whatever name we choose), so its implementation details should never bother the end user. This trait dependency is yet to be implemented, as well as the generic implementation, but you get the idea :)
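In Rust terms, "automatically generated for any type that implements Show" corresponds to a blanket implementation. The sketch below shows that pattern under stated assumptions: the trait names Show and Print and the methods show, putstr and print mirror this discussion, and Point is a made-up example type. None of it is Rock's actual implementation.

```rust
use std::io::{self, Write};

/// Hypothetical `Show`: turns a value into its user-facing text.
trait Show {
    fn show(&self) -> String;
}

/// Hypothetical `Print`: sends that text to an output stream.
trait Print {
    fn putstr(&self, out: &mut dyn Write) -> io::Result<()>;

    /// Same as `putstr`, followed by a newline.
    fn print(&self, out: &mut dyn Write) -> io::Result<()> {
        self.putstr(out)?;
        out.write_all(b"\n")
    }
}

/// Blanket implementation: every `Show` type gets `Print` for free,
/// so users only ever implement `Show`.
impl<T: Show> Print for T {
    fn putstr(&self, out: &mut dyn Write) -> io::Result<()> {
        out.write_all(self.show().as_bytes())
    }
}

/// Made-up example type for the sketch.
struct Point {
    x: i64,
    y: i64,
}

impl Show for Point {
    fn show(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

fn main() -> io::Result<()> {
    // `Point` never mentions `Print`, yet `.print` is available.
    Point { x: 1, y: 2 }.print(&mut io::stdout())
}
```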

I'm totally agreeing with you on the Show, ToString and Format point.

I'm not quite sure about this format! call-style. It seems unclear, and having the arguments be the self of the expression seems weird (imho). We could embrace a completely different style and have some sort of string interpolation shorthand like in JS:

`${1}, ${2}`.print!
# or with variables
`${foo}, ${bar}`.print!

Maybe I'm going a little too far with this I don't know, it's late here ^^

Champii commented 2 years ago

I just want to add that this shorthand syntax smells heavily like a macro feature... I thought that beast would be a little further away but here we are :)

oraqlle commented 2 years ago

The string interpolation feature sounds like a much better solution: keep everything in a single string. But it would make the most sense if it could take anything that implements the ToString trait or, if you want it to be more constrained, a Display type, or even further a Print type; that way you could just throw in a struct if it supports it and it prints. I think it would look best if it used a raw string, like:


# ...

"${1}. ${foo.bar}".print!
# >> 1. hi

As for macros, a Rust-style macro system would be awesome, but we would need to make the trait and type systems more rigorous, as metaprogramming expansion is super difficult to get right (ie. I think it's hard and I have no idea where to start with that sort of pre-processing), but it is well worth the work if we can get it right.

Champii commented 2 years ago

I, too, find this string interpolation solution quite elegant and concise. I wonder how we could implement such a feature. Under the hood I think this will just be some sugar for a call to a Format trait like in rust.

`${foo}, ${bar}`.print!

This could desugar to:

"${0}, ${1}".format foo, bar .print!

I think it is important to introduce a new syntactic construct here with the back-quote '`', to also allow users to construct raw strings that would not be interpolated.

"${foo}, ${bar}".print!

This would just print the string as is, without interpolation.
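As a rough illustration of the desugaring step, the Rust sketch below rewrites named `${...}` placeholders into positional ones and collects the names that would become the arguments of the format call. The function name desugar and the decision to reuse an index for a repeated name are assumptions made for this sketch, not a description of how Rock does or will do it.

```rust
/// Rewrites `${name}` placeholders into `${0}`, `${1}`, ... and returns the
/// rewritten template together with the names, in order of first appearance.
fn desugar(template: &str) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut names: Vec<String> = Vec::new();
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                // Reuse the index when the same name appears more than once.
                let index = match names.iter().position(|n| n.as_str() == name) {
                    Some(i) => i,
                    None => {
                        names.push(name.to_string());
                        names.len() - 1
                    }
                };
                out.push_str(&format!("${{{}}}", index));
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the text literally.
                out.push_str("${");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    (out, names)
}

fn main() {
    let (template, args) = desugar("${foo}, ${bar}");
    // The back-quoted literal would become: "<template>".format <args> .print!
    println!("{:?} with arguments {:?}", template, args);
}
```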

You are right about the trait and type system, their overall rework is long overdue to become robust enough to handle such a feature. I'm starting to become blocked from all parts because of its weaknesses, notably the issue #118 has a high priority because we can't construct any struct or function that handles other integers than Int64. This is blocking for so many features like the libc struct interoperability (I'm starting to implement more complex features like Socket into the stdlib)

I've never implemented a complete macro system for a language, and to have one as powerful as Rust's will be quite an adventure on its own ! All along the development of Rock, I've taken a lot of inspiration from the Rustc's source code, and I think I will continue to borrow some of their concepts for the initial draft of that feature. The question is still open to discuss if this string interpolation thing would be better if implemented in the form of a macro.

oraqlle commented 2 years ago

Sorry for the radio silence, I’ve had poor internet for the last few days.

True, I didn’t think of the consequences of using a string literal with the .print! method, so introducing the back-tick string literal as sugar for the "{}".format! _ .print! pattern would be a great idea.

Having support for other POD types is important and I can imagine it is blocking a lot of progress.

A macro system is complex but I was giving it some thought. It would mostly require a sophisticated pattern matching system such that expansion into real Rock code was just a process of text replacement. A way to write code that writes more code for you. To start, we could take the approach of the C macro system for simple text replacement in the pre-processing stage of the code, or if you want it to be more sophisticated, even the C++ template and concept system, which is Turing complete, can offer a model to build a prototype system.

As for the string interpolation being performed by a macro, if you wanted to take this approach you might want to consider using macros to perform a whole bunch of IO operations, much like how Rust does. I will look at Rust's macro pre-processor and their implementation of certain macros themselves when I get back, to get an idea of how the system works and how it's expected to work.

Champii commented 2 years ago

> Sorry for the radio silence, I’ve had poor internet for the last few days.

Don't worry, this is voluntary open source work and we all have a life outside of programming (or have we ? :p)

> Having support for other POD types is important and I can imagine it is blocking a lot of progress.

Yes it is ! I think there is just the matter of #120 and #118 (this one is no joke) and maybe #124 to have something that begins to work.

> A macro system is complex but I was giving it some thought. It would mostly require a sophisticated pattern matching system such that expansion into real Rock code was just a process of text replacement. A way to write code that writes more code for you. To start, we could take the approach of the C macro system for simple text replacement in the pre-processing stage of the code, or if you want it to be more sophisticated, even the C++ template and concept system, which is Turing complete, can offer a model to build a prototype system.

Indeed, this is a feature that has a lot of work related to it. You are suggesting a pattern matching system, and that's exactly what I have in mind. Ideally, this system would work very similarly to Rust's macro_rules system. The user would have a way to define a sequence of repeating tokens interlaced with bits of Rock parsing ability for basic expressions and types.

macro my_macro
  $e:expr =>
    $e.print!

main: ->
  my_macro "hello_world"

Would expand to

main: ->
  "hello_world".print!

This is indeed very experimental syntax, and I'm quite concerned about the parser limitations we will encounter with edge cases. There is also the subject of recursive macros, ie. macros that generate more macros, or auto-expanding recursive macros (something like this: https://blog.cloudflare.com/writing-complex-macros-in-rust-reverse-polish-notation/). I think we will have to start small, but I also think this can introduce new long-awaited features like pattern matching (I can't wait to work on that one).
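For comparison, the my_macro example above has a close analogue in Rust's macro_rules!, which is the system this design borrows from. The sketch below is plain Rust, not Rock; print_line is a hypothetical helper standing in for .print!, and it returns the line it printed only so the expansion can be checked.

```rust
/// Hypothetical stand-in for Rock's `.print!`: prints the text with a newline
/// and returns what was printed.
fn print_line(text: &str) -> String {
    let line = format!("{}\n", text);
    print!("{}", line);
    line
}

// The matcher `$e:expr` captures one expression; the body is the template
// that the call site is replaced with.
macro_rules! my_macro {
    ($e:expr) => {
        print_line($e)
    };
}

fn main() {
    // Expands to: print_line("hello_world");
    my_macro!("hello_world");
}
```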

> As for the string interpolation being performed by a macro, if you wanted to take this approach you might want to consider using macros to perform a whole bunch of IO operations, much like how Rust does.

Yes, I was wondering if we could do some magic like Rust does. For example, I was thinking about implementing va-args in terms of macros (like Rust), so that it doesn't pollute the global space of features (implementing that in the compiler with such a generic language would be a nightmare).

> I will look at Rust's macro pre-processor and their implementation of certain macros themselves when I get back, to get an idea of how the system works and how it's expected to work.

Awesome !

oraqlle commented 2 years ago

> Don't worry, this is voluntary open source work and we all have a life outside of programming (or have we ? :p)

Haha, sometimes that's not the case.

I'm not sure how to implement varying-sized POD types. I come from C++, where how much memory each one takes up is highly dependent on the system architecture, but I'm assuming this might be handled by LLVM using some toolchain, so you can get the benefits of different-sized integers (and other various types).

As for a macro system, the best place to start I think would be simple expansions like the one you got above. Such that it replaces text and might be able to take parameters and still (hopefully) maintain the type and trait constraints. The best way I think to go about this is to build a pre-processor that parses everything and then the compiler steps in and validates the code. But it would be useful if macros could be marked to the compiler such that if an error occurred in the expansion, or in the resulting code it could be traced properly to what the dev sees.

Variable arguments are interesting as there are various ways to go about shaping them. I'm not sure how Rust does it, but because C++ has preprocessor support for them (the __VA_ARGS__ macro), it wouldn't be far-fetched to assume there's some tool in LLVM that can help with expansion, indexing, extraction and type indexing.

A pattern matching system would also be great. I'm curious to see how you want to make it look syntactically or semantically. And on that note, what do you have in the works for enums and other algebraic data types?

Champii commented 2 years ago

> I'm not sure how to implement varying-sized POD types. I come from C++, where how much memory each one takes up is highly dependent on the system architecture, but I'm assuming this might be handled by LLVM using some toolchain, so you can get the benefits of different-sized integers (and other various types).

It is actually LLVM that handles how much memory a type must take on the given architecture. All we have to do in Rock is to have a map from Rock builtin types to LLVM types. This is what we have in CodegenContext::lower_type(), and as you can see the type bestiary is missing some very basic members.
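As a rough picture of what such a mapping looks like, here is a self-contained Rust sketch. It is not the code of CodegenContext::lower_type(): the RockType enum and its variants are hypothetical, and the function returns the textual LLVM IR type name instead of a type handle from an LLVM binding, so that it runs without any dependency.

```rust
/// Hypothetical set of builtin types; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RockType {
    Bool,
    Int8,
    Int16,
    Int32,
    Int64,
    Float64,
}

/// Maps a builtin type to the name of the LLVM IR type that represents it.
/// LLVM then decides the size and alignment for the target architecture.
fn lower_type(ty: RockType) -> &'static str {
    match ty {
        RockType::Bool => "i1",
        RockType::Int8 => "i8",
        RockType::Int16 => "i16",
        RockType::Int32 => "i32",
        RockType::Int64 => "i64",
        RockType::Float64 => "double",
    }
}

fn main() {
    for ty in [RockType::Bool, RockType::Int8, RockType::Int64, RockType::Float64] {
        println!("{:?} -> {}", ty, lower_type(ty));
    }
}
```

Because the match is exhaustive, adding a new builtin type to the enum forces the mapping to be extended, which is one way to keep the "type bestiary" complete.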

> As for a macro system, the best place to start I think would be simple expansions like the one you got above. Such that it replaces text and might be able to take parameters and still (hopefully) maintain the type and trait constraints. The best way I think to go about this is to build a pre-processor that parses everything and then the compiler steps in and validates the code.

This is an interesting take. The pre-processor you are mentioning could simply be the parser itself, imo. We could describe macro constructs as AST elements that can produce any TopLevel or Statement or Expression based on some arbitrary syntax. That arbitrary syntax, defined in the 'head' of the macro, is itself written in a syntax we can parse, so that we can assemble a new nom parser at compile time to actually parse the macro calls.

macro my_macro
  $e:expr => # this part here is the head, and its syntax can be parsed. This parsed syntax would serve to 
             # construct a new parser that will be applied to the macro call to parse its content.
    $e.print!
main: ->
  my_macro "hello_world" # the new parser would parse "hello_world" here 

The expansion process would have to take place right after parsing, and before the resolve phase, so the rest of Rock would hopefully not be affected by this wizardry.

> But it would be useful if macros could be marked to the compiler such that if an error occurred in the expansion, or in the resulting code it could be traced properly to what the dev sees.

This is the role of the Span data structure, which tracks where each token occurs in the source file. It is almost trivial to make the expanded AST items point to the original Span that generated them.
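A small Rust sketch of that idea follows. The Span and Expr definitions here are simplified stand-ins invented for the example, not Rock's actual AST types, and expand_my_macro hard-codes the single my_macro expansion discussed above. The point it illustrates is that the node created by the expansion carries the span of the macro call, so an error reported on it leads back to code the developer wrote.

```rust
/// Simplified source location: byte offsets into the source file.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Simplified expression nodes, each remembering where it came from.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Str { value: String, span: Span },
    MethodCall { receiver: Box<Expr>, method: String, span: Span },
}

impl Expr {
    fn span(&self) -> Span {
        match self {
            Expr::Str { span, .. } | Expr::MethodCall { span, .. } => *span,
        }
    }
}

/// Expands `my_macro <arg>` into `<arg>.print!`.
/// The generated call reuses the span of the macro call site, while the
/// argument keeps the span of the text the user actually typed.
fn expand_my_macro(arg: Expr, call_span: Span) -> Expr {
    Expr::MethodCall {
        receiver: Box::new(arg),
        method: "print".to_string(),
        span: call_span,
    }
}

fn main() {
    let call_span = Span { start: 30, end: 52 };
    let arg = Expr::Str {
        value: "hello_world".to_string(),
        span: Span { start: 39, end: 52 },
    };
    let expanded = expand_my_macro(arg, call_span);
    println!("generated node points at {:?}", expanded.span());
}
```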

> Variable arguments are interesting as there are various ways to go about shaping them. I'm not sure how Rust does it, but because C++ has preprocessor support for them (the __VA_ARGS__ macro), it wouldn't be far-fetched to assume there's some tool in LLVM that can help with expansion, indexing, extraction and type indexing.

I don't think Rust implements them, but I might be wrong about that. The only occurrence of a somewhat related feature is in macros like println!() that can take an arbitrary number of arguments. Those are implemented in terms of recursive macro expansion, so that the final generated code always has a fixed number of arguments. Once again I might be wrong here, as I'm not an expert.
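The recursive-expansion technique itself can be shown with a small macro_rules! example. The macro name join_all is made up for this sketch. One caveat, as far as I know: in current Rust, println! hands its arguments to the compiler built-in format_args! rather than recursing through macro_rules!, but user-defined variadic macros are commonly written the recursive way shown here.

```rust
// Accepts any number of comma-separated expressions and joins their text.
// Each expansion peels off the first argument and recurses on the rest,
// so the fully expanded code only contains calls with a fixed arity.
macro_rules! join_all {
    // No arguments: empty string.
    () => {
        String::new()
    };
    // Exactly one argument: the end of the recursion.
    ($only:expr) => {
        $only.to_string()
    };
    // Two or more: format the first, recurse on the remaining ones.
    ($first:expr, $($rest:expr),+) => {
        format!("{}, {}", $first, join_all!($($rest),+))
    };
}

fn main() {
    // Arguments of different types work because each one only needs `Display`.
    println!("{}", join_all!(1, "two", 3.5));
}
```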

> A pattern matching system would also be great. I'm curious to see how you want to make it look syntactically or semantically. And on that note, what do you have in the works for enums and other algebraic data types?

I'm also curious about that. I think I would want something like what Livescript has implemented, but as to what it would look like, I honestly don't know yet ! :) I am indeed waiting to implement Rust-like enums before starting to think about pattern matching, as enums depend strongly on that feature and would be its first user. I have done basically zero work on that yet, and I'm wondering if I should clean up what we have already before starting this huge pile of work.

What other ADTs did you have in mind ?

Champii commented 2 years ago

I'm marking this issue as closed/resolved, but feel free to continue discussing here or in a new issue/discussion :)

oraqlle commented 2 years ago

Fair enough, I figured it was something in LLVM helping to map Rock types to actual data.

The macro syntax you've got looks really good, simple enough to fit the language's existing design but general enough to grow and make cool stuff with. And with regards to the pre-processor/parser handling macros, it would be good (I think, but bear in mind, I've never designed a compiler or language before) if it processed macros before anything else, that way types can be checked and guarantees met when the whole thing is then parsed.

I think the Rust approach with variable arguments using recursive macro construction is better, but again, we need a macro system to exist 😅. It would also be better to have variable arguments (at least at first) only work with macros (like .format!). I was also thinking, maybe a special enclosing can be used for variable argument macros to help with parsing. As an idea:

"${1}, ${0}".format![foo, bar].print!

# bar, foo

Similar to Rust's vec![] macro. I do know that in Rust's case, square brackets are used because it's an array type, but parentheses and braces seemed uglier. Food for thought.

It's cool you've got a structure to handle a Token tagging system, that's good. I'll have to have a look at that.

I haven't seen Livescript's pattern matching and I couldn't find anything. How does it work? And I wouldn't start enums yet if you've got lots to do with the language already. Enums require a unique memory footprint, similar to a struct just without named types. Are Rock's structs like Rust's and C++'s, or more abstract?

As for other ADT, option and variant are good higher level versions of enums and unions. A hashmap would be good as well as deques but do you want them as part of the stdlib or language (ie. Python approach or C++/Rust approach)? I think ADT will come with inspiration, need and desire to have them.

As for contributions (I know you didn't ask but I do want to help), I've just finished a project I've been working on and did a whole heap of restructuring. Right now I'm finishing off a package repo for C++ code using a very cool tool called DDS. It's a package manager and build tool very much like Cargo for C++ (finally, only took 40+ years). My hope is that maybe with people using it and a repo for people to submit packages (and not have to use CMake anymore) it will get more popular and become the de-facto build tool, because I think it's awesome. The reason I bring this up is that it's been on my mind for months and I really want to get some stuff in place so people can see it's useful and even an utter noob like me can make it work. Anyway, I want to finish up the very primitive website I have for the packages so that it exists and I can kinda put that all in cruise control. So I will be working on that. Not hard stuff, just design and crap. Once I've finished that (shouldn't take too long to design), I would like to take you up on the offer to run through this language and help work on it. I think it's a very cool language and I would very much enjoy being able to help and be a part of a bigger project, and I've been interested in language design for a little while. So, I will continue to try to be as collaborative as possible and be someone to bounce ideas off, and jump on the project soon.

Champii commented 2 years ago

> The macro syntax you've got looks really good, simple enough to fit the language's existing design but general enough to grow and make cool stuff with. And with regards to the pre-processor/parser handling macros, it would be good (I think, but bear in mind, I've never designed a compiler or language before) if it processed macros before anything else, that way types can be checked and guarantees met when the whole thing is then parsed.

I don't think we can have an ordered parsing and pre-process only the macros before parsing the rest.
The compiler main internal pipeline is as follows: (some minor phases are left out for conciseness)

Parse AST -> Resolve(1) -> Lower AST to HIR -> Type constraints and Resolve(2) -> Monomorphization -> Lower to LLVM IR -> Compile to binary

Resolve(1) is a name resolving phase that is applied to all basic identifiers, but not to the dot notation, which is handled in the Resolve(2) phase when we know more about all the types and their implementations.
As you can see, we are not dealing with types when working with the AST in the first part of the compiler, which serves only for basic analysis such as name resolving. All the type processing is done later on the HIR, which is a more suitable structure for deeper analysis.

With that in mind, we can easily consider the AST as just a raw representation of the source code that doesn't speculate on the semantics but focuses on structure and syntax, and macros would just be one item in that AST, like everything else at this stage.

I was imagining something like

pub struct MacroDecl {
    pub name: String,                          // macro name
    pub args: Vec<MacroItem>,                  //   $e:expr =>
    pub body: TopLevelOrStatementOrExpression, //     $e.print!
}

This is very fresh out of my mind, just to illustrate. I doubt I'll use such an ugly declaration. (lol) You can find more about Rock's AST in its file: src/lib/ast/tree.rs

When we can parse macros in this way, it is just the matter of adding an expansion phase between the Parse and Resolve(1) phase, that would produce a full AST ready to go to name resolving.
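To show where such a phase would sit, here is a toy Rust sketch of an expansion pass that runs on a freshly parsed tree and leaves no macro calls behind for name resolving. Everything in it is a simplified assumption: the Expr variants, the params field (plain names instead of Vec<MacroItem>) and the functions substitute and expand are invented for the example and do not come from src/lib/ast/tree.rs.

```rust
/// Toy expression tree standing in for the real AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Str(String),
    Var(String),                   // `$e` inside a macro body
    MethodCall(Box<Expr>, String), // `receiver.method!`
    MacroCall(String, Vec<Expr>),  // `my_macro "hello_world"`
}

/// Simplified macro declaration: parameter names plus a body template.
struct MacroDecl {
    name: String,
    params: Vec<String>,
    body: Expr,
}

/// Replaces every `$param` in `body` with the matching argument.
fn substitute(body: &Expr, params: &[String], args: &[Expr]) -> Expr {
    match body {
        Expr::Var(name) => match params.iter().position(|p| p == name) {
            Some(i) => args[i].clone(),
            None => body.clone(),
        },
        Expr::MethodCall(recv, method) => {
            Expr::MethodCall(Box::new(substitute(recv, params, args)), method.clone())
        }
        Expr::MacroCall(name, inner) => Expr::MacroCall(
            name.clone(),
            inner.iter().map(|e| substitute(e, params, args)).collect(),
        ),
        Expr::Str(_) => body.clone(),
    }
}

/// The expansion phase: runs after parsing and before Resolve(1).
/// Note: a macro that expands to itself would recurse forever here;
/// a real implementation would need a depth limit.
fn expand(expr: &Expr, macros: &[MacroDecl]) -> Expr {
    match expr {
        Expr::MacroCall(name, args) => {
            let args: Vec<Expr> = args.iter().map(|a| expand(a, macros)).collect();
            let decl = macros
                .iter()
                .find(|m| &m.name == name && m.params.len() == args.len());
            match decl {
                Some(decl) => expand(&substitute(&decl.body, &decl.params, &args), macros),
                None => Expr::MacroCall(name.clone(), args), // unknown macro: left as is
            }
        }
        Expr::MethodCall(recv, method) => {
            Expr::MethodCall(Box::new(expand(recv, macros)), method.clone())
        }
        other => other.clone(),
    }
}

fn main() {
    let my_macro = MacroDecl {
        name: "my_macro".to_string(),
        params: vec!["e".to_string()],
        body: Expr::MethodCall(Box::new(Expr::Var("e".to_string())), "print".to_string()),
    };
    let call = Expr::MacroCall(
        "my_macro".to_string(),
        vec![Expr::Str("hello_world".to_string())],
    );
    println!("{:?}", expand(&call, &[my_macro]));
}
```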

I hope I understood well what you were describing, feel free to elaborate if not :)

> I think the Rust approach with variable arguments using recursive macro construction is better, but again, we need a macro system to exist 😅. It would also be better to have variable arguments (at least at first) only work with macros (like .format!). I was also thinking, maybe a special enclosing can be used for variable argument macros to help with parsing. As an idea:
>
> "${1}, ${0}".format![foo, bar].print!
>
> # bar, foo
>
> Similar to Rust's vec![] macro. I do know that in Rust's case, square brackets are used because it's an array type, but parentheses and braces seemed uglier. Food for thought.

You are totally right about the enclosing of arguments, I didn't think about that. We may have to enforce () or [] around them to make sure the parser doesn't get confused between macro and real code.
We might also need a special character to differentiate macro calls from classical calls, like the ! that Rust puts before the argument list. I see you used it like in Rust in your snippet, but I think it would be misleading to reuse the ! character, which is already the no-arg-call sugar. We might have to invent a new syntax here ^^

> I haven't seen Livescript's pattern matching and I couldn't find anything. How does it work?

I'm sorry, I read pattern matching but I thought destructuring. While I think both are pretty close features, I think we are better off with the classical match keyword for starters, like in Rust. We will implement deeper destructuring on top of that later.

> And I wouldn't start enums yet if you've got lots to do with the language already. Enums require a unique memory footprint, similar to a struct just without named types. Are Rock's structs like Rust's and C++'s, or more abstract?

Indeed they are unique ! They require some special magic to be usable and pattern-matchable, unlike structs in Rock, which are plain ol' C structs.

> As for other ADT, option and variant are good higher level versions of enums and unions. A hashmap would be good as well as deques but do you want them as part of the stdlib or language (ie. Python approach or C++/Rust approach)? I think ADT will come with inspiration, need and desire to have them.

I think these will indeed be part of the stdlib, and implemented in terms of the language. And you are right, I think it is better to implement them on a need basis, so that we have real use cases for them.

> As for contributions (I know you didn't ask but I do want to help), I've just finished a project I've been working on and did a whole heap of restructuring. Right now I'm finishing off a package repo for C++ code using a very cool tool called DDS. It's a package manager and build tool very much like Cargo for C++ (finally, only took 40+ years). My hope is that maybe with people using it and a repo for people to submit packages (and not have to use CMake anymore) it will get more popular and become the de-facto build tool, because I think it's awesome. The reason I bring this up is that it's been on my mind for months and I really want to get some stuff in place so people can see it's useful and even an utter noob like me can make it work. Anyway, I want to finish up the very primitive website I have for the packages so that it exists and I can kinda put that all in cruise control. So I will be working on that. Not hard stuff, just design and crap. Once I've finished that (shouldn't take too long to design),

Oh this seems like a big deal for C++ ! I had to deal with the nightmare of C++ project building in the past, and I can only salute such efforts :) Don't worry, take your time ! I've a lot on my plate too with the type/trait rework, so that you can join me on a more sane codebase.

> I would like to take you up on the offer to run through this language and help work on it.

The offer still stands ! Any time you want, that would be my pleasure :)

> I think it's a very cool language and I would very much enjoy being able to help and be a part of a bigger project, and I've been interested in language design for a little while. So, I will continue to try to be as collaborative as possible and be someone to bounce ideas off, and jump on the project soon.

Your contributions are still invaluable to me (and to Rock), so please do :)

oraqlle commented 2 years ago

Okay, I see how you've designed the build process. In that case, the use of a Rust struct to pack the data like you've done seems like a good starting place, especially with a Vec of args. The biggest challenge would be making it generic enough to be usable from a user's point of view, not just to solve a specific task.


As for the argument capture of the macros, I was thinking the $ would be a good fit, as it would at least be symmetric with its use in .format!. It might look something like:

"${1}, ${0}".format$[foo, bar].print!

# bar, foo

But it's a bit dense, so another option I saw was using the tilde (~), so that it is a bit better spaced.

"${1}, ${0}".format~[foo, bar].print!

# bar, foo

A third option I thought of was to remove the brackets and maybe use markers, eg. $, #, % or &.

"${1}, ${0}".format $foo, $bar .print!

# bar, foo

"${1}, ${0}".format #foo, #bar .print!

# bar, foo

"${1}, ${0}".format %foo, %bar .print!

# bar, foo

"${1}, ${0}".format &foo, &bar .print!

# bar, foo

The parser could then treat the whole macro as a single expression. I particularly like the ampersand option, as it's kinda reflective of a reference capture of a variable or a function's return value.


As another thought, for different data structures, I was thinking that using symbols as literals for different data structures would be cool. #{} for hashmap, %[] for vector, maybe %() for tuples, %{} for a set. Just food for thought.


Yes, I believe bpt (it was called dds, and got renamed to bpt today) is a very big deal for C++. It just got its first beta release, which refined a lot of stuff to do with package metadata. So: re-jigging everything I've built to the new structure, a blog post soon hopefully to promote it, and then a sick new language to help make.

Also, do you have a Slack channel (not sure how Slack works, so forgive my ignorance) or something similar where we could continue this discussion and keep developing ideas for the language? Might make it a bit easier.

Champii commented 2 years ago

You make a good point, we need a place to chat about stuff like that. I've just set up a Discord server, let's continue this conversation there :)