c2lang / c2compiler

the c2 programming language
c2lang.org
Apache License 2.0
702 stars 47 forks source link

C2C with used parts of Clang copied #55

Closed lerno closed 5 years ago

lerno commented 5 years ago

This is an alternative pull request that has all changes pulled into a single commit. It might be preferable. I'll base all my fixes off this one.

bvdberg commented 5 years ago

This is a HUGE patch. I'm looking at it today, but i'm definitely ot a fan of these big all-or-nothing approaches.

bvdberg commented 5 years ago

I'll review in steps since it's too much for a single 'pass'. My observations so far:

Some questions that came up:

In a later stage, what I would like is to get rid of as many .def files as possible, especially for the Lexer and some things like BINOP_ (since tools can't find the definitions of these easily). Also replace the Clang/ code with something even more slimmed down to the basics of what C2C needs.

My current feeling is that this patch is a nice enabler for easier changes in the future. I'll do some more tests, but I think I'll merge it. Since this will open the path to other (big) changes in the Clang/ base, I do want to state that discussing the exact changes upfront in the future is my preference. I don't want to disappoint you by having to reject a lot of work.

bvdberg commented 5 years ago

I've recompiled LLVM as standalone and used that. This required some more fixes in cmake files. Also when I try to compile the examples/ (in main archive) the inctest fails that #include is not supported anymore. For this patch, the behavior should be the same as the version that depended on Clang.

lerno commented 5 years ago

Yes. Was #include needed? I have concentrated on removing everything superfluous. That included c-style includes.

There are two options:

  1. Pick changes from the other branch (”no_clang”) up to - but not including - the include removal.
  2. Rewrite the example to work with pure C2.

Sent from my iPhone

6 nov. 2018 kl. 11:18 skrev Bas van den Berg notifications@github.com:

I've recompiled LLVM as standalone and used that. This required some more fixes in cmake files. Also when I try to compile the examples/ (in main archive) the inctest fails that #include is not supported anymore...

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

lerno commented 5 years ago

Aside from stripping code, I have not modified the clang in any manner - except for the BO -> BINOP modification which was a bad judgement call). Look at the one with separate commits. The first is just getting everything copied, you can then look at the changes to the clang code step by step and verify that I’ve only stripped code from it.

Note: In some cases this meant simplifying boolean expressions as compile options were removed e.g. ”if (!langOpts.cplusplus && foo)” becomes ”if (foo)”

Sent from my iPhone

6 nov. 2018 kl. 10:10 skrev Bas van den Berg notifications@github.com:

I'll review in steps since it's too much for a single 'pass'. My observations so far:

resulting stripped binary size went down from 8.1 -> 5.6 Mb, nice. compile time did not go up by much you corrected some llvm includes "" -> <> some warnings about defaults in switches during compilation patch add (in Clang/) some 42K loc. That is a lot. there are some renames (BO -> BINOP). Better to split these into separate patches, since it's makes checking the changes harder. But leave them for now (since i do agree on the name) in BuildOptions (C2Builder.h), you added unused variables: maxIndirectionsToFollow and maxBracketNestingDepth the tok::code_completion has been removed. I never really checked how that worked, do you know? In TargetInfo.h you added the (yet) unused intWidth. Some questions that came up:

Did you change the copied clang code? (other than changing include paths, stripping some stuff and renames (BO -> BINOP) ? How do you foresee the next steps? Since this step is only an intermediate right? The patch doesn't seem to add any new features, except enable changes in the future, right? In a later stage, what I would like is to get rid of as many .def files as possible, especially for the Lexer and some things like BINOP_ (since tools can't find the definitions of these easily). Also replace the Clang/ code with something even more slimmed down to the basics of what C2C needs.

My current feeling is that this patch is a nice enabler for easier changes in the future. I'll do some more tests, but I think I'll merge it. Since this will open the path to other (big) changes in the Clang/ base, I do want to state that discussing the exact changes upfront in the future is my preference. I don't want to disappoint you by having to reject a lot of work.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

lerno commented 5 years ago

Let me explain how I did this change:

  1. Copy all files in various libraries of Clang to /Clang. So clang/AST or clang/Basic, all would end up in the same directory. /Targets was an exception since I wanted to remove some of those.
  2. Change all c2c source paths to "Clang/..." instead of <clang/.../...>
  3. Change the namespace of the copied files from clang to c2lang (so that building would not accidentally use the clang code.
  4. Add cpp files to CMakeList.txt
  5. Work, work, work until it compiled.
  6. Remove clearly unused files that weren't needed to actually compile.
  7. Iterate by: removing unused settings / functions, remove associated code (and fix booleans!). Check that it compiles.

No additional keywords or behaviour was added. Only things removed. BO -> BINOP being the only exception that I can remember. I have not added any extra lexing, BUT I have removed a lot of keywords. That enabled further stripping of the code.

lerno commented 5 years ago

The BuildOptions where options from Clang that I wanted to keep a discussion about. They were not used in the lexer, so they could be removed, but it's something we're likely to want to have in the parser. So I wrote those booleans there – finding no dedicated build option struct to put them yet. I just didn't want to remove the "knowledge" of their need from the code. After all, there is a lot of experience embedded in the Clang code!

lerno commented 5 years ago

The tok::code_completion was something I stripped in the last few checkins. The token behaved as sort of a wild card to enable clang to reason about what it should be in the parser is my impression. It did seem to turn of checking when encountered in the lexer, but there was very little in the lexer with the token, it was probably for the parser to later reason about and send feedback.

I'm guessing that when used as a language service, Clang puts the placeholder there and runs the parsing. When the parser reaches the placeholder it can report what it expects (from a grammar viewpoint) what should come next.

I found very little to salvage consequently. C2 might re-introduce it, but then we know exactly how to hook it up.

lerno commented 5 years ago

The TargetInfo intWidth was a mistake and can be removed.

lerno commented 5 years ago

My idea here is that since we pretty much stripped the lexer to its barebone features, we can see exactly what functionality we DO need for later.

In the meantime this enables us to easily add / edit tokens and keywords. That + the dependency on a custom Clang (offputting for people wanting to contribute) were the primary drivers of this change.

Since we still need the Clang macro expansion, I was not going to touch this until we implement "C2"-macros. Getting rid of preprocessing is nice, but not something urgent. Making it easy to modify keywords and tweak lexing on the other hand is very important. Adding a keyword before meant updating two different code bases.

Also, this removes a lot of implicit C code and parsing. I've kept the most of the features of the literal parsing, only changing so that the delimiter (used by C++ for delimiters in literals) becomes instead of '. "" is ambiguous in C++, but not in C or C2, and much more readable.

So right now that's pretty much all that I was going to do on the lexer for quite a while – aside from any changes needed if we decide to prefix compile time keywords with "@". In that case I think the lexer should consider @sizeof a keyword but not @ sizeof. That means the parsing of "@" (or whatever character we use) needs to be different.

bvdberg commented 5 years ago

I was trying to write a unit-test framework like ctest in C2. But that still needs macros, because there is no alternative yet. So #include will be removed at some point, but only after we have an alternative...

lerno commented 5 years ago

The warning for default branches is because I removed unused tokens, so suddenly those switches handled all the cases... I didn't want to jump in an do that change immediately though since the changes were big enough anyway.

bvdberg commented 5 years ago

If you could re-add the pre-processor tokens, then I will merge the branch.

lerno commented 5 years ago

The best is to use #57 which should contain all changes up before the include was removed, but please test it first on your examples. I haven't done so, just removed the later commits.

lerno commented 5 years ago

@bvdberg I agree on monster patches like this one. But I couldn't think of a safer way to remove the dependency. Pulling all code from Clang into c2c was always going to add a lot of code (about 100k LOC originally, it's down to 30k now?)

If you look at #46 it's easier: you can see the original addition of clang (here's there's first the addition of clang (plus the change of paths and namespace needed). The following commits will mostly only delete code, some pieces at a time.

bvdberg commented 5 years ago

closed for now, until c2 macro system..