LADSoft / OrangeC

OrangeC Compiler And Tool Chain
http://ladsoft.tripod.com/orange_c_compiler.html

Decoupling of the front/middle/backend #297

Closed chuggafan closed 4 years ago

chuggafan commented 5 years ago

So, essentially what I'm asking for here is to change OCC's internals dramatically. I know this is a lot of work, but I think the benefits outweigh the costs of a rewrite.

What I'm proposing is to turn the frontend of OCC into a separate program from the middle/backend, similar to how Clang outputs LLVM IR that then gets compiled, or how GCC lowers to GENERIC and then GIMPLE on the way to assembly. The main goal is to decouple the compiler frontend from the compiler backend and make the two less reliant on each other, which would allow me (personally) to work on #235 in stages: instead of writing the entire compiler and then re-verifying it, it would be simpler to write the front end and output something standard that the middle end can consume and hand off to the backend. Decoupling all of these parts also has the distinct advantage of making OCC easier to adapt to different platforms, because there would be a standard interchange format between the parts.

This issue is similar in nature to #227 in that it means a massive change to the internals, and I expect something like this could take months at a minimum, but my personal feeling is that it would be a significant benefit in the longer term, for the reasons listed above and more. I can completely understand if you see it as unfeasible, but I think it's an optimistic goal worth working towards.

LADSoft commented 5 years ago

It actually sounds good. Having it split out that way is something I've thought of before, especially as I want to expand the 'adl' file to more directly generate the compiler output, similar to how it does ASM. I suppose the easy thing is to have the compiler generate an unoptimized version of the intermediate code to a file (in a more compact format), and then the middle portion could just read that in and do all the things it already does. We'd need some kind of profile to tell it how to allocate registers and some other stuff, I guess... As far as what the middle layer outputs, I really don't know. Again, the easy thing is to just have it output the same intermediate code, but on the other hand it would be nice to do some more advanced optimizations around moving code to avoid pipeline stalls... eventually! lol!
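As a rough illustration of the kind of per-target profile being talked about (the names below are hypothetical, not existing OrangeC structures):

```cpp
// Hypothetical per-target profile the middle end could consult when allocating
// registers and sizing stack frames; the fields are illustrative only.
struct TargetProfile
{
    int pointerSize;              // pointer width in bytes, e.g. 4 for x86
    int stackAlignment;           // required stack-frame alignment in bytes
    int integerRegisterCount;     // registers the allocator may use
    int floatRegisterCount;
    bool usesRegisterAllocation;  // false for a target like MSIL
};
```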

chuggafan commented 5 years ago

Yeah, the way Clang/GCC do it is to parse everything in the frontend and do basic IR generation with no optimizations applied yet; after that, the middle end does all of the optimizations using the hints dropped into the IR, and then the same IR is handed to the backend to generate the final code. I know MSVC at least has a frontend/backend split, but I don't know whether it has a distinct middle end that does anything beyond that.

LADSoft commented 5 years ago

Yeah, when I thought about it a while back I concluded a three-tiered approach would be good. Seems like I'm in good company :) Well, I like this little project, but I need to get back to #227 as soon as I clear a few more items from the list :)

LADSoft commented 5 years ago

I've started working on this. It will take a while as there is a lot to it...

LADSoft commented 5 years ago

So far I have done most of the infrastructure work to separate occ into multiple programs; the main thing missing is streaming compilation-related data in and out of files to communicate between the separate programs. Still a lot of details to take care of, though.

LADSoft commented 5 years ago

This weekend I finished the rewrite and did some minor testing. At this point I've got 'hello world' compiling properly, but larger programs are problematic.

chuggafan commented 5 years ago

Nice! Hoping you work out the rest of the kinks so that it can become "production worthy"!

LADSoft commented 5 years ago

I'm actually very happy; I got pretty far with it given how much was rewritten. I really wasn't expecting to be able to generate any EXE file so fast :) I have an idea what might be wrong, though: I went crazy rewriting the local stack allocation routine and it shows lol! I think there may be variable collisions... But even so, there is still a lot of testing to do...

LADSoft commented 5 years ago

It is coming along pretty well; I have a simple C++ program with maps and vectors and iostreams completely working now, and I did a simple test of exception handling. The basic functionality is there at this point. Up for this weekend: fastcall, inline assembly, cleaning up stack allocations, cleaning up the build environment, and working on a more sophisticated C++ program.

I do have OCCIL compiling as well but haven't done anything with it yet. Sigh.

chuggafan commented 5 years ago

Impressive amount of progress! Are there any docs on exactly what the IR language is, or will you write those later? Most of the work so far looks great, but I'm concerned that passing file-based IR between the parser, middle end, and backend will create a huge slowdown like LLVM used to have. If we can, I suggest we use some form of memory-mapped I/O unless an IR dump is explicitly requested from the frontend, given how slow disk access is compared to RAM.

LADSoft commented 5 years ago

There aren't any docs on the file format at present... I've been managing by keeping the reader and writer paired: they both process exactly the same data in exactly the same order (it is streamed rather than structured), but one reads the data in and one writes it out. But yeah, I can see docs would be good... I'll see if I can make time this weekend to write basic documentation.

There are actually two output formats: one writes the structured information (including symbol tables, intermediate instructions, target assembly instructions, data initializations, etc.) to a binary file, and the other writes it in a human-readable format. The only tricky part about the binary files is the use of indexes to represent symbol table entries; depending on the scope of a symbol, it might be in a different table... Otherwise it is just a lot of spitting data out, in the order it appears in the various structures.
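To make the "paired reader/writer" idea concrete, here is a minimal sketch of how such a streamer pair could look; the class and method names are hypothetical and not the actual OrangeC code:

```cpp
#include <cstdint>
#include <cstdio>

// Writer half of a hypothetical paired streamer: fields go out in a fixed order.
class StreamWriter
{
public:
    explicit StreamWriter(FILE* f) : file_(f) {}
    // Symbols are written as indexes into a symbol table, not as pointers.
    void Index(uint32_t symbolIndex) { fwrite(&symbolIndex, sizeof(symbolIndex), 1, file_); }
    void Int(int value) { fwrite(&value, sizeof(value), 1, file_); }

private:
    FILE* file_;
};

// Reader half: must consume the same fields in exactly the same order.
class StreamReader
{
public:
    explicit StreamReader(FILE* f) : file_(f) {}
    uint32_t Index()
    {
        uint32_t v = 0;
        fread(&v, sizeof(v), 1, file_);
        return v;
    }
    int Int()
    {
        int v = 0;
        fread(&v, sizeof(v), 1, file_);
        return v;
    }

private:
    FILE* file_;
};
```

Keeping the two halves as mirror images of each other is what makes a streamed (rather than structured) format workable without any separate schema machinery.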

I've been a little concerned about the file I/O too... I'm not at a point where I could evaluate the impact yet, but I can see it would be an issue down the road. Memory-mapped I/O would solve that and some other problems too, so it sounds like a good idea.

I don't think it would be a big deal to retrofit it into the current design, but then again I've never seriously used it, so I don't know for sure. The main issue, I suppose, is whether you can resize it on the fly? That would be extremely useful given that the parser can compile multiple source files in one invocation...
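For reference, a minimal Win32 file-mapping sketch (plain Win32 calls, not the OrangeC implementation); the usual way around the resize question is to reserve a generous maximum size when the mapping is created:

```cpp
#include <windows.h>

// Sketch: create a pagefile-backed shared-memory region that a parent process
// can hand to a child by name. The maximum size is fixed at creation time, so
// "resizing on the fly" is normally handled by reserving a large upper bound.
HANDLE CreateSharedRegion(const char* name, unsigned long long maxBytes, void** view)
{
    HANDLE mapping = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE,
                                        (DWORD)(maxBytes >> 32), (DWORD)(maxBytes & 0xffffffff),
                                        name);
    if (!mapping)
        return nullptr;
    *view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);  // map the whole region
    return mapping;
}
```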

LADSoft commented 5 years ago

I added shared memory support and some basic documentation now...

LADSoft commented 4 years ago

I've been derailed for the last week or so because one of the changes made shortly before I started working on this issue broke the parser for one of my test programs. It is broken on the main line as well, and I'm not sure why it is building on AppVeyor... seems like the tests should be failing, sigh. Anyway, I'm about through that and ready to go back to working on this issue...

LADSoft commented 4 years ago

At this point the new compiler successfully compiles the tests. Before going on to compile the compiler with itself, I'm going to work on the MSIL version of the compiler.

chuggafan commented 4 years ago

Looking at OCCOPT, I do appreciate the shared memory here for faster passing of the information, but there should also be a way to run OCCOPT on its own using regular files as inputs, so that each stage can be tested independently of the others, much like how LLVM doesn't require input via memory but can accept files and output different levels of optimization for different architectures.

LADSoft commented 4 years ago

Yeah, I was kind of thinking about that being an issue but didn't know what to do. Maybe I should add a command line switch to occparse to dump the data to a file and then another one to occopt to load it in again? I think I did already manage to set up occparse to compile even when it isn't called by the backend, though. Hmm, maybe I should do it that way: if they aren't being called by the backend, they use files.
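Something along these lines would make the choice explicit (the switch and names here are made up for illustration, not actual occparse/occopt options):

```cpp
// Hypothetical transport selection: use shared memory when spawned by the
// backend driver, otherwise fall back to dumping/loading a plain file so that
// occparse and occopt can also be run and tested on their own.
enum class Transport
{
    SharedMemory,
    File
};

Transport ChooseTransport(bool spawnedByBackend, bool dumpRequested)
{
    if (!spawnedByBackend || dumpRequested)  // standalone runs and an explicit dump switch use files
        return Transport::File;
    return Transport::SharedMemory;
}
```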

The front end is also still a little sensitive to the backend; for example, a lot had to be done in the front end to support the syntaxes required by MSIL, and the inline assembler is in the front end as well since it is primarily a parsing issue... I think for more conventional architectures the main issues would be word size (which is still largely determined in the front end) and the inline assembler, though.

chuggafan commented 4 years ago

Maybe I should add a command line switch to occparse to dump the data to a file and then another one to occopt to load it in again?

Yes, this is what clang and GCC have, and it's the best option for being able to see the output of each stage, modify it, and see how things change.

The front end is also still a little sensitive to the backend; for example, a lot had to be done in the front end to support the syntaxes required by MSIL, and the inline assembler is in the front end as well since it is primarily a parsing issue...

That's definitely one issue, where everything changes based on MSIL. Maybe the output from the parser could be marked at the top as targeting MSIL and then be pushed through a specific backend; LLVM and GCC don't have 100% generic middle ends, after all.

LADSoft commented 4 years ago

How do LLVM and GCC handle spawning all the programs? Right now I have the back ends spawning the front and middle sections... I thought that would mean fewer programs hanging about...

chuggafan commented 4 years ago

I don't know too much about how either works internally, but IIRC clang uses LLVM through an API: it first does all the parsing, lowers the code to the internal IR, then passes it to LLVM to do the optimizations; LLVM then internally turns the IR into assembly and passes it to the linker (they don't use the system assembler due to speed constraints). GCC's architecture is all over the place, and nailing down which specific binaries do what is near impossible; I just know they have defined front/middle/backend stages that the code passes through.

LADSoft commented 4 years ago

I decided just to keep it simple and still use the backend to invoke the front and middle stages.

What you said has got me thinking, though: it might be possible to swap the MSIL backend in for the x86 one... you just wouldn't get access to the .NET runtime when compiling for x86... at least right now that is possible unless a standard .NET C++ parser is written...

LADSoft commented 4 years ago

This is coming along; I've got most of the support for occil working again. I need to clean up a few details, then I will be able to start running the tests...

LADSoft commented 4 years ago

I got occil passing all the tests now (including sqlite3/cc386), but that broke the x86 compiler... over the next few days I will work on addressing that. Then I will probably do a preliminary compile of the compiler with itself, which takes us close to the end of this sub-project. But before this can be wrapped up, the debug info will have to be tested again.

LADSoft commented 4 years ago

At this point occ compiles all the tests (when compiled with MSVC), and it compiles itself reasonably well... but the self-compiled version is going manic on memory allocations, so it doesn't work too well lol! I'm working on figuring out what is up with that...

Meanwhile I'm trying to think what to do about the MSDOS version. It seems like the current implementation with shared memory isn't going to work too well on MSDOS... so I have a choice between dropping support or making an alternate implementation that compiles to a file the way it did originally lol... honestly I'd just as soon drop support, but I keep getting flak any time I try, sigh.

LADSoft commented 4 years ago

So now I've got it to compile itself, and along the way fixed a few bugs unrelated to this issue. The version that was compiled by itself did compile another program, so now I'm off to compile the runtime library. It seems like there are some issues with inline assembly that need to be resolved for that...

LADSoft commented 4 years ago

occ seems to be compiling stably now. Up next I have to handle debug information and make sure the code completion compiler still works. Then it will be time to do final testing.

chuggafan commented 4 years ago

The next question is how the builtin functions are going to work: is there going to be IR code for each builtin function that has to be handled separately?

LADSoft commented 4 years ago

Thanks. I will explicitly check that later on... but when I was thinking about it, I concluded it should work as-is. Hope I'm not wrong about that lol!

LADSoft commented 4 years ago

You got me curious, so I checked it out... apparently I had foreseen this one and propagated the intrinsic through the front and middle layers as a normal function call (it is propagated as a FASTCALL function since it has those tags in the prototype); then in the backend, where function calls are compiled, it checks whether the callee is an intrinsic function and, if so, replaces the call with the related code sequence! So there is no need for messy per-intrinsic IR handling...
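In other words, something shaped roughly like this (the names are illustrative, not the actual backend code):

```cpp
#include <cstring>

// Hypothetical emitters standing in for the real code generators.
void EmitRotateLeft();
void EmitNormalCall(const char* name);

// Sketch of the idea: intrinsics travel through the front and middle ends as
// ordinary (FASTCALL) calls; only when the backend compiles the call does it
// check the callee and substitute the related code sequence.
void EmitCall(const char* calleeName)
{
    if (std::strcmp(calleeName, "_rotl") == 0)
        EmitRotateLeft();            // expand the intrinsic inline
    else
        EmitNormalCall(calleeName);  // everything else stays a real call
}
```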

I did a quick sanity check with _rotl and it seems fine...

Merry Christmas!

LADSoft commented 4 years ago

The code for this issue basically works now... and I fixed some other problems along the way. But I found that WINVI doesn't compile/run properly and I have to fix that. Beyond that, I'm getting ready to work on final tests... maybe if things go well I'll have this closed out by New Year's.

I'm probably going to abandon the MSDOS platform.

GitMensch commented 4 years ago

I'm probably going to abandon the MSDOS platform.

What does this mean? I suggest adjusting the parts that are not too hard to adjust and adding hard runtime/compilation #errors with a description of what is missing.

It is not worth putting endless hours into legacy support (an old version of OrangeC may be used if needed), but if it is "only" a matter of legacy support taking some extra effort in some parts, I vote for keeping it in. Is this about more than keeping/re-adding, via conditional compilation, the file-based store of intermediates (which is partially in anyway, to allow inspecting them)?

Maybe it is reasonable to only keep the runtime libraries (the code for DOS doesn't need to change much, does it?) and only support DOS via cross-compilation?

In any case, I suggest tracking this with a new issue that also includes an outline of the specific problems (and possible options for coping with them).

Note: As soon as you have a good guess that the compiler does work we can stress test it with #168 :-)

LADSoft commented 4 years ago

See #472 for the new MSDOS issue. Meanwhile I've gotten derailed because the latest compiler (after the changes of the last couple of weeks) is now using too much memory and fails to compile itself... working on improving that situation.

Yeah #168 would be a good test when it gets a little further... bearing in mind though there are still OMAKE issues that have gone unresolved lol!

chuggafan commented 4 years ago

Looking at this again, a complaint I have is that in files such as ilstream.cpp and ilunstream.cpp a huge number of externs are used where actual headers should exist, so that the code is better organized and we know where everything is defined. The "be.h pulls in be.p, which has the prototype definitions of everything" approach clutters things up because we no longer know what belongs to which module. If we are marking things as external, we should at least be exposing them in headers or as whole classes so that we know where things come from, instead of just naming them and hoping for the best. This kind of extern usage appears almost everywhere and really makes it difficult to track down where things are from and how the entire project is supposed to be organized.

And while, yes, killing off be.p and the extern declarations will dramatically increase the number of includes, it will make more sense in the end for figuring out what lives where and where items are defined. A rule I've seen/found helpful in cases like this is that one header per TU should declare the publicly accessible functions and variables. Namespaces would help immensely as well, since they would define the boundaries between the parser, the optimizer, and the assembly emitter.

Edit: I'm also willing to make this a separate issue, since "namespacing" everything and moving everything to separate headers is a whole separate task that needs to be done so that the boundaries are properly defined, but I'm still of the opinion that it's necessary to understand where things originate so that everything is cleaner.

GitMensch commented 4 years ago

Edit: I'm also willing to make this a separate issue, since "namespacing" everything and moving everything to separate headers is a whole separate task that needs to be done so that the boundaries are properly defined, but I'm still of the opinion that it's necessary to understand where things originate so that everything is cleaner.

Please do so. Question: Isn't it "enough" if be.p would group the definitions per file and have a comment indicating which file each group refers to?

chuggafan commented 4 years ago

Question: Isn't it "enough" if be.p would group the definitions per file and have a comment indicating which file each group refers to?

The problem with be.p and the idea behind it is that it makes it harder to find the origin of a definition when everything is declared in one place. Also, you now need to include that one file in literally every translation unit to get the prototypes that accept a structure before the structure is even defined. While in theory be.p also shortens compile times, without individual, namespaced headers, things that should be cordoned off from each other because they're unrelated get pulled in together (e.g. previously, by pulling in be.p you'd also pull in the declarations for memory initialization, optimization, and lexing all at once).

This is why, in the parts of the project that aren't related to OCC, you're seeing one header per TU or one header per two TUs; that kind of structure helps define boundaries and marks where things are used and defined. Also, looking at it from a non-IDE-assisted point of view, it lets you intuitively know where something is defined: it's pulled in from header X, or header X exposes Y, so I should check out X.cpp to see how it's defined.
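A tiny illustration of the layout being suggested, with hypothetical file and namespace names:

```cpp
// ilstream.h (hypothetical) -- the single public surface of ilstream.cpp.
// Callers include this header instead of relying on scattered extern declarations.
#pragma once

#include <cstdio>

namespace Optimizer  // the namespace marks the parser/optimizer/emitter boundary
{
    // Only the functions other modules are meant to call are declared here.
    void OutputIntermediate(FILE* file);
}
```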

LADSoft commented 4 years ago

At this point the new compiler compiles itself, and the compiled compiler in turn compiles itself.

It is very slow, but I think I know what is wrong with that and will address it when I can!

Beyond that, there are memory issues... one of the files in the compiler is on the edge as far as memory usage goes, even after various housecleaning. I'm contemplating the best way to deal with that.

Along the way I put the DLEA allocator back in; the old allocator never frees memory and the new compiler design couldn't cope with that, sigh. I think it will be OK for OCIDE now, will have to check.

LADSoft commented 4 years ago

I've been taking a short holiday. Meanwhile I did speed it up quite a bit, but it is still a little sluggish.

I did a minor rewrite and it uses significantly less memory (the worst-case file went down by 200MB).

I think I know what to do next; it's just a matter of getting there...

chuggafan commented 4 years ago

I just tried the latest checkout and found that you haven't committed the latest .p files; they're totally missing, making it impossible to compile and test your changes on my machine at this point.

LADSoft commented 4 years ago

ok will fix tonite. Kinda blind without the automated builds I guess... I'm about ready with changes to speed things up a bit anyway :smile:

LADSoft commented 4 years ago

OK, I added the .p files.

It is in pretty good shape right now; it seems to compile itself properly. Still just a tad slower than before... not too terribly bad at the moment, though. It was simply awful for a while... I'm going to play with that a little, and I have a small amount of cleanup as well; then I can do the final compilation tests, retest the debug info one more time, and it will be done. I'm going to shoot for next weekend rather than try to rush it :smile:

GitMensch commented 4 years ago

Sounds good. Can you configure Travis and Appveyor to also include branches in their build?

LADSoft commented 4 years ago

I hadn't wanted to do it globally because I don't want the coverity_scan branch built, but I can at least see about enabling it for the branch I'm working on... it should be far enough along that it will be interesting, although I'm not convinced the 'debug' variant won't take more than an hour.

LADSoft commented 4 years ago

The builds are mostly completing, except that they take significantly longer, which has bumped the DEBUG build past the time frame AppVeyor allows. AppVeyor has a limit of one hour on 'free' builds. I'm working on a resolution for that... if I can do it easily I'll speed it up; otherwise I have an alternative plan ready to work on... there may be stuff related to #479 I can do as a stopgap, but I don't know yet what the impact will be.

The main issue keeping the other builds from getting all the way through is the PELIB test. I believe it had become disabled in the main branch, and of course the code for the test rotted while it was offline... I'll get back to fixing it soon...

LADSoft commented 4 years ago

When the rewrite was first completed, the compiler compiled about 25-30% more slowly than before. After some effort that is down to about 12% slower, so it is much better...

Additionally there is about 5% more compilation time because the new code base is slightly more complex.

There is one more thing that can be done in the short term to speed it up; then I'm going to punt and rework the builds. Unfortunately, the current build process is likely to be unworkable, because at least one of the builds is likely to go beyond the hour mark allowed by the 'free' version of AppVeyor...

I hope to have this issue done within the next couple of weeks.

GitMensch commented 4 years ago

This may not be much time, but as OCIDE is going away later and isn't tightly coupled to the actual compiler version, I suggest moving it out to another repository now. To do this: copy the complete repo, delete everything but the files necessary to compile OCIDE, push the result as a new repo (which may have its own AppVeyor), and then remove it from this one.

LADSoft commented 4 years ago

OK, I made a note in #439 to do that when we get further along.

I'm working on compiling gnucobol tonite...

GitMensch commented 4 years ago

Concerning the build time, there is also the option of "decoupling" some parts of the current CI into a different project: the first project does whatever we deem it should and, on success, triggers a second CI project, passing it some parameters including a reference to the artifacts generated by the first build; the second CI then downloads those and carries on with the build. As it looks like OrangeC would only benefit from this by way of "split" CI time, I suggest only going this route if other options don't work... and then there is also the option of additionally or partially using another CI provider, for example GitHub Actions (which should only be done if there's no doubt that OrangeC will stay at GitHub for the next few years).

LADSoft commented 4 years ago

Well, I have this idea that I can lower the appveyor build times by maybe 20% by building it with itself two times instead of three, then using a binary comparison of the two outputs to prove that the compiler works properly when compiled with itself... (taking into account the time stamps, of course). I want to get just a little more mileage out of the current builds first, as I currently know what to expect for build times...
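A rough sketch of that comparison (purely illustrative; a real check would have to locate the link time stamp in the PE/COFF header rather than take the offset as a parameter):

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

// Sketch: compare two build outputs byte-for-byte, ignoring one byte range
// (e.g. wherever the link time stamp lives) so two self-built compilers can
// be checked for equivalence.
bool SameExceptRange(const char* fileA, const char* fileB, std::size_t skipOffset, std::size_t skipLength)
{
    std::ifstream fa(fileA, std::ios::binary), fb(fileB, std::ios::binary);
    std::vector<char> a((std::istreambuf_iterator<char>(fa)), std::istreambuf_iterator<char>());
    std::vector<char> b((std::istreambuf_iterator<char>(fb)), std::istreambuf_iterator<char>());
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (i >= skipOffset && i < skipOffset + skipLength)
            continue;  // inside the ignored (time stamp) range
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```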

I also mildly considered the idea of just paying appveyor for upgraded service but haven't gotten around to looking into it... I suppose it would depend on how exorbitant they want to be... I suppose if I had a more physical hobby I would have a certain cash outlay to support it, so why expect everything to be free? Of course, if I did buy an upgrade I might also get parallel builds, which would be an improvement lol!

After a fair amount of handholding, I got the 3.0 release of gnucobol to compile up to the point where I could do 'omake test'. That was with OMAKE, so I'm happy. That means we are close to completing #56 as well (it has one more open item, I think). Anyway, I'll get back to it tonite...

GitMensch commented 4 years ago

Can you also run omake check? That should produce a nice testsuite log with the separate tests performed (and where errors occur they should be relatively easy to localize).

GitMensch commented 4 years ago

I also mildly considered the idea of just paying appveyor for upgraded service but haven't gotten around to looking into it..

https://ci.appveyor.com/billing says:

Premium FOSS 2 jobs $49.5/month

At that price I'd rather suggest using a different CI...

well I have this idea I can lower the appveyor build times by maybe 20% by building it with itself two times instead of three, then using a binary comparison of the two outputs to prove that the compiler works properly when compiled with itself...

That sounds useful in any case.

LADSoft commented 4 years ago

hm guess that is a bit rich for my blood lol!

FWIW, I tried compiling mpir/gmp/mpfr and in the process found an assembler bug in the code changes I previously made for that; they all compile at this point, but mpir/gmp won't link and I've forgotten what the hack was to get around that. I got MPFR to the point where 'make check' was generating a lot of warnings about things being defined as both imports and publics, and then gave up on that as well... I'm not sure whether it is in scope to continue down this path right now. I sort of want to get back to the main scope of what I'm working on, though...