LADSoft / OrangeC

OrangeC Compiler And Tool Chain
http://ladsoft.tripod.com/orange_c_compiler.html

Minimal Linux implementation of SharedMemory and fixes to make it work #1029

Closed: rochus-keller closed this 2 months ago

rochus-keller commented 2 months ago

As discussed: https://github.com/LADSoft/OrangeC/pull/1027#issuecomment-2094893251

Works on my old Ubuntu 14.04 x86 Linux and generates the same IR as the Windows-compiled version.

The SharedMemory is a trivial implementation at the moment which actually doesn't share anything. I will add a true shared memory implementation when I integrate the optimizer.
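
For reference, a true implementation on Linux would most likely sit on top of POSIX shared memory (shm_open/mmap). The following is only a minimal sketch of that route, not the code in this PR; the segment name is made up, and older glibc needs -lrt at link time:

```cpp
// Illustrative only: minimal POSIX shared memory, not this PR's SharedMemory class.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

int main()
{
    const char* name = "/orangec_demo";   // hypothetical segment name
    const size_t size = 4096;

    // Create (or open) a named segment and give it a size.
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, size) != 0)
        return 1;

    // Map it; another process that opens and maps the same name sees the same bytes.
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        return 1;

    std::strcpy(static_cast<char*>(mem), "hello from shared memory");
    std::printf("%s\n", static_cast<const char*>(mem));

    munmap(mem, size);
    close(fd);
    shm_unlink(name);   // remove the name when it is no longer needed
    return 0;
}
```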

LADSoft commented 2 months ago

So the intention was that CmdFiles::DIR_SEP would change based on the platform: for Windows it would be '\' and for Linux '/'. I would appreciate it if you could implement that and then adjust your changes to reflect it....

rochus-keller commented 2 months ago

if you could implement that

Actually there are many more platform dependencies that would have to be changed (e.g. the built-in DEFINEs, '\', '/', ".\", ':') scattered around the code. I'm reluctant to change too much because I don't know the code well enough to have a handle on the effects. From an architectural standpoint it would likely be better to consolidate all path and filename concerns in a file system module, together with the file and directory handling code.
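
For what it's worth, the platform-conditional separator suggested above might look roughly like this. This is a sketch only; CmdFiles exists in the tree, but this particular member layout is an assumption, not the actual class:

```cpp
// Sketch of a platform-dependent directory separator; the real CmdFiles class
// in the OrangeC sources may be laid out differently.
class CmdFiles
{
  public:
#ifdef _WIN32
    static const char DIR_SEP = '\\';   // Windows path separator
#else
    static const char DIR_SEP = '/';    // Linux/other POSIX path separator
#endif
};

// Usage sketch:
//   std::string path = dir + CmdFiles::DIR_SEP + fileName;
```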

LADSoft commented 2 months ago

thank you for humouring me! I will deal with it properly at some point

rochus-keller commented 2 months ago

As far as I understand, ilstream.cpp/h is responsible for generating the binary stream representation of the IR data (which is consumed by both the optimizer and the code generator), and iout.cpp/h generates the (optional) text representation; both feed from the IR data, which originally resides in ildata.cpp/h (as a bunch of global variables using a memory arena).

If I wanted to generate the Eigen IR directly, instead of first generating the Orange IR (either the binary or the text representation) and then transpiling it to the Eigen IR, would it be sufficient to replace the iout.cpp implementation, or would I have to consider other parts of the code as well?

Does the binary representation of the Orange IR contain exactly the same information as the textual representation, or is the latter just a subset? If it is a subset, what information is only carried by the binary representation? Is the textual representation used anywhere within the Orange toolchain at all, or does it exist only to support debugging the compiler?

I was also looking for an InputIcdFile function; can you give me a hint which code, if any, is supposed to parse the textual IR representation? In any case, I assume it's wiser to re-implement iout.cpp rather than parse the textual representation, isn't it?

LADSoft commented 2 months ago

You're right about ilstream and iout.

FWIW, the instruction stream is created in istmt.cpp and iexpr.cpp, with help from iinline.cpp. Generating correct IL is itself a major undertaking, so for your purposes you want to hook either the ilstream or the iout functionality.

The ilstream functionality is meant to pass the entire compiler context along; in addition to instructions there are symbols, types, expressions, line number information, and some other stuff, all passed in and out of their structured representations. A lot of that is condensed into a few text characters in the text file, like a symbol name or a series of digits. If you want to use any of the optimizer code, you are probably going to need the ilstream at least going into the optimizer. The iout functionality, on the other hand, should have enough information to fully translate to a different intermediate language, at least that is the intent; but going that way you wouldn't get the kind of information you need to generate debug information. Still, for a straight conversion of the IL to another IL, you might use the ilstream stuff to go from the compiler to the optimizer and then hook the iout functionality on the optimizer side.

Yeah, I haven't really had a reason to write the code to parse the textual representation of the intermediate language file. @chuggafan asked for it at one point but it is kinda low on my list of things to do...

Another approach: you can look in the occ directory for an example of how to process the intermediate code. occ.cpp processes the intermediate data in general in the ProcessData functions, and igen.cpp steps through the instruction list in generate_instructions when ProcessData encounters a function. You also get the benefit that various other things you need, like global and external lists, will be handled for you. In these places various functions are called; for example there is a function in gen.cpp called asm_gosub that handles the intricacies of generating a call site, asm_je generates a branch-if-equal, and so on, and likewise there are data functions like oa_genint to generate an integer data value, oa_genfloat for a float/double one, and so forth. You can control segmentation, and it also supports duplicately declared global functions if you want to implement that (I took after Borland and called that functionality virtual functions, but really they are placed in a section marked 'common' in the object file).

Theoretically all you need to do is borrow the occ.cpp and igen.cpp code and then replace each of the code/data generation functions with your own implementation, and that would take care of it. For the backend you are using, I imagine you could elide all the peephole optimizations and other stuff from occ.exe and stick purely to generating code/data, then let the backend handle all the rest (speaking of which, you probably don't need to run the optimizer portion that allocates registers; there is a flag you can set in the config file to turn it off). And that is really all you need out of occ.exe; the rest of it is composed of generating x86 instructions, then assembling them, then generating an output file.
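
To make the shape of that approach concrete, here is a small self-contained sketch of the callback-table idea. The names echo the ones mentioned above, but the signatures and the table layout are placeholders, not the actual arch_gen definition from occopt/beinterfdefs.cpp:

```cpp
#include <cstdio>

// Placeholder for whatever instruction/operand data the optimizer passes in;
// the real types come from the OrangeC headers.
struct Quad;

// A cut-down stand-in for an arch_gen-style table of code/data generators;
// the real table and signatures differ from these placeholders.
struct BackendCallbacks
{
    void (*gen_branch_eq)(Quad*);           // stands in for asm_je
    void (*gen_call)(Quad*);                // stands in for asm_gosub
    void (*gen_int)(int size, long long v); // stands in for oa_genint
};

// Replacement generators that emit a different IR (here just text) instead of x86.
static void emitBranchEq(Quad*)            { std::puts("eigen: br.eq ..."); }
static void emitCall(Quad*)                { std::puts("eigen: call ..."); }
static void emitInt(int size, long long v) { std::printf("eigen: i%d %lld\n", size * 8, v); }

int main()
{
    // The driver (occ.cpp / igen.cpp in the real code) walks the instruction
    // list and dispatches through a table like this one.
    BackendCallbacks be = { emitBranchEq, emitCall, emitInt };
    be.gen_int(4, 42);
    return 0;
}
```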

Well, that is probably what I would do if I were doing this project... it has always seemed a lot easier to deal with myriad small functions with fixed scopes than with one huge do-everything function, lol... and this way there is already an established method for working with this...

fwiw the arch_gen structure in occopt/beinterfdefs.cpp lists all the possibilities for code generation functions....

Just be aware that when I get to the x64 implementation later this year, I have plans to replace arch_gen with a C++ class and get rid of all those pesky callback function declarations, lol...

rochus-keller commented 2 months ago

Thank you very much for this information.

Actually, I will also need the type and line information, since the Eigen IR can also generate debug information for all platforms.

Meanwhile I managed to build and run the optimizer on my set of test platforms, and it seems to work with the changes and the fix (see the commits). My BUSY build generates an application called orangeopt, which I can run with -Y and the .icf file as input; I get another .icf and an .icd file as output, and both look decent. I didn't even need true shared memory so far, and I also didn't check how the applications are coordinated to use it; passing the .icf file seems to work just fine.

Next I will build a test application that includes everything needed to use ilunstream.cpp and then see whether I can use it as a basis for the Eigen IR generator; that would have the advantage that I don't have to interfere with your code base too much. I will also have a look at occ.cpp, as recommended.

Now that I have had the opportunity to look at different parts of the code, I am starting to wonder whether there is any advantage at all in depending on C++14 (or higher). I would describe the code I have seen so far as "C with namespaces and templates", which from my humble point of view should even be happy with C++98 (apart from the STL features that only appeared in C++11). And you have impressively demonstrated that such a complex project can indeed be done with only this language subset. So I asked myself whether it is really worthwhile to use C++14 features, or whether the code base would not be much more flexible when restricted to a moderate subset of C++11 (or even C++98, maybe in combination with some isolated container classes from Boost or elsewhere). As it happens, I know of no C++ compiler of decent size and buildability that is available and usable on older or more modest systems and supports modern C++ development; so this could be a core benefit and killer use case of OrangeC. What's your view on this?

LADSoft commented 2 months ago

The only real excuse I have for using new language features is that it is fun! And it helps test the compiler if it is using some of the features it is meant to compile. I could see where shorthands like the C++17 namespace syntax could be useful, or the auto-detection of structured template params... :). That said, I'm not really enamored of all these new features I'm implementing, although some of them could be useful... And I imagine that as I go along, the libcxx tests will do a lot of the regression testing for me, at least once I get them all working...

I did want to add the filesystem stuff, or whatever that C++17 feature is called, so I didn't have to keep supporting multi-platform versions of how to search through directories and so forth... as I really, really feel that wildcards on the command line are essential, lol... But to be honest, from our previous conversations I was already thinking along the lines of maybe conditionally compiling that in if the compiler doing the compiling supports C++17... so I will think somewhat about how far I want to take adding newer features going forward...
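
For what it's worth, the directory scanning side of that would reduce to something like the following C++17 sketch (the wildcard matching itself is left out; only the iteration that currently needs per-platform code is shown):

```cpp
#include <filesystem>
#include <iostream>

int main()
{
    namespace fs = std::filesystem;

    // Enumerate the current directory; matching each name against a wildcard
    // pattern would be layered on top of this loop.
    for (const auto& entry : fs::directory_iterator("."))
    {
        if (entry.is_regular_file())
            std::cout << entry.path().filename().string() << '\n';
    }
    return 0;
}
```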

OCC probably doesn't compile versions of C++ before C++11... well it may, accidentally, I've never really tried it lol...

As far as modifying the code: I'm kinda in the middle of a major refactoring, as I got tired of looking at the sources in the parser (and to be fair, this was spurred because I sensed I needed a slightly different infrastructure to enable a change I think will speed things up). It only affects the very front-end code though; expr.cpp, decl.cpp, stmt.cpp, types.cpp and some other code is getting moved around as I go... I'm open to more contributions if you think there are any pertinent ones to make, but if you could try to stay out of those kinds of files I would very much appreciate it... it will probably be several weeks before I'm comfortable putting that back on the main branch...

rochus-keller commented 2 months ago

Concerning C++11 compatibility: today I implemented a minimal application which just reads the .icf file and optionally outputs an .icd file.

And it was pretty easy to make some modifications so that the application compiles with GCC 4.8 and runs on my old Linux. So I can stay on this machine to implement the Eigen IR generator.

Here are the changes: https://github.com/rochus-keller/OrangeC/commit/38318fa42859093c86580c34d5cf94c14f4bf607.

(Note that it only compiles when the "eigen" target is active; the other applications still expect a C++14 compiler.)

Concerning the C++17 filesystem features: maybe something like this could be an alternative: https://github.com/gulrak/filesystem. And of course there is also LeanQt, which offers great containers and cross-platform file system support with a small footprint (when only the minimal features are included).
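
As a hedged illustration of how such a fallback is usually wired up, something along these lines selects std::filesystem when the compiler provides it and the gulrak single-header otherwise (simplified; a real feature test would be more thorough about partial or broken implementations):

```cpp
// Simplified sketch of a std::filesystem / gulrak-filesystem switch.
#if defined(__cplusplus) && __cplusplus >= 201703L && defined(__has_include)
  #if __has_include(<filesystem>)
    #define HAVE_STD_FILESYSTEM 1
  #endif
#endif

#ifdef HAVE_STD_FILESYSTEM
  #include <filesystem>
  namespace fs = std::filesystem;
#else
  #include <ghc/filesystem.hpp>   // single-header fallback from gulrak/filesystem
  namespace fs = ghc::filesystem;
#endif

#include <iostream>

int main()
{
    // The rest of the code only ever talks to the fs:: alias.
    std::cout << fs::current_path().string() << '\n';
    return 0;
}
```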