LADSoft / DotNetPELib

A C++11 library used to create a managed program (CIL) and dump to either .IL, .EXE, or .DLL format

Some fixes #1

Closed rochus-keller closed 4 months ago

rochus-keller commented 3 years ago

This is a great library, thank you very much!

In case you're interested: here are two fixes to make it work on Linux with Mono: https://github.com/rochus-keller/PeLib/commit/a5e2be2bd2da8b6e120e29944e46c0a64b71c475. I also had to make these fixes so it compiles with GCC 4.8: https://github.com/rochus-keller/PeLib/commit/7383b0e98dbac4baa3b867321bc7134068f1d2f3.

LADSoft commented 4 months ago

I know it is very late but I never acknowledged your contribution. Apparently I had added these fixes earlier so there isn't going to be a specific commit for it.

Thank you!

rochus-keller commented 4 months ago

I did quite some refactoring with the library meanwhile and use it for my Oberon+ compiler; see https://github.com/rochus-keller/PeLib/commits/OBX/ and https://github.com/rochus-keller/Oberon.

LADSoft commented 4 months ago

yeah you really did rewrite it, it is almost unrecognizable although i saw some of the same terms being used lol! I'm glad you got some use out of it though.

I have a book about 'project oberon' somewhere. I never really got into the oberon compiler (this was before I was heavily into compiler projects) but the OS inspired an os i did long long ago....

rochus-keller commented 4 months ago

Did you really implement the C++14 and C11 standards in your OrangeC compiler? This sounds like an incredible amount of work. Do you think it is feasible to re-target your compiler? I recently came across a great compiler toolkit (see https://github.com/EigenCompilerSuite/) which I currently plan to use for my forthcoming Micron programming language; but I think it could also be a good fit for Orange C.

LADSoft commented 4 months ago

yeah i implemented c++11 and c++14. I've also recently implemented c++17, but it is in the testing stages right now... should have it out later this year. yeah it is a lot of work, the main problem though is that the amount of testing that has to be done for C++ is enormous and time consuming....

Just to give you some background I designed orangec with extensibility in mind, it is retargetable both from the front end/parsing, and the back end/code generation. There is a module in the middle that takes intermediate code and optimizes it a little bit... These are organized as 3 separate programs 'occparse' to do the c/c++ parsing, 'occopt' to perform code improvements and set things up for the backend, and 'occ' which calls the other two, then takes the intermediate code and turns it into an object file. occ also takes one or more output files and calls the linker...

so i made occ (the x86 version of the compiler) and occil (the msil version) and both have in common that they call 'occparse' and 'occopt' which are independent programs, But these two backends do different code generation... so other than having to deal with variations required by code generation everything in the front end and optimizers is in common. My next project after I get C++17 done is make a backend for win64... I've had so much to do that I've been putting it off for a very long time but it is finally the next project on the plate...

On the flip side it should be possible to swap out the parser and replace it with something that generates the intermediate code required by the other modules... thus you can compile for a new language by doing that and don't have to much deal with the optimizing/backend. Although i couldn't guarantee that any specific language wouldn't have specialized needs that call for new intermediate code instructions...
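To make the split described above concrete, here is a minimal sketch of a swappable front end and back end sharing a middle stage. It is not OrangeC code: `IntermediateCode`, `FrontEnd`, `BackEnd`, `optimize`, `LineFrontEnd`, `TaggingBackEnd` and `compile` are all hypothetical names, and the real tools are separate programs rather than classes in one process.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified intermediate code: one string per instruction.
using IntermediateCode = std::vector<std::string>;

// A front end only has to produce intermediate code.
struct FrontEnd {
    virtual ~FrontEnd() = default;
    virtual IntermediateCode parse(const std::string& source) const = 0;
};

// A back end only has to consume intermediate code.
struct BackEnd {
    virtual ~BackEnd() = default;
    virtual std::string generate(const IntermediateCode& code) const = 0;
};

// Shared middle stage; this toy version just drops "nop" instructions.
IntermediateCode optimize(const IntermediateCode& in) {
    IntermediateCode out;
    for (const auto& instruction : in)
        if (instruction != "nop") out.push_back(instruction);
    return out;
}

// Toy front end: every non-empty line of the source becomes one instruction.
struct LineFrontEnd : FrontEnd {
    IntermediateCode parse(const std::string& source) const override {
        IntermediateCode code;
        std::string line;
        for (char c : source) {
            if (c == '\n') {
                if (!line.empty()) code.push_back(line);
                line.clear();
            } else {
                line += c;
            }
        }
        if (!line.empty()) code.push_back(line);
        return code;
    }
};

// Toy back end: prefixes each instruction with the name of its target.
struct TaggingBackEnd : BackEnd {
    explicit TaggingBackEnd(std::string target) : target_(std::move(target)) {}
    std::string generate(const IntermediateCode& code) const override {
        std::string out;
        for (const auto& instruction : code) out += target_ + ": " + instruction + "\n";
        return out;
    }
    std::string target_;
};

// The driver is the same no matter which front end or back end is plugged in.
std::string compile(const FrontEnd& fe, const BackEnd& be, const std::string& source) {
    return be.generate(optimize(fe.parse(source)));
}
```

Swapping `LineFrontEnd` for another `FrontEnd` changes the source language, and swapping `TaggingBackEnd` changes the target, while `optimize` and `compile` stay untouched.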

Another wrinkle is that the object file format can be easily swapped out as well, as one library is shared by the entire project. I used that fact recently to change from 'ASCII' file format to 'binary' file format... not that that really bought me anything though lol...

A final wrinkle is that the linker is actually a scriptable linker which can be used for embedded systems, and it interacts with a set of programs called 'downloaders' which I use to customize the linked files for various operating systems. Along with the downloader you write a script to tell how memory is laid out for that os, and you add a couple of lines to a configuration file to tie it all in so the linker can call the downloader automatically.

for example I've got several different MSDOS environments, a WIN32 environment, a HEX environment capable of generating motorola and intel hex files, and eventually I plan to get to linux with elf/dwarf... but that is a bigger project as I have to figure out what to do about libraries...

rochus-keller commented 4 months ago

Thank you for your explanations. This is indeed impressive work. I wasn't aware that there is a retargetable C++14 (and even 17) open-source compiler other than GCC or Clang. In my humble opinion, the implementation of something as gigantic as a C++17 compiler is a life's work; I therefore assume that you have worked for many years on it.

Where do you see the strengths of your compiler compared to Clang/LLVM, for example?

Can your compiler compile itself (i.e. can you use your own toolchain to build itself)?

My question was actually about the backend, not the frontend. I was wondering if it is possible and how much effort is required to integrate your C++ and C frontends including optimizer with the backend of Eigen Compiler Suite. The latter supports at least 16 different architectures and can also generate executables directly, without a separate linker. Based on your explanations, I would assume that your optimizer writes the output to a file in an intermediate format. One could therefore translate this intermediate code into the IR format of the Eigen compiler suite.

I will take a look at your intermediate code documentation.

Which C++ standard version are you using to implement your toolchain? Personally I prefer to use old standard versions (i.e. GCC 4.8 or MSVC 2015 compatible) and thus migrated parts of the Eigen compiler kit (see https://github.com/rochus-keller/EiGen/).

LADSoft commented 4 months ago

yeah i've been working on the present incarnation of the compiler/toolchain since about 2005. It started as a C compiler then I added on.... before that I worked on another compiler called cc386 but the source code for it was antiquated at best and I had a lot of problems with it, so I started a new project....

well LLVM has it beat, they have a lot more resources than I do! The main strength of orange c is it has a relatively small footprint compared to the major compilers. I also like my own strategies for how to make everything resilient to retargeting but of course I'm biased!

Yes it compiles itself; the appveyor builds compile it in various configurations with different mainstream compilers, then recompile it with itself with that version of the compiler, then recompile itself again with that version. At the end the last two compiles are compared to make sure the images are the same. The images I tag for release are all the compiler built with itself...
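The final comparison step can be sketched as a byte-for-byte check of the last two build outputs. This is only an illustration of the idea; `sameImage` is a hypothetical helper, and how the actual appveyor builds perform the comparison is not shown in this thread.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true only if both files can be opened and contain identical bytes.
bool sameImage(const std::string& pathA, const std::string& pathB) {
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b) return false;  // a missing file counts as a mismatch

    std::istreambuf_iterator<char> itA(a), itB(b), end;
    // Walk both files in lockstep and stop at the first differing byte.
    while (itA != end && itB != end) {
        if (*itA != *itB) return false;
        ++itA;
        ++itB;
    }
    // Equal only if both files ended at the same position.
    return itA == end && itB == end;
}
```

If the second and third generation binaries differ, the compiler is not reproducing itself exactly, which usually points to a code generation bug or a non-deterministic build step.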

yes it should be possible to take the intermediate code and turn it into something another backend could use.... to do that one wouldn't have to write any code to parse the IL as there are already modules within the project that will convert the il coming out of the optimizer to internal structures that can be examined...

So i start using new versions of the C++ standard as they become available in the compiler... kinda can't go beyond that because it won't compile itself if I use a newer standard.... as of right now the highest version of the standard being used to write the compiler is C++14.... but that may change as there are C++17 features that I may want to start using once they are available... i'm especially thinking of the structured return values and the shorthand for namespaces....
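For readers unfamiliar with the two C++17 features, here is a small sketch, assuming "structured return values" means structured bindings and "the shorthand for namespaces" means nested namespace definitions. The names `demo::parser`, `nextToken` and `firstWord` are made up for illustration and are not taken from OrangeC.

```cpp
#include <string>
#include <utility>

// C++17 nested namespace definition, replacing
// namespace demo { namespace parser { ... } }
namespace demo::parser {

// Hypothetical helper: returns (found, first space-delimited word).
std::pair<bool, std::string> nextToken(const std::string& text) {
    if (text.empty()) return {false, ""};
    return {true, text.substr(0, text.find(' '))};
}

}  // namespace demo::parser

std::string firstWord(const std::string& text) {
    // C++17 structured binding: unpack the pair into two named variables.
    auto [ok, token] = demo::parser::nextToken(text);
    return ok ? token : "<none>";
}
```

Both features need a C++17 compiler, so a self-hosting compiler can only use them in its own sources once it implements them itself.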

rochus-keller commented 4 months ago

This is all very impressive, especially that you do several rounds of self-compilation and comparison of the output. I didn't realize when I discovered your PeLib that in fact OrangeC is a retargetable C++11 compiler.

Personally I think that a toolchain is much more useful if it is easily buildable and portable. The possibility to easily cross-compile the toolchain without the monstrosity and volatility of LLVM would even be sensational. Many people like me develop on old systems. My main development machine runs Linux i386 with GCC 4.8. When I ported and built the Eigen Compiler suite subset, I experienced crashes even if GCC 4.8 or MSVC 2015 were able to compile without errors. So I had to even get rid of some fancy C++11 constructs to make it finally work. I therefore very much appreciate projects which use a moderate C++ version and style up front.

Have you (or anyone else) compiled your toolchain on Linux, Mac or Raspberry Pi, or are there important Win32 dependencies that prevent this (besides parts of the runtime library)? I assume that at least the compiler front-end should compile anywhere a C++14 compiler is available, shouldn't it?

LADSoft commented 4 months ago

I didn't realize there were a lot of people still using such old systems. Well I guess I should have for as long as people kept asking me to support MSDOS lol.... I finally gave up on that one though....

yeah the fact it is relatively easy to recompile orangec on other operating systems is one of the advantages over LLVM I suppose...

So the story on linux is that I've also got GitHub Actions set up for this project. One of them is to compile on linux with gcc. That was a takeoff on earlier work where we had a travis build going to build linux before they took travis completely commercial... So it compiles for linux, at least.

Bad news is I haven't finished implementing pal functionality for linux; for example the replacement for windows FindFirst and friends isn't there yet. And there is some other stuff that isn't fully implemented... you can get the general scope of work by searching the project for TARGET_OS_WINDOWS. Some of it is implemented I think, but especially in the util and omake directories there is some work to be done yet...

Pretty much the entire toolchain depends on the stuff in the Util directory so that means pretty much nothing is going to run properly on linux...

I think most of it would be fairly easy to do, just a matter of sitting down and doing it. I just saw it as part of the larger problem of getting the compiler to compile itself on linux, and since there isn't a lot of feedback saying people actually want the 'let's just get it running' part done, I just haven't gone to the trouble of making it happen.

incidentally recently we worked with the CMAKE team to add support for orange c to cmake... another of my long-term projects is to write cmake scripts for all the tools but that is another thing I haven't gotten to yet... I suppose I am going to have to do it for the C++17 release though as I need to retest that I haven't broken the cmake support....

rochus-keller commented 4 months ago

I will take a closer look at your build system to find out what has to be built for just the preprocessor, C/C++ to IR compiler and the IR optimizer, so the result are IR files. Then I will try to set up my own build using the BUSY build system (see https://github.com/rochus-keller/BUSY). And then I will check how much work it would be to port the Win32 dependencies to LeanQt core (see https://github.com/rochus-keller/LeanQt); I will also check whether C++14 is essential, or how much work it would be to migrate to moderate C++11, so it also compiles on my development machine. But this will take some time since I'm currently implementing my Micron compiler.

LADSoft commented 4 months ago

yes if you want to look at that that would be great!

you are probably going to find that most of the C++14 being used is STL functions... but I'm not 100% sure on that. I didn't make a concerted effort to try to port things to C++14 though... If i had to guess the main problem will be use of the stl threading paradigm in omake...

on the other hand you could easily do without omake for what you want to do lol!

So one thing you should know is that right now the different modules (occparse, occopt, occ) use shared memory to transfer the IL files around. There is provision though to dump it to a file, in the case that the components are called individually. So with a little massaging of the source files you could get around it pretty easily...

rochus-keller commented 4 months ago

Thanks, I will keep you updated, but it will take a while.

rochus-keller commented 4 months ago

Just started to compile the parser using GCC 4.8 by incrementally adding required subdirectories to the qmake project. So far there was only one necessary change in c.h line 1253 to make it C++11 compatible, but now GCC crashes:

In file included from ../orange/occparse/compiler.h:43:0,
                 from ../orange/occparse/ccerr.cpp:25:
../orange/occparse/c.h: In instantiation of 'class Parser::SymbolTable<Parser::sym>':
../orange/occparse/templatededuce.h:38:54:   required from here
../orange/occparse/c.h:66:7: internal compiler error: in gen_type_die_with_usage, at dwarf2out.c:19486
 class SymbolTable

So I assume to succeed on GCC 4.8 is not feasible for the moment. I will continue with a recent GCC version when I try next time.

rochus-keller commented 4 months ago

Ok, now I tried on Windows 11 with MSVC 2015 (compiler version 14). Unfortunately it doesn't seem to compile with this version, which is a pity. I tried to do some modifications in c.h to make the compiler happy, but the code and dependency structure seems pretty fragile; didn't find an easy work-around yet to get along with minimal changes. I had similar issues with PeLib, which is why I eventually had to refactor and restructure large parts of the code to better understand and use it with GCC 4.8 and MSVC 2013. This doesn't seem to be feasible with the OrangeC code though. Maybe I'll get the hang of it, but at the moment I don't have a clever idea.

LADSoft commented 4 months ago

i will take a peek and see how bad it is soon as I'm done with my current project (should be tomorrow or the next day).

LADSoft commented 4 months ago

ok last night i pushed a version of OrangeC that compiles with MSVC 15. I ran some tests and the output seems ok... main problem is the builds are back to taking way too long to complete so some of the appveyor builds fail.... there is only one explicit check for MSVC 15.0, dealing with constexpr functions...
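A compiler-version check around constexpr functions commonly looks something like the sketch below. This is a generic pattern, not the actual check in the OrangeC sources; it assumes the compiler being guarded against is Visual Studio 2015 (`_MSC_VER` 1900), and `DEMO_CONSTEXPR` and `alignUp` are hypothetical names.

```cpp
// _MSC_VER is 1900 for Visual Studio 2015 and 1910 or higher for Visual Studio 2017 and later.
// Assumption: the older compiler cannot handle some constexpr functions,
// so it gets a plain inline function instead.
#if defined(_MSC_VER) && _MSC_VER < 1910
#    define DEMO_CONSTEXPR inline
#else
#    define DEMO_CONSTEXPR constexpr
#endif

// Rounds value up to the next multiple of alignment (alignment must be > 0).
DEMO_CONSTEXPR int alignUp(int value, int alignment) {
    return (value + alignment - 1) / alignment * alignment;
}
```

Callers use `alignUp` the same way on every compiler; only code that needs a compile-time constant would have to avoid it on the older toolchain.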

Also tested dotnetpelib... the latest version doesn't have issues compiling with MSVC15. I think at some point I may have cleaned that up when I started compiling it with various other compilers...

rochus-keller commented 4 months ago

Wow, that's great, thanks a lot; this gives me a significant chance to go along ;-)

rochus-keller commented 4 months ago

Update: I was successfully able to compile the parser on Windows with MSVC 2015 in LeanCreator; for this purpose I removed the inline assembler and all msil features; it even runs when TARGET_OS_WINDOWS is not defined; as it seems very little windows-specific features are required. Next I will do experiments and compile some c and cpp files and see what comes out.

LADSoft commented 4 months ago

cool. Since you are primarily interested in the parser I took a look at what you might be interested in for the TARGET_OS_WINDOWS not being set.

util/CmdFiles: wildcard support for file names on the command line is removed for non-windows platforms
util/Util.cpp: the help text may or may not stop after the end of each screen full of data

occparse/lex.cpp: 'raw' strings won't have '\r' added before newlines
occparse/osutil.cpp: there is detection for the 'console' device, it will use the 'unix' name for non-windows platforms

in general places that need the full path of the executable won't get it on non-windows platforms. There are two reasons the path is needed: 1) to set the ORANGEC environment variable to point to the tools directory (which you don't need) and 2) to tell StandardToolsSTartup we are preprocessing to stdout.

the other stuff in the Utils directory i think is primarily used for interprocess communication between occparse/occopt/occ which I'm guessing you will be editing out anyway lol...

rochus-keller commented 4 months ago

See here for continuation: https://github.com/LADSoft/OrangeC/pull/1027 (for the audience)