There are two GCC-based compilers for the 8086: ia16-unknown-elks-gcc, the original compiler by Rask with a few improvements by me, and ia16-elf-gcc, based on Rask's work with many improvements.
For the first one, I posted instructions to build the compiler here: http://www.spinics.net/lists/linux-8086/msg00674.html. I tested the build on 32-bit Debian squeeze and 64-bit Ubuntu 13.1, and both produced a working cross-compiler. Building on 64-bit Debian jessie fails.
This compiler can build applications for ELKS, but only some of them work well. The C library is based on newlib and is quite bloated, and the linker tends to pull everything from a library into the final binary. As a result, application binaries are much larger than those produced by BCC.
The kernel can be built with this compiler. To avoid rewriting all the existing assembly code, I wrote a utility to convert AT&T 8086 assembly to AS86 syntax, and rewrote the compilation rule: first compile with the -S option to produce an assembly file, then convert it to as86 syntax with that utility, assemble it with as86, and remove the temporary files. Everything is already in the Makefile, but you need the conversion utility. It was also necessary to write one libgcc function.
The second compiler is newer and faster, and it produces smaller code. It also relies on newlib and uses the same linker, so it has the same size problems as the first compiler. As distributed, it cannot produce ELKS binaries. To build the kernel with it, the Makefile needs further changes that are incompatible with ia16-unknown-elks-gcc.
INSTRUCTIONS TO COMPILE THE KERNEL USING ia16-unknown-elks-gcc.
and uncomment it.
Now, the conversion utility.
/*
Copyright (C) 2014 Juan Perez
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/
/* gnu as to as86 source code converter */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strcasecmp() is declared here by POSIX */
#include <ctype.h>
#define MAX_LINE 4096
#define MAX_OPERANDS 5
static char buffer[MAX_LINE];
static char *operands[MAX_OPERANDS];
static int sects[] = {'T', 'D', 'B', 'R',};
static char *sectmsg[] = {".text", ".data", ".bss", ".rom", ".undef",};
typedef struct {
char *att;
char *as86;
int type;
} instruction;
static int opcode = 0;
static int lineno = 1;
static char *imm_one = "#1";
static instruction mnemonics[] = {
{".align", ".align", 12},
{".arch", ".", 0},
{".ascii", ".ascii", 12},
{".asciz", ".asciz", 12},
{".att_syntax", ".", 0},
{".byte", ".byte", 12},
{".bss", ".bss", 12},
{".code16", ".", 0},
{".comm", ".comm", 9},
{".data", ".data", 12},
{".extern", ".globl", 12},
{".global", "export", 12},
{".hword", ".word", 12},
{".ident", ".", 0},
{".local", ".", 0,},
{".long", ".long", 12,},
{".p2align",".align", 7},
{".section",".sect", 6},
{".single", ".single", 12},
{".size", ".", 0},
{".skip", ".space", 8},
{".string", ".asciz", 12},
{".text", ".text", 1},
{".type", ".", 0},
{".word", ".word", 1},
{"aaa", "aaa", 1},
{"aad", "aad", 1},
{"aam", "aam", 1},
{"aas", "aas", 1},
{"adcb", "adcb", 2},
{"adcw", "adc", 2},
{"addb", "addb", 2},
{"addw", "add", 2},
{"andb", "andb", 2},
{"andw", "and", 2},
{"call", "call", 5},
{"cbtw", "cbw", 1},
{"clc", "clc", 1},
{"cld", "cld", 1},
{"cli", "cli", 1},
{"cmc", "cmc", 1},
{"cmpb", "cmpb", 2},
{"cmpsb", "cmpsb", 1},
{"cmpsw", "cmpsw", 1},
{"cmpw", "cmp", 2},
{"cwtd", "cwd", 1},
{"daa", "daa", 1},
{"das", "das", 1},
{"decb", "decb", 1},
{"decw", "dec", 1},
{"divb", "divb", 1},
{"divw", "div", 1},
{"fadd", "fadd", 1},
{"faddp", "faddp", 1},
{"faddps", "faddp", 1},
{"fadds", "fadd", 1},
{"fcom", "fcom", 1},
{"fcomp", "fcomp", 1},
{"fcomps", "fcomp", 1},
{"fcoms", "fcom", 1},
{"fdiv", "fdiv", 1},
{"fdivp", "fdivp", 1},
{"fdivr", "fdivr", 1},
{"fdivrp", "fdivrp", 1},
{"fdivrs", "fdivrs", 1},
{"fildl", "fild", 1},
{"fistl", "fist", 1},
{"fistpl", "fistp", 1},
{"fld", "fld", 1},
{"fldcw", "fldcw", 1},
{"fldenv", "fldenv", 1},
{"flds", "fld", 1},
{"fmul", "fmul", 1},
{"fmulp", "fmulp", 1},
{"fmulps", "fmulp", 1},
{"fmuls", "fmul", 1},
{"fnstcw", "fnstcw", 1},
{"fnstenv", "fnstenv", 1},
{"fnstsw", "fnstsw", 1},
{"fstp", "fstp", 1},
{"fstps", "fstp", 1},
{"fsts", "fst", 1},
{"fsub", "fsub", 1},
{"fsubp", "fsubp", 1},
{"fsubps", "fsubps", 1},
{"fsubr", "fsubr", 1},
{"fsubrp", "fsubrp", 1},
{"fsubrs", "fsubrs", 1},
{"fsubs", "fsubs", 1},
{"fxch", "fxch", 1},
{"hlt", "hlt", 1},
{"idivb", "idivb", 1},
{"idivw", "idiv", 1},
{"imulb", "imulb", 3},
{"imulw", "imul", 3},
{"inb", "inb", 2},
{"incb", "incb", 1},
{"incw", "inc", 1},
{"int", "int", 1},
{"into", "into", 1},
{"inw", "in", 2},
{"iret", "iret", 1},
{"ja", "bhi", 1}, /**/
{"jae", "bhis", 1}, /**/
{"jb", "blo", 1}, /**/
{"jbe", "blos", 1}, /**/
{"jc", "bcs", 1}, /**/
{"jcxz", "jcxz", 1},
{"je", "beq", 1}, /**/
{"jg", "bgt", 1}, /**/
{"jge", "bge", 1}, /**/
{"jl", "blt", 1}, /**/
{"jle", "ble", 1}, /**/
{"jmp", "br", 5},
{"jna", "blos", 1}, /**/
{"jnae", "blo", 1}, /**/
{"jnb", "bhis", 1}, /**/
{"jnbe", "bhi", 1}, /**/
{"jnc", "bcc", 1}, /**/
{"jne", "bne", 1}, /**/
{"jng", "ble", 1}, /**/
{"jnge", "blt", 1}, /**/
{"jnl", "bge", 1}, /**/
{"jnle", "bgt", 1}, /**/
{"jno", "bvc", 1}, /**/
{"jnp", "bpc", 1}, /**/
{"jns", "bpl", 1}, /**/
{"jnz", "bne", 1}, /**/
{"jo", "bvs", 1}, /**/
{"jp", "bps", 1}, /**/
{"jpe", "bps", 1}, /**/
{"jpo", "bpc", 1}, /**/
{"js", "bmi", 1}, /**/
{"jz", "beq", 1}, /**/
{"lahf", "lahf", 1},
{"ldsw", "lds", 2},
{"leaw", "lea", 2},
{"lesw", "les", 2},
{"lodsb", "lodsb", 1},
{"lodsw", "lodsw", 1},
{"loop", "loop", 1},
{"loope", "loope", 1},
{"loopne", "loopne", 1},
{"loopnz", "loopnz", 1},
{"loopz", "loopz", 1},
{"movb", "movb", 2},
{"movsb", "movsb", 1},
{"movsw", "movsw", 1},
{"movw", "mov", 2},
{"mulb", "mulb", 1},
{"mulw", "mul", 1},
{"negb", "negb", 1},
{"negw", "neg", 1},
{"nop", "nop", 1},
{"notb", "notb", 1},
{"notw", "not", 1},
{"orb", "orb", 2},
{"orw", "or", 2},
{"outb", "outb", 2},
{"outw", "out", 2},
{"popa", "popa", 1},
{"popf", "popf", 1},
{"popfw", "popf", 1},
{"popw", "pop", 1},
{"pusha", "pusha", 1},
{"pushf", "pushf", 1},
{"pushfw", "pushf", 1},
{"pushw", "push", 1},
{"rclb", "rclb", 4},
{"rclw", "rcl", 4},
{"rcrb", "rcrb", 4},
{"rcrw", "rcr", 4},
{"rep", "rep", 11},
{"repe", "repe", 11},
{"repne", "repne", 11},
{"repnz", "repnz", 11},
{"repz", "repz", 11},
{"ret", "ret", 1},
{"rolb", "rolb", 4},
{"rolw", "rol", 4},
{"rorb", "rorb", 4},
{"rorw", "ror", 4},
{"sahf", "sahf", 1},
{"salb", "salb", 4},
{"salw", "sal", 4},
{"sarb", "sarb", 4},
{"sarw", "sar", 4},
{"sbbb", "sbbb", 2},
{"sbbw", "sbb", 2},
{"scasb", "scasb", 1},
{"scasw", "scasw", 1},
{"shlb", "shlb", 4},
{"shlw", "shl", 4},
{"shrb", "shrb", 4},
{"shrw", "shr", 4},
{"stc", "stc", 1},
{"std", "std", 1},
{"sti", "sti", 1},
{"stosb", "stosb", 1},
{"stosw", "stosw", 1},
{"subb", "subb", 2},
{"subw", "sub", 2},
{"testb", "testb", 2},
{"testw", "test", 2},
{"xchgb", "xchgb", 1},
{"xchgw", "xchg", 1},
{"xlat", "xlatb", 11},
{"xlatb", "xlatb", 11},
{"xorb", "xorb", 2},
{"xorw", "xor", 2},
};
static int next(char *token)
{
int chr;
char *buf;
buf = token;
/*********** Skip whitespace and comments ***********/
do {
chr = getchar();
if((chr == EOF) || (chr == '\n'))
return chr;
if((chr == '#') || (chr == ';')) {
do {
chr = getchar();
} while((chr != '\n') && (chr != EOF));
return chr;
}
} while((chr <= ' ') || (chr > '~'));
/******************** Get number ********************/
if(isdigit(chr)) {
do {
*token++ = chr;
chr = getchar();
} while(isdigit(chr));
*token = 0;
ungetc(chr, stdin);
return '0';
}
/******* Get alphanumeric, labels and opcodes *******/
if(chr == '.') {
chr = getchar();
if(isalpha(chr) || (chr == '.') || (chr == '_')) {
*token++ = '.';
}
else {
ungetc(chr, stdin);
return '.';
}
}
if(isalpha(chr) || (chr == '.') || (chr == '_')) {
do {
*token++ = chr;
chr = getchar();
} while(isalnum(chr) || (chr == '.') || (chr == '_'));
if(chr == ':') {
*token++ = chr;
*token = 0;
return 'L';
}
*token = 0;
ungetc(chr, stdin);
for(opcode = 0; opcode < sizeof(mnemonics)/sizeof(instruction); opcode++) {
if (!strcasecmp(buf, mnemonics[opcode].att))
break;
}
if(opcode >= sizeof(mnemonics)/sizeof(instruction))
return 'A';
return 'I';
}
/**************** Get literal string ****************/
else if(chr == '\"') {
*token++ = chr;
do {
chr = getchar();
if((chr < ' ') || (chr > '~')) /* skip non-printable characters */
continue;
*token++ = chr;
if(chr == '\\') {
chr = getchar();
if((chr < ' ') || (chr > '~'))
continue;
*token++ = chr;
chr = '\\';
}
} while((chr != '\"') && (chr != '\n') && (chr != EOF));
*token = 0;
return 'S';
}
/******************* Get register *******************/
else if(chr == '%') {
chr = getchar();
if(isalpha(chr)) {
do {
*token++ = chr;
chr = getchar();
} while(isalnum(chr));
*token = 0;
ungetc(chr, stdin);
return 'R';
}
ungetc(chr, stdin);
return '%';
}
return chr;
}
static int flush_line(void)
{
int chr;
do {
chr = getchar();
} while((chr != '\n') && (chr != EOF));
return chr;
}
static int flush_opr(void)
{
int chr;
do {
chr = getchar();
} while((chr != '\n') && (chr != EOF) && (chr != ','));
ungetc(chr, stdin);
return chr;
}
static int mov_opr(char *ptr)
{
int chr;
do {
chr = getchar();
*ptr++ = chr;
} while((chr != '\n') && (chr != EOF) && (chr != ','));
*(ptr - 1) = 0;
ungetc(chr, stdin);
return 'A';
}
static int line(void)
{
int nops, chr, i;
char *bufptr;
bufptr = buffer;
/* Discard empty lines */
do {
chr = next(bufptr);
if(chr == '\n')
lineno++;
} while(chr == '\n');
if(chr == EOF)
return chr;
/* Every label goes in its own line */
if(chr == 'L') {
printf("%s\n", bufptr);
return chr;
}
i = opcode;
/*********** Process invalid instructions ***********/
if((chr != 'I') || (mnemonics[opcode].type == 0)) {
if(chr != 'I')
fprintf(stderr, "!!ERROR: Line %d: Unknown %s\n", lineno, bufptr);
flush_line();
lineno++;
return '\n';
}
printf("\t%s\t", mnemonics[opcode].as86);
/************ Process .SECTION directive ************/
if(mnemonics[opcode].type == 6) {
chr = next(bufptr);
for(nops = 0; nops < 4; nops++)
if(toupper((int)(*(bufptr + 1))) == sects[nops])
break;
printf("%s\n", sectmsg[nops]);
flush_line();
lineno++;
return '\n';
}
/************ Process .P2ALIGN directive ************/
if(mnemonics[opcode].type == 7) {
next(bufptr);
chr = atoi(bufptr);
printf("%d\n", (1 << chr));
flush_line();
lineno++;
return '\n';
}
/************* Process .SKIP directive **************/
if(mnemonics[opcode].type == 8) {
next(bufptr);
printf("%s\n", bufptr);
flush_line();
lineno++;
return '\n';
}
/************* Process .COMM directive **************/
if(mnemonics[opcode].type == 9) {
next(bufptr);
printf("%s,", bufptr);
next(bufptr); next(bufptr);
printf(" %s\n", bufptr);
flush_line();
lineno++;
return '\n';
}
/********** Process indirect call and jmp ***********/
if(mnemonics[opcode].type == 5) {
do {
chr = getchar();
if((chr == EOF) || (chr == '\n')) {
printf("\n"); /* terminate the output line even without an operand */
lineno++;
return chr;
}
if(chr == '*')
break;
} while((chr <= ' ') || (chr > '~'));
if(chr != '*')
ungetc(chr, stdin);
}
nops = 0;
do {
operands[nops] = bufptr;
do {
chr = next(bufptr);
switch(chr) {
case 'R':
flush_opr();
break;
case '$':
*bufptr = '#';
chr = mov_opr(bufptr + 1);
break;
case '(':
*bufptr = '(';
chr = next(bufptr + 1);
if(chr == 'R') {
*bufptr = '[';
do {
bufptr = bufptr + (strlen(bufptr) + 1);
*(bufptr - 1) = ' ';
chr = next(bufptr);
if(chr != ')') {
*bufptr++ = '+';
next(bufptr);
}
} while(chr != ')');
chr = ']';
}
else {
bufptr++;
}
default:
if(!isalnum(chr)) {
*bufptr = chr;
*(bufptr+1) = 0;
}
}
bufptr = bufptr + (strlen(bufptr) + 1);
*(bufptr - 1) = ' ';
} while((chr != '\n') && (chr != EOF) && (chr != ','));
bufptr--;
*(bufptr - 1) = 0;
nops++;
} while((chr != '\n') && (chr != EOF));
opcode = i;
switch(mnemonics[opcode].type) {
case 11:
nops = 0;
break;
case 4:
if(nops < 2) {
operands[1] = imm_one;
nops++;
break;
}
case 3:
if(nops >= 3) {
bufptr = operands[0];
operands[0] = operands[2];
operands[2] = bufptr;
break;
}
case 2:
if(nops >= 2) {
bufptr = operands[0];
operands[0] = operands[1];
operands[1] = bufptr;
}
case 1:
default:
break;
}
i = 0;
while(nops--) {
if(strlen(operands[i]))
printf("%s", operands[i]);
if(nops)
printf(",");
i++;
}
printf("\n");
lineno++;
return chr;
}
int main (int argc, char **argv)
{
/*char buffer[MAX_LINE];
char *operands[MAX_OPERANDS];*/
int chr;
do {
chr = line();
} while(chr != EOF);
return 0;
}
Things to do to switch compilers.
So, now, what do you prefer: ia16-unknown-elks-gcc or ia16-elf-gcc? As a curiosity or permanently? Only for the kernel or also for applications? I have patches to do the same with OpenWatcom as I did for ia16-unknown-elks-gcc, except for a utility to convert Intel syntax to as86 syntax. Any interest in them? I will submit a PR to Documentation based on your response.
Thank you very much for this big amount of information. Mmm... at first glance I would prefer to spend time on the second compiler, because it is starting to support far pointers in the latest fork available on GitHub. But let me try both options before giving my opinion...
For the second compiler, the final steps are to modify Makefile-rules as follows before building the kernel:
On the lines defining CPU_CC for the USEIA16 case, add the option -fleading-underscore.
Replace the line CC = ia16-unknown-elks-gcc with CC = ia16-elf-gcc.
Could you please quote the source of your conversion utility as code so that it renders correctly?
Like this
See: https://help.github.com/articles/basic-writing-and-formatting-syntax/#quoting-code
I fixed the quoting.
Thanks Jody, it is going to help, because I need it for the next step after the successful build of the latest gcc-ia16.
I am starting to like this compiler (I took the fork based on version 6.3.0 from https://github.com/tkchia/gcc-ia16) 😁 👍
ia16-elf-gcc -mtune=i8086 -fno-inline -fdata-sections -ffunction-sections -mseparate-code-segment -Wl,--gc-sections -fleading-underscore -Os -I/mnt/data/home/mfld/advantech/elks/elks/include -DELKS_VERSION_CODE=0x00020000 -DUTS_RELEASE=\"0.2.0\" -D__KERNEL__ -Wall -S -o printk.s printk.c
printk.c: In function ‘numout’:
printk.c:80:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
i = 10;
^
printk.c:83:4: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
i = 8;
^
printk.c:87:4: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
i = 11;
^
printk.c: In function ‘printk’:
printk.c:200:5: error: address of register variable ‘p’ requested
va_start(p, fmt);
^~~~~~~~
printk.c:202:5: error: address of register variable ‘p’ requested
va_end(p);
^~~~~~
I fixed most of the problems with ia16-elf-gcc; see the latest PR. However, there is a problem with the commit dated Nov/03 at 15:47. I'll check that problem later.
The mkromfs host tool must be compiled with the host compiler, not the cross-compiler. We should rework the Makefile globals to handle both cases: native GCC for the host parts and cross GCC-IA16 for the target parts.
I don't know how mkromfs is compiled, but for everything else, this is the way it works now. There is no need to rework any Makefile.
At this point, I think the safest option is to keep the current build chain as designed by Juan, i.e., to keep dev86 (or at least bin86, as many distros name it) to assemble and link, and to replace BCC with the latest GCC-IA16 as the C front end. This option is enough to meet our current needs (a C99 compiler, better code checking and optimization).
It would be a good milestone if all the current kernel code were upgraded to the C99 standard, with no more ugly tricks (like using a pointer as a scalar to be able to optimize). So let us work to make all the current code compile and work with the latest GCC-IA16, so that we can later drop the obsolete and unmaintained BCC.
Jody could even create a new integration branch dedicated to that, in order to preserve the current master branch and wait for that milestone before switching definitively to the new build chain.
I agree with keeping the current build toolchain as the safest option, as a first step towards a complete migration. However, not everybody should need to build the latest compilers. For example, I use the stock bcc package from Debian and it works fine. I have the Mentor ia16-elf-gcc compiler, and it offers more than this transition step needs. We can attract more users and developers if the required tools stay easy to get. Compiling the compiler can get complicated: it needs specific development libraries and tools within a certain range of versions, and even then you may have to implement workarounds. It is also a bad idea to use features that require a particular compiler. One bad day the developer slips on a banana peel, the required compiler branch gets abandoned, and nobody is left to fix the pending bugs.
So I think it is better to describe in Documentation how to get the required tools, from prebuilt packages up to your scripts for those willing to spend the time and effort.
There are a lot of pending issues to be solved before abandoning BCC.
We need extensive testing of kernels compiled with ia16-gcc, with different configuration options and under different load levels. How will we be able to say this step is complete if we don't have a test suite?
The conversion utility needs more work. The data and bss sizes differ greatly between the kernel compiled with BCC and the one compiled with ia16-gcc.
There is a big mess with the header files that needs to be cleaned up. The earlier developers dumped the BCC compiler include files into the include/linuxmt and include/arch directories, then contaminated some of those compiler include files with definitions that belong somewhere else. For example, stdarg.h, stddef.h, stdlib.h, limits.h and types.h should be found by the compiler in their standard locations (/usr/lib/bcc/include in the BCC case). So we have to eliminate some include files and move some of their contents to other include files.
Carefully review the code to identify volatile variables.
GCC is now integrated into the ELKS mainstream, and the guidelines provided here are implemented in the configuration and build scripts. I think we can close this issue and now focus on the first migration step discussed above.
@lithoxs: earlier this year, you announced that you had started compiling the ELKS kernel with the latest GCC-IA16 (http://www.spinics.net/lists/linux-8086/msg00817.html).
As I can see from your flow of PRs, you still maintain the GCC-IA16 build, and after tons of (more or less serious) posts discussing that topic, it looks like this compiler is today the best alternative to BCC.
Not only because of the work you already performed, but also because this GCC fork is still alive & able to be mainstreamed: https://github.com/tkchia/gcc-ia16
Even Alan is keeping an eye on this option: https://github.com/crtc-demos/gcc-ia16/issues/4
It would be nice if you could provide some guidelines for using that compiler in the /Documentation folder, so that more of us could use it to improve the ELKS code.