Open Konstantin8105 opened 7 years ago
I have thought about this issue. There are not only many built-in types but also many functions that ship with the standard headers, and together they add a lot of bloat to the output.
I would prefer to build this into the tool (rather than using a third-party command) because it's very possible that the logic and exclusions will become more complex over time.
I'm happy to strip all of this out by default and use a CLI argument like -keep-unused for users who do want to retain the full output.
Example of a terminal command for keeping unused types, etc.: c2go transpile -keep-unused hello.c
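A minimal sketch of how such a flag could be parsed, using only Go's standard flag package. The function name parseTranspileArgs is hypothetical, and this is not necessarily how c2go wires up its own flags:

```go
package main

import (
	"flag"
	"fmt"
)

// parseTranspileArgs parses the arguments that follow "c2go transpile".
// The function name is hypothetical; -keep-unused is the flag proposed above.
func parseTranspileArgs(args []string) (keepUnused bool, inputs []string, err error) {
	fs := flag.NewFlagSet("transpile", flag.ContinueOnError)
	fs.BoolVar(&keepUnused, "keep-unused", false,
		"keep unused types, variables and functions in the output")
	if err = fs.Parse(args); err != nil {
		return false, nil, err
	}
	// Everything left after the flags is treated as an input C file.
	return keepUnused, fs.Args(), nil
}

func main() {
	keep, inputs, err := parseTranspileArgs([]string{"-keep-unused", "hello.c"})
	fmt.Println(keep, inputs, err)
}
```

With this shape, stripping stays the default and the full output is opt-in.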
Example: prime.c after transpiling and removing unused variables, etc. looks different from the version in README.md:
package main

import "unsafe"
import "github.com/elliotchance/c2go/noarch"

var stdin *noarch.File
var stdout *noarch.File
var stderr *noarch.File

func main() {
	__init()
	var n int
	var c int
	noarch.Printf([]byte("Enter a number\n\x00"))
	noarch.Scanf([]byte("%d\x00"), (*[1]int)(unsafe.Pointer(&n))[:])
	noarch.Printf([]byte("The number is: %d\n\x00"), n)
	if n == 2 {
		noarch.Printf([]byte("Prime number.\n\x00"))
	} else {
		for c = 2; c <= n-1; func() int {
			c += 1
			return c
		}() {
			if n%c == 0 {
				break
			}
		}
		if c != n {
			noarch.Printf([]byte("Not prime.\n\x00"))
		} else {
			noarch.Printf([]byte("Prime number.\n\x00"))
		}
	}
	return
}

func __init() {
	stdin = noarch.Stdin
	stdout = noarch.Stdout
	stderr = noarch.Stderr
}
May I change the README.md?
Yes, please update the README.
We cannot use the unused tool, because for this Go code:
package main

import "fmt"

type number int

const (
	zero number = 0
	one         = 1
	two         = 2
	three       = 3
)

func main() {
	for i := int(zero); i < int(three); i++ {
		fmt.Printf("%d.\t%#v\n", i, number(i))
	}
}
The tool reports:
○ → ../unused main.go
main.go:9:2: const one is unused (U1000)
main.go:10:2: const two is unused (U1000)
But this is the wrong result.
The main point of this issue is clean resulting Go code. I now understand that the unused tool is the wrong way, so we can choose another way by following these steps:
Experiment:
We have some simple C code:
File file.c
#include <stdio.h>
int main() {
    int a = 42;
    printf("We have number : %d", 42);
    return 0;
}
Let's change it a little bit:
File file2.c
#include <stdio_fake.h> // We changed the name of the system header
int main() {
    int a = 42;
    printf("We have number : %d", 42);
    return 0;
}
Create a file stdio_fake.h:
void printf(const char * format, ...){
}
Run clang like this: clang -E file2.c -I"./"
and we get a clean result:
# 1 "file2.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 317 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "file2.c" 2
# 1 "./stdio_fake.h" 1
void printf(const char * format, ...){
}
# 2 "file2.c" 2
int main(){
int a = 42;
printf("We have number : %d", 42);
return 0;
}
One advantage of this solution is that the system header files become platform independent.
Another solution:
In the file main.go we have the following line:
cmd := exec.Command("clang", "-E", args.inputFile)
We also know one simple thing about the result:
# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4 <------ IMPORTANT
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
# 36 "/usr/include/stdio.h" 2 3 4 <------ IMPORTANT
struct _IO_FILE;
We know which file gives us each function, type, struct, etc. According to the last example, they are types.h and stdio.h.
So we can ignore those entities while transpiling, and at the end we will have clean Go code.
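As a rough sketch of reading those marker lines, assuming the usual preprocessor format `# <line> "<file>" <flags>`, where flag 3 marks text from a system header. The names lineMarker, parseLineMarker and userLines are hypothetical and are not taken from the c2go source:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lineMarker is one `# <line> "<file>" <flags...>` line from `clang -E`.
type lineMarker struct {
	Line   int
	File   string
	System bool // flag 3: the following text comes from a system header
}

// parseLineMarker returns ok=false for lines that are not line markers.
func parseLineMarker(s string) (m lineMarker, ok bool) {
	if !strings.HasPrefix(s, "# ") {
		return m, false
	}
	first := strings.Index(s, "\"")
	last := strings.LastIndex(s, "\"")
	if first < 0 || last <= first {
		return m, false
	}
	n, err := strconv.Atoi(strings.TrimSpace(s[2:first]))
	if err != nil {
		return m, false
	}
	m.Line = n
	m.File = s[first+1 : last]
	for _, f := range strings.Fields(s[last+1:]) {
		if f == "3" {
			m.System = true
		}
	}
	return m, true
}

// userLines keeps only the non-empty lines that follow a marker naming userFile.
func userLines(preprocessed, userFile string) []string {
	var out []string
	keep := false
	for _, line := range strings.Split(preprocessed, "\n") {
		if m, ok := parseLineMarker(line); ok {
			keep = m.File == userFile
			continue
		}
		if keep && strings.TrimSpace(line) != "" {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	pp := "# 36 \"/usr/include/stdio.h\" 2 3 4\nstruct _IO_FILE;\n" +
		"# 2 \"f.c\" 2\nint main(){\nreturn 0;\n}\n"
	for _, l := range userLines(pp, "f.c") {
		fmt.Println(l)
	}
}
```

This only shows how the origin of each chunk can be read from the markers. Whether the skipped chunks are really safe to drop is a separate question.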
If we use either of these solutions, or something preliminary like them, then maybe we can also solve issue #237.
Or we can take some time to think about another solution.
I will prepare the prototype.
Example of prime.c after changing the preprocessor, without any hand editing:
package main

import "os"
import "io/ioutil"
import "testing"
import "unsafe"
import "github.com/elliotchance/c2go/noarch"

type __int128_t int64
type __uint128_t uint64
type __builtin_ms_va_list []byte

func main() {
	__init()
	var n int
	var c int
	noarch.Printf([]byte("Enter a number\n\x00"))
	noarch.Scanf([]byte("%d\x00"), (*[1]int)(unsafe.Pointer(&n))[:])
	noarch.Printf([]byte("The number is: %d\n\x00"), n)
	if n == 2 {
		noarch.Printf([]byte("Prime number.\n\x00"))
	} else {
		for c = 2; c <= n-1; func() int {
			c += 1
			return c
		}() {
			if n%c == 0 {
				break
			}
		}
		if c != n {
			noarch.Printf([]byte("Not prime.\n\x00"))
		} else {
			noarch.Printf([]byte("Prime number.\n\x00"))
		}
	}
	return
}

func TestApp(t *testing.T) {
	os.Chdir("../../..")
	ioutil.WriteFile("build/stdin", []byte{'7'}, 0777)
	stdin, _ := os.Open("build/stdin")
	noarch.Stdin = noarch.NewFile(stdin)
	main()
}

func __init() {
}
Algorithm of the preprocessor:
1) Take the file pp.c.
2) Separate the pp.c file into parts. Example of one part:
# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4 <------ HEAD
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
In HEAD we see 2 important elements:
pp.c
with new position in source for user source
7) Transpilation follows one simple rule: if an element of the clang AST (Decl, ...) has a source position less than UserPosition, then don't transpile that part.
Problem: the red code is not needed now - https://codecov.io/gh/elliotchance/c2go/pull/273/src/transpiler/enum.go
Maybe another solution:
For example, the user runs c2go transpile file.c, but at the end of its work c2go creates 2 files:
file.go - transpiled user code
system.go - transpiled system C header code
This needs approval before I act.
@elliotchance I need your approval or a comment about the last message.
You will not be able to split up user and system code. In simple examples this makes sense but in more complicated examples the same header file can be different when used in different ways.
The safest course of action is to deal with the duplicate logic after the transpile is complete. For example, let's say you run:
c2go transpile foo.c bar.c
This will produce foo.go and bar.go. If they both included the same header files (which they probably did) you will see a lot of duplicate code between the files. At this stage you need to identify the functions and types that appear in more than one output file and extract them to a common file.
This solution would take in one or more Go files and produce new files, with an extra common file:
some_command foo.go bar.go
Produces a common.go and new foo.go and bar.go that do not include the elements in common.go.
It's not going to be possible to handle this duplicate code in the preprocessing stage because there are many decisions made during and after the transpiling that affect how the code is generated. It also won't be possible to split files by their include path/name. You should only rely on the input files to produce output Go files with the same name.
Fundamentally this is not a difficult task (to extract the duplicates to a common file). There are already tools to parse and traverse the Go code easily (you only need to pay attention to the global types and names of the top level functions) and reliably extract parts of the AST to be written to another file.
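A minimal sketch of the first half of that task, using the standard go/parser and go/ast packages to find top-level functions and types declared in more than one file. The helper names topLevelNames and commonNames are hypothetical, and the comparison is by name only, which assumes that same-named declarations are identical:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// topLevelNames returns the names of top-level functions and types in src.
func topLevelNames(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil { // skip methods
				names = append(names, d.Name.Name)
			}
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue
			}
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok {
					names = append(names, ts.Name.Name)
				}
			}
		}
	}
	return names, nil
}

// commonNames returns, sorted, the names declared in more than one file.
func commonNames(files map[string]string) ([]string, error) {
	count := map[string]int{}
	for name, src := range files {
		names, err := topLevelNames(name, src)
		if err != nil {
			return nil, err
		}
		for _, n := range names {
			count[n]++
		}
	}
	var common []string
	for n, c := range count {
		if c > 1 {
			common = append(common, n)
		}
	}
	sort.Strings(common)
	return common, nil
}

func main() {
	files := map[string]string{
		"foo.go": "package main\ntype size_t uint32\nfunc helper() int { return 1 }\nfunc main() {}\n",
		"bar.go": "package main\ntype size_t uint32\nfunc helper() int { return 1 }\nfunc bar() {}\n",
	}
	common, err := commonNames(files)
	fmt.Println(common, err)
}
```

A real tool would also need to compare the declaration bodies before moving anything into common.go.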
I am trying to think of a scenario where the macros in separate files will resolve to different Go code. I can't think of any immediate examples but I have a feeling there will be some, and these will be tricky to deal with. That is a challenge for another day.
As of v0.17.0 (thanks to your awesome code) we can support multiple input files that get preprocessed and transpiled into a single output file. This is a great first step. This solution would work the same way except each input file would generate its own output file (much like input C files for clang produce a one-for-one .o file). Then we add on this stage and we have a much more robust way of dealing with multiple files.
One more specific feature of the preprocessor design: it can easily resolve duplicates of system include files, for example:
https://github.com/Konstantin8105/c2go/blob/c121213007e93e8baa745e3903ee2b9ab1f207b2/main_test.go#L342-L349
Here we see a duplicate of the ./tests/multi/case1/four.c file. During one of the review steps, we removed that test to minimize testing.
As I remember, we can now transpile C code like this without any duplicates in the Go code:
#include <stdio.h>
#include <stdio.h> // <--- Duplicate
#include <stdio.h> // <--- Duplicate
int main() {
    printf("All is OK");
    return 0;
}
One more point: after the command clang -E we will have one clang preprocessor output file, and inside it we can see tags (https://github.com/elliotchance/c2go/blob/master/preprocessor/parse_include_preprocessor_line_test.go#L21):
...
# 26 "/usr/include/x86_64-linux-gnu/bits/sys_errlist.h" 3 4
...
# 2 "f.c" 2
...
According to that, we can easily separate them. In this case f.c is the user file, so it is transpiled to f.go, and /usr/include/x86_64-linux-gnu/bits/sys_errlist.h goes to common.go.
This would only work in cases where the headers are guaranteed to be exactly the same, which you can't guarantee or check for. System header files and regular header files need to be treated the same way, there is nothing special about a header file other than its name is common across some platforms.
Here is a concrete example of why this will not work:
errors.h:
void ERROR_FUNC() {
    printf("ERROR!");
}
main.c:
#define ERROR_FUNC error
#include "errors.h"
#undef ERROR_FUNC
#define ERROR_FUNC error2
#include "errors.h"
// We now have two different functions from a header file that is "dynamic".
This may seem like a silly example but it shows how the same header can be included to resolve to different code. You cannot deal with the duplicates at the preprocess stage, it's impossible. No compilers work like this for these reasons.
You must transpile each input C file independently, then deal with the duplicates as a Go AST problem, not as a C/preprocessor problem.
Now, a new idea: cleaning in a postprocessor step.
0) At the end of transpiling we have Go code.
1) Find all function names in the Go code and save them in a list. For example: "freeMatrix(), freeVector() ..."
2) Print the Go code without comments to a temp file.
3) If a name from the function list is found more than once, the function is used and is removed from the list.
4) Remove the unused functions from the Go code.
5) Save the Go code without the unused functions.
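The steps above can be sketched with go/parser, go/ast and go/format instead of a temp file and text search. This is an illustration under assumptions, not the c2go implementation: removeUnusedFuncs is a hypothetical name, matching is by identifier name only, and main and init are always kept:

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"sort"
)

// removeUnusedFuncs drops top-level functions whose name occurs only once
// in the file (the declaration itself). It returns the new source and the
// sorted names that were removed.
func removeUnusedFuncs(src string) (string, []string, error) {
	fset := token.NewFileSet()
	// Mode 0 skips comments, so names mentioned only in comments do not count.
	f, err := parser.ParseFile(fset, "out.go", src, 0)
	if err != nil {
		return "", nil, err
	}

	// Step 1: collect the names of all top-level functions.
	count := map[string]int{}
	for _, decl := range f.Decls {
		if fd, ok := decl.(*ast.FuncDecl); ok && fd.Recv == nil {
			count[fd.Name.Name] = 0
		}
	}

	// Step 3: count every identifier that matches one of those names.
	ast.Inspect(f, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok {
			if _, declared := count[id.Name]; declared {
				count[id.Name]++
			}
		}
		return true
	})

	// Step 4: keep only the declarations that are referenced elsewhere.
	var removed []string
	kept := f.Decls[:0]
	for _, decl := range f.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if ok && fd.Recv == nil && fd.Name.Name != "main" &&
			fd.Name.Name != "init" && count[fd.Name.Name] == 1 {
			removed = append(removed, fd.Name.Name)
			continue
		}
		kept = append(kept, decl)
	}
	f.Decls = kept
	sort.Strings(removed)

	// Step 5: print the remaining code.
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, f); err != nil {
		return "", nil, err
	}
	return buf.String(), removed, nil
}

func main() {
	src := "package main\n\nfunc freeMatrix() {}\n\nfunc freeVector() {}\n\nfunc main() { freeMatrix() }\n"
	out, removed, err := removeUnusedFuncs(src)
	fmt.Println(removed, err)
	fmt.Print(out)
}
```

Removing one function can leave another one unused, so the pass may need to be repeated until nothing more is removed. A function that only calls itself is still counted as used by this simple rule.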
@elliotchance , Please comment.
@Konstantin8105 yes that sounds good.
Now we can identify the location of each struct, variable, etc. in the C source.
So we can create an ignore list of C headers, such as time.h, and if some struct comes from one of those headers, we ignore it.
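A small sketch of such an ignore list, assuming the transpiler already knows the source file of each declaration. The names ignoredHeaders and shouldIgnore are hypothetical, and only time.h comes from the comment above:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ignoredHeaders is a hypothetical ignore list keyed by header file name.
var ignoredHeaders = map[string]bool{
	"time.h": true,
}

// shouldIgnore reports whether a declaration located in declFile should be
// skipped. It matches on the base name only, so any file named time.h in
// any directory is ignored.
func shouldIgnore(declFile string) bool {
	return ignoredHeaders[filepath.Base(declFile)]
}

func main() {
	fmt.Println(shouldIgnore("/usr/include/time.h"))
	fmt.Println(shouldIgnore("prime.c"))
}
```

Matching on the full path instead would be stricter but would make the list platform specific.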
Problem
After transpiling that code:
We see many unused types, etc. For example:
Solution
https://github.com/alecthomas/gometalinter
./unused ./demo/hello.go
I tried doing it by hand and the result looks like this: