elliotchance / c2go

⚖️ A tool for transpiling C to Go.
MIT License

Unused types, constants, vars #232

Open Konstantin8105 opened 7 years ago

Konstantin8105 commented 7 years ago

Problem

After transpiling this code:

#include <stdio.h>
int main(void)
{
    printf("Hello World!\n");
    return 0;
}

we see many unused types, for example:

type __int32_t int
type __uint32_t uint32
type __int64_t int32
type __uint64_t uint32
type __quad_t int32
type __u_quad_t uint32
type __dev_t uint32
type __uid_t uint32
type __gid_t uint32
type __ino_t uint32
type __ino64_t uint32
type __mode_t uint32
...

Solution

  1. Install https://github.com/alecthomas/gometalinter
  2. Run the unused tool: ./unused ./demo/hello.go
  3. Read the output:
    demo/hello.go:14:6: type __int128_t is unused (U1000)
    demo/hello.go:15:6: type __uint128_t is unused (U1000)
    demo/hello.go:16:6: type __builtin_ms_va_list is unused (U1000)
    demo/hello.go:17:6: type size_t is unused (U1000)
    demo/hello.go:18:6: type __u_char is unused (U1000)
    demo/hello.go:19:6: type __u_short is unused (U1000)
    demo/hello.go:20:6: type __u_int is unused (U1000)
    demo/hello.go:21:6: type __u_long is unused (U1000)
    demo/hello.go:22:6: type __int8_t is unused (U1000)
    ...
  4. Parse the output and remove the unused elements.
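
Step 4 could be sketched roughly as below. This is only an illustration, assuming the report lines always have the shape shown in step 3; the names finding, reportLine and parseUnused are hypothetical and are not part of c2go or of the unused tool.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// finding describes one "is unused" report line (hypothetical type).
type finding struct {
	File string
	Line int
	Kind string // "type", "const", "var", ...
	Name string
}

// reportLine matches lines such as:
//   demo/hello.go:14:6: type __int128_t is unused (U1000)
var reportLine = regexp.MustCompile(`^(.+):(\d+):\d+: (\w+) (\S+) is unused \(U1000\)$`)

// parseUnused extracts findings from the tool output and skips
// any line that does not match the expected shape.
func parseUnused(output string) []finding {
	var result []finding
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		m := reportLine.FindStringSubmatch(strings.TrimSpace(scanner.Text()))
		if m == nil {
			continue
		}
		line, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		result = append(result, finding{File: m[1], Line: line, Kind: m[3], Name: m[4]})
	}
	return result
}

func main() {
	out := "demo/hello.go:14:6: type __int128_t is unused (U1000)\n" +
		"demo/hello.go:17:6: type size_t is unused (U1000)\n"
	for _, f := range parseUnused(out) {
		fmt.Printf("%s:%d %s %s\n", f.File, f.Line, f.Kind, f.Name)
	}
}
```

The resulting list of file, line and name could then drive the removal of the reported declarations.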

I tried doing this by hand, and the result looks like this:

// Warning (TypedefDecl): %!s(int=333): function pointers are not supported
// Warning (TypedefDecl): %!s(int=341): function pointers are not supported
// Warning (TypedefDecl): %!s(int=350): function pointers are not supported
// Warning (TypedefDecl): %!s(int=353): function pointers are not supported
// Warning (VarDecl): %!s(int=27): probably an incorrect type translation 2

package main

import "github.com/elliotchance/c2go/noarch"

var stdin *noarch.File
var stdout *noarch.File
var stderr *noarch.File

func main() {
    __init()
    noarch.Printf([]byte("Hello World!\n\x00"))
    return
}
func __init() {
    stdin = noarch.Stdin
    stdout = noarch.Stdout
    stderr = noarch.Stderr
}

elliotchance commented 7 years ago

I have thought about this issue. There are not only many built-in types but also many functions that ship with the standard headers, and they add a lot of bloat to the output.

I would prefer to build this into the tool (rather than using a third-party command) because it's very possible that the logic and exclusions will become more complex over time.

I'm happy to strip all of this out by default and use a CLI argument like -keep-unused for users who want to retain the full output.

Konstantin8105 commented 7 years ago

Example of a terminal command that keeps unused types and so on: c2go transpile -keep-unused hello.c

Konstantin8105 commented 7 years ago

After transpiling prime.c and removing the unused variables and so on, the result looks different from the one in README.md:

package main

import "unsafe"

import "github.com/elliotchance/c2go/noarch"

var stdin *noarch.File
var stdout *noarch.File
var stderr *noarch.File

func main() {
    __init()
    var n int
    var c int
    noarch.Printf([]byte("Enter a number\n\x00"))
    noarch.Scanf([]byte("%d\x00"), (*[1]int)(unsafe.Pointer(&n))[:])
    noarch.Printf([]byte("The number is: %d\n\x00"), n)
    if n == 2 {
        noarch.Printf([]byte("Prime number.\n\x00"))
    } else {
        for c = 2; c <= n-1; func() int {
            c += 1
            return c
        }() {
            if n%c == 0 {
                break
            }
        }
        if c != n {
            noarch.Printf([]byte("Not prime.\n\x00"))
        } else {
            noarch.Printf([]byte("Prime number.\n\x00"))
        }
    }
    return
}

func __init() {
    stdin = noarch.Stdin
    stdout = noarch.Stdout
    stderr = noarch.Stderr
}

May I change the README.md?

elliotchance commented 7 years ago

Yes, please update the README.

Konstantin8105 commented 6 years ago

We cannot use the unused tool, because for this Go code:

package main

import "fmt"

type number int

const (
    zero  number = 0
    one          = 1
    two          = 2
    three        = 3
)

func main() {
    for i := int(zero); i < int(three); i++ {
        fmt.Printf("%d.\t%#v\n", i, number(i))
    }
}

the tool shows:

○ → ../unused main.go 
main.go:9:2: const one is unused (U1000)
main.go:10:2: const two is unused (U1000)

But this result is wrong.

Konstantin8105 commented 6 years ago

The main point of this issue is to get clean Go code as the result. I now understand that the unused tool is the wrong way to do it, so we can choose another way, following the steps below.

Experiment: we have some simple C code in the file file.c:

#include<stdio.h>
int main(){
        int a = 42;
        printf("We have number : %d", 42);
        return 0;
}

Let's change it a little and save it as file2.c:

#include<stdio_fake.h> // We changed the name of the system header
int main(){
        int a = 42;
        printf("We have number : %d", 42);
        return 0;
}

Create a file stdio_fake.h:

void printf(const char * format, ...){
}

Run clang like this: clang -E file2.c -I"./" and we get a clean result:

# 1 "file2.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 317 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "file2.c" 2
# 1 "./stdio_fake.h" 1

void printf(const char * format, ...){
}
# 2 "file2.c" 2

int main(){
 int a = 42;
 printf("We have number : %d", 42);
 return 0;
}

One advantage of this solution is that the replacement system header files are platform independent.

Another solution: in the file main.go we have the following line:

        cmd := exec.Command("clang", "-E", args.inputFile)

and we also know one simple thing about its output:

# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4    <------ IMPORTANT
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
# 36 "/usr/include/stdio.h" 2 3 4                             <------ IMPORTANT
struct _IO_FILE;

We know which file gives us each function, type, struct, and so on. In the last example these are types.h and stdio.h. So we can ignore those entities while transpiling, and at the end we will have clean Go code.

If we use either of these solutions, or something preliminary like them, then maybe we can also solve issue #237.

Or we can take some time to think about another solution.
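
A minimal sketch of this idea, assuming the clang -E output uses linemarkers of the form shown above. The names linemarker and userLines, and the set of user file names passed in, are hypothetical; this is not how c2go's preprocessor package is implemented. The sketch also does not handle a header that expands to different code on each include.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// linemarker matches preprocessor lines such as:
//   # 36 "/usr/include/stdio.h" 2 3 4
var linemarker = regexp.MustCompile(`^# \d+ "([^"]+)"`)

// userLines keeps only the non-empty lines of preprocessed C that
// follow a linemarker naming one of the user's own files.
func userLines(preprocessed string, userFiles map[string]bool) []string {
	var kept []string
	current := "" // file named by the most recent linemarker
	scanner := bufio.NewScanner(strings.NewReader(preprocessed))
	for scanner.Scan() {
		line := scanner.Text()
		if m := linemarker.FindStringSubmatch(line); m != nil {
			current = m[1]
			continue
		}
		if userFiles[current] && strings.TrimSpace(line) != "" {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	pp := `# 1 "f.c"
# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4
typedef unsigned char __u_char;
# 2 "f.c" 2
int main(){ return 0; }
`
	for _, l := range userLines(pp, map[string]bool{"f.c": true}) {
		fmt.Println(l)
	}
}
```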

Konstantin8105 commented 6 years ago

I will prepare the prototype.

Konstantin8105 commented 6 years ago

Example of prime.c after changing the preprocessor, without any hand editing:

package main

import "os"
import "io/ioutil"
import "testing"
import "unsafe"
import "github.com/elliotchance/c2go/noarch"

type __int128_t int64
type __uint128_t uint64
type __builtin_ms_va_list []byte

func main() {
    __init()
    var n int
    var c int
    noarch.Printf([]byte("Enter a number\n\x00"))
    noarch.Scanf([]byte("%d\x00"), (*[1]int)(unsafe.Pointer(&n))[:])
    noarch.Printf([]byte("The number is: %d\n\x00"), n)
    if n == 2 {
        noarch.Printf([]byte("Prime number.\n\x00"))
    } else {
        for c = 2; c <= n-1; func() int {
            c += 1
            return c
        }() {
            if n%c == 0 {
                break
            }
        }
        if c != n {
            noarch.Printf([]byte("Not prime.\n\x00"))
        } else {
            noarch.Printf([]byte("Prime number.\n\x00"))
        }
    }
    return
}
func TestApp(t *testing.T) {
    os.Chdir("../../..")
    ioutil.WriteFile("build/stdin", []byte{'7'}, 0777)
    stdin, _ := os.Open("build/stdin")
    noarch.Stdin = noarch.NewFile(stdin)
    main()
}
func __init() {
}

Konstantin8105 commented 6 years ago

Algorithm of the preprocessor:

1) Take the file pp.c.
2) Split the pp.c file into parts. Example of one part:

# 28 "/usr/include/x86_64-linux-gnu/bits/types.h" 2 3 4    <------ HEAD
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;

In HEAD we see 2 important elements:

Konstantin8105 commented 6 years ago

Problem

Konstantin8105 commented 6 years ago

Maybe another solution: for example, the user runs c2go transpile file.c, but at the end of its work c2go creates 2 files: file.go (the transpiled user code) and system.go (the transpiled system C header code).

I need approval before acting on this.

Konstantin8105 commented 6 years ago

@elliotchance I need your approval or a comment on my last message.

elliotchance commented 6 years ago

You will not be able to split up user and system code. In simple examples this makes sense but in more complicated examples the same header file can be different when used in different ways.

The safest course of action is to deal with the duplicate logic after the transpile is complete. For example, let's say you run:

c2go transpile foo.c bar.c

This will produce foo.go and bar.go. If they both included the same header files (which they probably did) you will see a lot of duplicate code between the files. At this stage you need to identify the functions and types that appear in more than one output file and extract them to a common file.

This solution would take in one or more Go files and produce new files, with an extra common file:

some_command foo.go bar.go

Produces a common.go and new foo.go and bar.go that do not include the elements in common.go.

It's not going to be possible to handle this duplicate code in the preprocessing stage because there are many decisions made during and after the transpiling that affect how the code is generated. It also won't be possible to split files by their include path/name. You should only rely on the input files to produce output Go files with the same name.

Fundamentally this is not a difficult task (to extract the duplicates to a common file). There are already tools to parse and traverse the Go code easily (you only need to pay attention to the global types and names of the top level functions) and reliably extract parts of the AST to be written to another file.
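
As a rough sketch of the first half of that task, the standard go/parser and go/ast packages can list the top-level functions and types that occur in more than one file. The helper names topLevelNames and sharedNames are hypothetical, and the comparison is by name only; a real tool would presumably also check that the declarations are identical before moving them to a common file.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// topLevelNames returns the names of top-level functions and types in src.
func topLevelNames(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil { // skip methods
				names = append(names, d.Name.Name)
			}
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok {
					names = append(names, ts.Name.Name)
				}
			}
		}
	}
	return names, nil
}

// sharedNames lists, in sorted order, the names declared in more than
// one of the given sources (file name -> Go source text).
func sharedNames(sources map[string]string) ([]string, error) {
	count := map[string]int{}
	for filename, src := range sources {
		names, err := topLevelNames(filename, src)
		if err != nil {
			return nil, err
		}
		for _, n := range names {
			count[n]++
		}
	}
	var shared []string
	for n, c := range count {
		if c > 1 {
			shared = append(shared, n)
		}
	}
	sort.Strings(shared)
	return shared, nil
}

func main() {
	shared, err := sharedNames(map[string]string{
		"foo.go": "package main\ntype size_t uint32\nfunc foo() {}\n",
		"bar.go": "package main\ntype size_t uint32\nfunc bar() {}\n",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(shared)
}
```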

I am trying to think of a scenario where the macros in separate files will resolve to different Go code. I can't think of any immediate examples but I have a feeling there will be some, and these will be tricky to deal with. That is a challenge for another day.

As of v0.17.0 (thanks to your awesome code) we can support multiple input files that get preprocessed and transpiled into a single output file. This is a great first step. This solution would work the same way except each input file would generate its own output file (much like input C files for clang produce a one-for-one .o file). Then we add on this stage and we have a much more robust way of dealing with multiple files.

Konstantin8105 commented 6 years ago

One more property of the preprocessor design: it can easily resolve duplicates of system include files. For example, in https://github.com/Konstantin8105/c2go/blob/c121213007e93e8baa745e3903ee2b9ab1f207b2/main_test.go#L342-L349 we see a duplicate of the file ./tests/multi/case1/four.c. At one of the review steps we removed that test to minimize testing. As far as I remember, we can now transpile C code like this without any duplicates in the Go code:

#include<stdio.h>
#include<stdio.h> // <--- Duplicate
#include<stdio.h> // <--- Duplicate
int main(){
    printf("All is OK");
    return 0;
}

One more point: after running clang -E we have a single preprocessed C file, and inside it we can see tags (https://github.com/elliotchance/c2go/blob/master/preprocessor/parse_include_preprocessor_line_test.go#L21):

...
# 26 "/usr/include/x86_64-linux-gnu/bits/sys_errlist.h" 3 4
...
# 2 "f.c" 2
...

Based on these tags we can easily separate the code. In this case f.c is the user file, so it is transpiled to f.go, and /usr/include/x86_64-linux-gnu/bits/sys_errlist.h goes to common.go.

elliotchance commented 6 years ago

This would only work in cases where the headers are guaranteed to be exactly the same, which you can't guarantee or check for. System header files and regular header files need to be treated the same way, there is nothing special about a header file other than its name is common across some platforms.

Here is a concrete example of why this will not work:

errors.h:

void ERROR_FUNC() {
    printf("ERROR!");
}

main.c:

#define ERROR_FUNC error
#include "errors.h"

#undef ERROR_FUNC
#define ERROR_FUNC error2
#include "errors.h"

// We now have two different functions from a header file that is "dynamic".

This may seem like a silly example but it shows how the same header can be included to resolve to different code. You cannot deal with the duplicates at the preprocess stage, it's impossible. No compilers work like this for these reasons.

You must transpile each input C file independently, then deal with the duplicates as a Go AST problem, not as a C/preprocessor problem.

Konstantin8105 commented 6 years ago

120

Konstantin8105 commented 6 years ago

Now, a new idea: clean up in a postprocessing step.

0) At the end of transpiling we have Go code.
1) Find all function names in the Go code and save them in a list. For example: "freeMatrix(), freeVactor() ..."
2) Print the Go code without comments to a temp file.
3) If a name from the function list is found more than once, the function is used, so remove it from the list.
4) Remove the functions still in the list (the unused ones) from the Go code.
5) Save the Go code without the unused functions.

@elliotchance, please comment.

elliotchance commented 6 years ago

@Konstantin8105 yes that sounds good.

Konstantin8105 commented 6 years ago

Now we can identify the location of a struct, variable, and so on in the C source. So we can create an ignore list of C headers, such as time.h, and if a struct comes from one of those headers, we ignore it.
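
A minimal sketch of such an ignore list, assuming the transpiler can supply the C source path for each declaration. The names ignoredHeaders and fromIgnoredHeader are hypothetical, and matching on the base file name alone is a simplification (any header named time.h in any directory would match).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ignoredHeaders is a hypothetical ignore list of C header names.
var ignoredHeaders = map[string]bool{
	"time.h": true,
}

// fromIgnoredHeader reports whether a declaration located at the given
// C source path comes from a header on the ignore list.
func fromIgnoredHeader(path string) bool {
	return ignoredHeaders[filepath.Base(path)]
}

func main() {
	fmt.Println(fromIgnoredHeader("/usr/include/time.h"))
	fmt.Println(fromIgnoredHeader("f.c"))
}
```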