golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/cgo: make identical C types identical Go types across packages #13467

Open bcmills opened 8 years ago

bcmills commented 8 years ago

https://golang.org/cmd/cgo/ says:

Cgo translates C types into equivalent unexported Go types. Because the translations are unexported, a Go package should not expose C types in its exported API: a C type used in one Go package is different from the same C type used in another.

While that's a convenient workaround for allowing access to struct fields and other names that would otherwise be inaccessible as Go identifiers, it greatly complicates the process of writing Go APIs for export to C callers. The Go code to produce and/or manipulate C values must be essentially confined to a single package.

It would be nice to remove that restriction: instead of treating C types as unexported local types, we should treat them as exported types in the "C" package (and similarly export the lower-case names that would otherwise be unexported).

mdempsky commented 8 years ago

Somewhat related: does C require types to be defined the same way across translation units? Reading through C99, it seems to only require that objects and functions with external linkage need to have the same type across translation units (6.2.7), but I can't find anything per se that disallows for example "typedef int foo;" in one translation unit and "typedef unsigned foo;" in another (assuming they don't in turn lead to incompatible object/function declarations).

(Not to say that cgo needs to support that.)

ianlancetaylor commented 8 years ago

C and C++ permit the same name to designate different types in different compilation units.

joegrasse commented 8 years ago

I definitely agree that this restriction should be lifted. It makes it very hard to break up your code to promote maintainability. It would make more sense for a C.int, etc to be a C.int everywhere, just as an int is an int everywhere.

rsc commented 8 years ago

I don't believe we should lift this restriction. It is explicitly not a goal to make it possible to expose C types directly in Go package APIs. As Ian said, it's not even clear this is sound.

bcmills commented 8 years ago

It is explicitly not a goal to make it possible to expose C types directly in Go package APIs.

The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions and/or Go packages that export C APIs with richer structure than primitive pointers and integers. (For example: one might want to return a protocol buffer from a Go function to a C or C++ caller without making more than one copy of the marshaled data. That operation is complex enough that it needs a support library, and because it needs to manipulate Go types it must be written in Go.)

As Ian said, it's not even clear this is sound.

The request is to make "identical" C types identical Go types, not to make C types "with the same name" identical Go types. I believe it is sound provided that we enforce that the C types are actually identical.

rsc commented 8 years ago

OK, I'm happy to reopen this, but I have no idea how to do it. It seems fundamentally at odds with Go's package system. I'm happy to look at implementation proposals though.

14rcole commented 8 years ago

For the time being, is there a workaround for this? Maybe using an interface that can represent the same struct from different packages?

bcmills commented 8 years ago

You can work around it in one direction (going from the C types in another package to the C types in the current package) using reflect and unsafe.Pointer. The same technique may be possible in the other direction too.

If you want to add back some of the type-safety at run time, you can use the reflect package to iterate over the struct fields to verify that they're compatible.

joegrasse commented 7 years ago

Just pondering if aliases or whatever comes out of https://github.com/golang/go/issues/18130 would help with this problem.

bcmills commented 7 years ago

@joegrasse I don't think type aliases per se would help with the general problem: either way, you end up needing one "canonical definition" for each type, and if you've already got a canonical definition then you don't need to be able to refer to it by different names.

However, it might at least solve the subproblem of making the Go types for C typedefs have the same aliasing structure as the C types. (I'm honestly not sure whether that's currently the case: it hadn't even occurred to me to check.)

rsc commented 7 years ago

@joegrasse, probably not, but if we do #16623 (let compiler know more about cgo) then the compiler would be in a position to resolve this, if we wanted to.

(Fixed issue number, sorry.)

joegrasse commented 7 years ago

@rsc, do you mind double-checking the issue number? I think you might have mistyped it.

ianlancetaylor commented 7 years ago

@joegrasse Russ meant #16623.

joegrasse commented 7 years ago

Thanks

14rcole commented 7 years ago

@bcmills Forgive my backtracking, but I don't understand why aliasing wouldn't solve the problem. I thought that the problem with C structs in Go was that the compiler views a C struct within a package differently from the same C struct outside of the package. Therefore you don't have a "canonical definition" for that type. Wouldn't aliasing the C struct as a Go struct help with the problem?

bcmills commented 7 years ago

@14rcole Consider this program:

foo.h:

typedef struct {
  int i;
} Foo;

foo/foo.go:

package foo

// #include "foo.h"
import "C"

func Frozzle(x *C.Foo) {
  …
}

bar.h:

typedef struct {
  int i;
} Bar;

bar/bar.go:

package bar

// #include "bar.h"
import "C"

import "foo"

func Bozzle(y *C.Bar) { foo.Frozzle(y) }

This program should compile: C.Foo in package foo is the same C type (a typedef of a struct with the same definition) as C.Bar in package bar. However, that would require cgo to write the definitions of C.Foo and C.Bar such that they are aliases for the same underlying type. Since the type includes the i field, which is currently unexported, there is no package in which that type could be defined.

There are other possible ways to solve the problem (e.g. by rewriting field names so that they are always exported and combining all of the C declarations into one package), but they involve more than just a suitable application of aliases.

rsc commented 7 years ago

Also, an alias has to exist in one package P and point to another package Q. That implies P imports Q (to point at it). In the general version of the problem in this issue, both P and Q define some C type and don't know about each other at all. Then some other package M (for main) imports both and tries to mix one with the other. There's no way for aliases per se to solve this problem, because P and Q need to continue not knowing about each other, and M can't change the definitions in P and Q.

mdempsky commented 7 years ago

Following the latest proposed solution in #16623 (but not strictly dependent upon it), the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package. I believe we could also easily turn off symbol visibility rules for this package (e.g., so that lowercase struct fields are still accessible).

Then I think usual type identity rules would just work as desired, and usual type-reexporting information would help to catch ODR (one-definition rule) violations across C compilation units.

bcmills commented 7 years ago

@mdempsky

the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package

It can't be just one package, unfortunately. In a valid program, a type C.X can legitimately mean two different things in two different compilation units.

We could perhaps do some sort of name-mangling to disambiguate, though. For example, we could encode the complete C type definition in the mangled name and have the compiler treat all C-mangled names as being in the same package.

The remaining concern with that approach is what to do with reflect. (If we've mangled the names to avoid collisions, should reflect report the mangled names, the colliding names, or something else entirely?)
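To make the name-mangling idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the mangle function, the _Ctype_ prefix plus hash suffix format, and especially the use of whitespace-normalized source text as a proxy for "the complete C type definition" (a real implementation would need to work from the parsed type, not its spelling).

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// mangle returns a hypothetical Go identifier for the C type called name
// whose definition is def. Two types get the same identifier only if both
// the name and the (whitespace-normalized) definition text match.
func mangle(name, def string) string {
	// Collapse runs of whitespace so formatting alone does not change the result.
	canon := strings.Join(strings.Fields(def), " ")
	sum := sha256.Sum256([]byte(canon))
	return fmt.Sprintf("_Ctype_%s_%x", name, sum[:6])
}

func main() {
	a := mangle("X", "struct { int i; }")
	b := mangle("X", "struct {\n  int i;\n}")
	c := mangle("X", "struct { long i; }")
	fmt.Println(a == b, a == c)
}
```

With this scheme the first two definitions of X collapse to one identifier and the third gets its own, which is the disambiguation described above. It also shows why the reflect question matters: the identifier that makes identity work is not one anybody would want to read in a type name.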

minux commented 7 years ago

What if two packages both define their own version of incompatible type T in cgo preamble? I think any solutions to this issue must check to make sure the two cgo types are indeed compatible.

joegrasse commented 7 years ago

Here is a very contrived example of how I came across this issue. I believe it to be a simpler case than what @bcmills and @rsc have both discussed above (although I could be wrong).

Consider the package: cp/cp.go

package cp

import "C"

func CTest(name *C.char) string {
    return C.GoString(name)
}

and the program: ct/main.go

package main

import (
    "fmt"

    "cp"
)
import "C"

func main() {
    s := C.CString("Hello World") // This needs to be freed later
    fmt.Println(cp.CTest(s))
}

When you try and build ct/main.go, you get the following error.

# ct
./main.go:12: cannot use s (type *C.char) as type *cp.C.char in argument to cp.CTest

Before coming across this issue, I would have thought that this program should compile, because cp.CTest takes a *C.char and s is a *C.char. I am not creating any new types, just using the basic C char type. For some reason though, C.char in package cp becomes a cp.C.char.

rsc commented 7 years ago

After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.

AlexRouSg commented 7 years ago

After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.

What about for types in external C libs? For external libs we can be pretty sure C.foo will always be C.foo, maybe even prefix it MyLib so like C.MyLib.foo or C.MyLib_foo?

bcmills commented 7 years ago

@AlexRouSg

What about for types in external C libs?

According to the C standard, "[a]ll declarations of structure, union, or enumerated types that have the same scope and use the same tag declare the same type." That is a property of the type declarations themselves, not the libraries that implement or make use of those declarations.

The proposal here is that C types that are "the same type" according to the C standard should translate to Go types that are identical according to the Go spec.

AlexRouSg commented 7 years ago

@bcmills ohhhhh, was replying to rsc saying it might be limited to only primitive types.

asottile commented 7 years ago

I'm hitting a similar, and I believe related, problem.

I'm interfacing with two different headers providing the same functionality.

In one case the header looks like this (cpython)

/* overly simplified */
typedef long Py_ssize_t;

int PyTuple_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o);

The other provider (pypy) has a header that looks like this:

typedef long Py_ssize_t;

int PyTuple_SetItem(PyObject *p, long pos, PyObject *o);

When calling from go, I can't use either C.long(...) or C.Py_ssize_t(...) for the second argument and satisfy both implementations (despite Py_ssize_t and long being identical C types).

I don't know terribly much about go, but at least for primitives does it make sense to expose typedef primitive X as type aliases? would that even solve the problem? where do I start hacking :)

bcmills commented 7 years ago

@asottile Are you including both of those headers in cgo comments in the same Go source file? If so, at least one of C.long or C.Py_ssize_t should work; if it does not, that seems like a separate bug.

asottile commented 7 years ago

I have a single call:

https://github.com/asottile/dockerfile/blob/bf98b2fd9f9598f141771c4170535139d59969b9/pylib/main.go#L122

This compiles fine with cpython, but not with pypy.

If I change it to C.long(i) it compiles fine with pypy, but not with cpython.

So despite Py_ssize_t and long being identical C types in the same source module, I can't write go source that satisfies both implementations of the header.

bcmills commented 7 years ago

Ah, I see. That would presumably be fixed by a solution to this issue, but seems like a simpler problem to solve on its own since it does not cross package boundaries. Mind filing a separate issue for it?

As a workaround, you can probably add one declaration or the other explicitly to your cgo preamble, or else define a static wrapper function in the cgo preamble.

asottile commented 7 years ago

Yep I'll try and write up a separate issue for this!

asottile commented 7 years ago

I've opened #21809

gopherbot commented 7 years ago

Change https://golang.org/cl/63277 mentions this issue: cmd/cgo: use type aliases for primitive types

gopherbot commented 7 years ago

Change https://golang.org/cl/63276 mentions this issue: misc/cgo/errors: port test.bash to Go

gopherbot commented 7 years ago

Change https://golang.org/cl/63692 mentions this issue: errors_test: fix erroneous regexp detection

gopherbot commented 7 years ago

Change https://golang.org/cl/63730 mentions this issue: misc/cgo/errors: test that the Go rune type is not identical to C.int

bcmills commented 7 years ago

I think I have a partial solution to this for struct and union types. As expected, Go type aliases are the key.

Caveats:

We start by defining each converted type as an alias for its underlying Go struct type. Now the same types are identical, but too many types are identical: types with the same layout but different C tags are erroneously aliased to the same type.

To fix that problem, we can use Go struct field tags to encode the C struct tags! Because struct tags still count for type identity (#16085 notwithstanding), if we apply a Go tag containing the C tag to the first field on each struct, the two Go types will be mutually convertible but not identical. If the Go struct type does not have any fields, we add a zero-size field named _ and apply the tag to that.

https://play.golang.org/p/Dq9icy_BlH illustrates the general approach.
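Independently of the playground link, the tag trick can be sketched in a few lines. This is an illustration under assumptions, not the linked program: the type names, the c:"..." tag format, and the identical and convertible helpers are all made up here, and the three aliases live in one file where the proposal would have them emitted by cgo in separate packages.

```go
package main

import (
	"fmt"
	"reflect"
)

// point and size are hypothetical translations of two C structs with the same
// layout but different C tags. The Go field tag on the first field records
// the C tag, which keeps the two struct types from being identical.
type point = struct {
	X int32 `c:"struct point"`
	Y int32
}

type size = struct {
	X int32 `c:"struct size"`
	Y int32
}

// pointAgain stands in for the alias a second package would emit for
// struct point.
type pointAgain = struct {
	X int32 `c:"struct point"`
	Y int32
}

// identical reports whether a and b have identical Go types.
func identical(a, b interface{}) bool {
	return reflect.TypeOf(a) == reflect.TypeOf(b)
}

// convertible reports whether a's type can be converted to b's type.
func convertible(a, b interface{}) bool {
	return reflect.TypeOf(a).ConvertibleTo(reflect.TypeOf(b))
}

func main() {
	fmt.Println(identical(point{}, pointAgain{})) // same fields, same tag
	fmt.Println(identical(point{}, size{}))       // same fields, different tag
	fmt.Println(convertible(point{}, size{}))     // conversion ignores tags

	s := size(point{X: 1, Y: 2}) // explicit conversion is still allowed
	fmt.Println(s.X, s.Y)
}
```

Running it should print true, false, true, and then 1 2. The exported field names are deliberate: with unexported fields the two struct types would only be identical inside a single package, which is exactly the restriction this issue is about.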

bcmills commented 7 years ago

A simple solution for primitives would be to add some package to the standard library containing declarations for all of the C types:

package ctypes

// #cgo CGO_NOALIAS=1
import "C"

type (
    Int = C.int
    Uint = C.uint
    …
)

Then cgo would be able to rewrite all of the local types to be aliases for that:

package usercode

import "ctypes"

type (
    _Ctype_int = ctypes.Int
    ...
)

For certain C types with sizes defined by the standard (e.g., int32_t), we would instead emit typedefs directly to the corresponding Go types (e.g., int32).
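A single-file sketch of how the aliasing would behave, with every name hypothetical: cInt stands in for the one canonical definition that a ctypes package would own, pkgAInt and pkgBInt stand in for the _Ctype_int aliases cgo would emit in two unrelated packages, and pkgCInt shows the current behavior of a separate defined type per package. No cgo is involved, so int32 is only a placeholder for whatever C.int maps to.

```go
package main

import "fmt"

// cInt stands in for the canonical definition owned by the hypothetical
// ctypes package.
type cInt int32

// pkgAInt and pkgBInt stand in for the names cgo would emit in two unrelated
// packages. Both are aliases, so they denote the same type as cInt.
type (
	pkgAInt = cInt
	pkgBInt = cInt
)

// pkgCInt models the current behavior: a distinct defined type.
type pkgCInt int32

// takesA plays the role of an exported function in package A.
func takesA(v pkgAInt) pkgAInt { return v + 1 }

func main() {
	var b pkgBInt = 41
	fmt.Println(takesA(b)) // compiles: pkgBInt and pkgAInt are the same type

	var c pkgCInt = 41
	// takesA(c) would be rejected by the compiler; a conversion is needed.
	fmt.Println(takesA(pkgAInt(c)))
}
```

The point of routing both aliases through one definition is that neither user package has to import the other; they only need to agree on the shared package.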

AlexRouSg commented 7 years ago

@bcmills Do you think there would be a workaround for the all fields must start with a capital letter requirement? Cause passing around third party C structs where you can't rename the fields could be very useful.

Maybe have a tag to tell cgo to caps the first letter in go?

bcmills commented 7 years ago

Do you think there would be a workaround for the all fields must start with a capital letter requirement?

The only alternative I can see, short of renaming fields, would be a language change to allow lower-case names to be exported anyway, which would potentially require changes in associated tooling (godoc, the ast package, and likely others).

I doubt that this use-case is compelling enough to justify such a change.

Maybe have a tag to tell cgo to caps the first letter in go?

That could work, or to add a prefix (such as "C_") to each field. That wouldn't be source-compatible with existing cgo files, but it could be viable as an explicit option (e.g. specified in the cgo prelude).

rsc commented 7 years ago

I'm unclear about what problem people are talking about solving at this point.

bcmills commented 7 years ago

I'm unclear about what problem people are talking about solving at this point.

The same one this issue has always been about: making identical C types (in a cgo-using Go program) identical Go types across Go packages.

Comment 329947340 addresses the subproblem of translating identical numeric C types to identical Go types.

Comment 329946826 addresses the subproblem of translating identical struct and union C types to identical Go types. The solution proposed in that comment requires that the names of the C members start with a capital letter (so that they become exported fields of the Go struct type). Workarounds for that requirement are discussed in comments 329969062 and 330593346.

rsc commented 7 years ago

I have no interest in solving the general problem, nor in the associated complexity. Packages should not be exporting, say, *C.FILE in their APIs. If two different packages export *C.FILE and those are different types, that's OK.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).

bcmills commented 7 years ago

Oh, I see what you're saying now. I tried to address that question in comment 168719378, but apparently I was not convincing enough.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).

*C.char is honestly one of the least problematic types, because it already loosely corresponds to at least three idiomatic Go types ([]byte, *byte, or unsafe.Pointer, depending on usage).

C.long is a better example for a primitive, because there is no Go type to which it portably corresponds.

Packages should not be exporting, say, *C.FILE in their APIs.

Agreed. As I noted previously, “The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions.”

To give some concrete examples:

...and so on. Most of these conversions involve struct types and require non-trivial boilerplate, and some are quite subtle to implement correctly.

At the moment, either each package must implement its own copy of these conversions (inefficient and error-prone), or the exported API of the conversion helper-package must rely on error-prone unsafe.Pointer conversions.

joegrasse commented 7 years ago

@rsc Here is a very basic example of my problem and interest in this issue. As you stated here, I would really only care about the basic C types.

bcmills commented 7 years ago

@joegrasse Honestly, I think that example only undermines my point. It isn't at all obvious why your cp package needs to accept a parameter of type *C.char instead of the idiomatic Go []byte or string type, considering that you can easily construct the former from the latter (as illustrated in https://golang.org/cl/56530):

package cp

import "C"

import "unsafe"

func CTest(name string) {
    b := make([]byte, len(name)+1)
    copy(b, name)
    p := (*C.char)(unsafe.Pointer(&b[0]))
    C.use(p)
}

You can apply the transformation in the reverse direction using the workaround library described in https://github.com/golang/go/issues/13656#issuecomment-303216308, or perhaps its eventual replacement described in #19367.

To reiterate: I really don't think *C.char is a compelling example for this issue at all.

joegrasse commented 7 years ago

@bcmills That example was a very contrived example just to demonstrate the problem. I could have chosen any basic C type to display the problem.

bcmills commented 7 years ago

@joegrasse, part of Russ's point is that this problem is not worth solving if it only affects contrived examples. I think we all understand the nature of the problem: what we need to understand is its importance. (See https://blog.golang.org/toward-go2 for a much more in-depth discussion on this point.)

FlorianUekermann commented 7 years ago

In case the relevance of this issue is actually unclear and someone would benefit from a real-world example, let me help out. Otherwise please ignore this comment; I have nothing technical to add to the discussion.

There are a lot of C libraries that will give you an instance of a non-basic C type, which you then use in a different C library. Last time I ran into this was while using Vulkan (the low level OpenGL "successor"), so here we go:

If you want to do GPU accelerated graphics stuff, you would typically use a library like Glfw to handle the OS dependent details, like window creation and input. There are go bindings for that, which is nice. Glfw will do the OS specific incantations for you and return an instance of VkSurfaceKHR, a Vulkan type.

Now you want to draw something into your window, so you need to pass the VkSurfaceKHR to Vulkan and do something with it.

But since the Vulkan and Glfw bindings are in different packages, you can't just get the C.VkSurfaceKHR from Glfw and use it in a Vulkan function call.

You can't put both bindings into one package, because Glfw supports m graphics APIs and there are n platform abstraction libraries that support Vulkan. So you would end up with m*n Go packages.

This is a very real problem I encounter every couple of weeks in different contexts.

AlexRouSg commented 7 years ago

@MaVo159

Was just about to describe a very similar problem and you beat me to it.

jimmyfrasche commented 7 years ago

A related issue is one large library that has a number of "modules" defined by optional header files.

It's natural to want to make these true separate packages in Go. This is doable with an internal/ package that implements everything which is used by packages that expose the actual API.

But this means that the implementation of every optional module is included in the build artifact regardless of what actually gets imported, which depending on the library/module can be rather large.

It's not a show-stopping issue, generally, but it sounds like this would fix it.