bcmills opened this issue 8 years ago
Somewhat related: does C require types to be defined the same way across translation units? Reading through C99, it seems only to require that objects and functions with external linkage have the same type across translation units (6.2.7), but I can't find anything per se that disallows, for example, "typedef int foo;" in one translation unit and "typedef unsigned foo;" in another (assuming they don't in turn lead to incompatible object or function declarations).
(Not to say that cgo needs to support that.)
C and C++ permit the same name to designate different types in different compilation units.
I definitely agree that this restriction should be lifted. It makes it very hard to break up your code to promote maintainability. It would make more sense for a C.int, etc., to be a C.int everywhere, just as an int is an int everywhere.
I don't believe we should lift this restriction. It is explicitly not a goal to make it possible to expose C types directly in Go package APIs. As Ian said, it's not even clear this is sound.
> It is explicitly not a goal to make it possible to expose C types directly in Go package APIs.
The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions and/or Go packages that export C APIs with richer structure than primitive pointers and integers. (For example: one might want to return a protocol buffer from a Go function to a C or C++ caller without making more than one copy of the marshaled data. That operation is complex enough that it needs a support library, and because it needs to manipulate Go types it must be written in Go.)
> As Ian said, it's not even clear this is sound.
The request is to make "identical" C types identical Go types, not to make C types "with the same name" identical Go types. I believe it is sound provided that we enforce that the C types are actually identical.
OK, I'm happy to reopen this, but I have no idea how to do it. It seems fundamentally at odds with Go's package system. I'm happy to look at implementation proposals though.
For the time being, is there a workaround for this? Maybe using an interface that can represent the same struct from different packages?
You can work around it in one direction (going from the C types in another package to the C types in the current package) using reflect and unsafe.Pointer. The same technique may be possible in the other direction too.
If you want to add back some of the type-safety at run time, you can use the reflect package to iterate over the struct fields to verify that they're compatible.
Just pondering if aliases or whatever comes out of https://github.com/golang/go/issues/18130 would help with this problem.
@joegrasse I don't think type aliases per se would help with the general problem: either way, you end up needing one "canonical definition" for each type, and if you've already got a canonical definition then you don't need to be able to refer to it by different names.
However, it might at least solve the subproblem of making the Go types for C typedefs have the same aliasing structure as the C types. (I'm honestly not sure whether that's currently the case: it hadn't even occurred to me to check.)
@joegrasse, probably not, but if we do #16623 (let compiler know more about cgo) then the compiler would be in a position to resolve this, if we wanted to.
(Fixed issue number, sorry.)
@rsc, do you mind double checking the issue number. I think you might have mistyped it.
@joegrasse Russ meant #16623.
Thanks
@bcmills Forgive my backtracking, but I don't understand why aliasing wouldn't solve the problem. I thought the problem with C structs in Go was that the compiler views a C struct within a package differently from the same C struct outside of the package, so you don't have a "canonical definition" for that type. Wouldn't aliasing the C struct as a Go struct help with the problem?
@14rcole Consider this program:
foo.h:

```c
typedef struct {
	int i;
} Foo;
```
foo/foo.go:

```go
package foo

// #include "foo.h"
import "C"

func Frozzle(x *C.Foo) {
	…
}
```
bar.h:

```c
typedef struct {
	int i;
} Bar;
```
bar/bar.go:

```go
package bar

// #include "bar.h"
import "C"

import "foo" // needed for the call below

func Bozzle(y *C.Bar) { foo.Frozzle(y) }
```
This program should compile: C.Bar in package bar is the same C type (a typedef of a struct with the same definition) as C.Foo in package foo. However, that would require cgo to write the definitions of C.Foo and C.Bar such that they are aliases for the same underlying type. Since the type includes the i field, which is currently unexported, there is no package in which that type could be defined.

There are other possible ways to solve the problem (e.g. by rewriting field names so that they are always exported and combining all of the C declarations into one package), but they involve more than just a suitable application of aliases.
Also, an alias has to exist in one package P and point to another package Q. That implies P imports Q (to point at it). In the general version of the problem in this issue, both P and Q define some C type and don't know about each other at all. Then some other package M (for main) imports both and tries to mix one with the other. There's no way for aliases per se to solve this problem, because P and Q need to continue not knowing about each other, and M can't change the definitions in P and Q.
Following the latest proposed solution in #16623 (but not strictly dependent upon it), the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package. I believe we could also easily turn off symbol visibility rules for this package (e.g., so that lowercase struct fields are still accessible).
Then I think usual type identity rules would just work as desired, and usual type-reexporting information would help to catch ODR (one-definition rule) violations across C compilation units.
@mdempsky

> the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package
It can't be just one package, unfortunately. In a valid program, a type C.X can legitimately mean two different things in two different compilation units.
We could perhaps do some sort of name-mangling to disambiguate, though. For example, we could encode the complete C type definition in the mangled name and have the compiler treat all C-mangled names as being in the same package.
The remaining concern with that approach is what to do with reflect. (If we've mangled the names to avoid collisions, should reflect report the mangled names, the colliding names, or something else entirely?)
What if two packages each define their own, incompatible version of type T in their cgo preambles? I think any solution to this issue must check that the two cgo types are indeed compatible.
Here is a very contrived example of how I came across this issue. I believe it to be a simpler case than what @bcmills and @rsc have discussed above (although I could be wrong).
Consider the package cp/cp.go:

```go
package cp

import "C"

func CTest(name *C.char) string {
	return C.GoString(name)
}
```
and the program ct/main.go:

```go
package main

import (
	"fmt"

	"cp"
)

import "C"

func main() {
	s := C.CString("Hello World") // This needs to be freed later
	fmt.Println(cp.CTest(s))
}
```
When you try to build ct/main.go, you get the following error:

```
# ct
./main.go:12: cannot use s (type *C.char) as type *cp.C.char in argument to cp.CTest
```

Before coming across this issue, I would have thought that this program should compile, because cp.CTest takes a *C.char and s is a *C.char. I am not creating any new types, just using the basic C char type. For some reason, though, C.char in package cp becomes cp.C.char.
After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.
> After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.
What about types in external C libs? For external libs we can be pretty sure C.foo will always be C.foo; maybe even prefix it with the library name, like C.MyLib.foo or C.MyLib_foo?
@AlexRouSg
> What about for types in external C libs?
According to the C standard, "[a]ll declarations of structure, union, or enumerated types that have the same scope and use the same tag declare the same type." That is a property of the type declarations themselves, not the libraries that implement or make use of those declarations.
The proposal here is that C types that are "the same type" according to the C standard should translate to Go types that are identical according to the Go spec.
@bcmills Ohhh, I was replying to rsc saying it might be limited to only primitive types.
I'm hitting a similar and, I believe, related problem.
I'm interfacing with two different headers providing the same functionality.
In one case the header looks like this (cpython):

```c
/* overly simplified */
typedef long Py_ssize_t;
int PyTuple_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o);
```
The other provider (pypy) has a header that looks like this:

```c
typedef long Py_ssize_t;
int PyTuple_SetItem(PyObject *p, long pos, PyObject *o);
```
When calling from Go, I can't use either C.long(...) or C.Py_ssize_t(...) for the second argument and satisfy both implementations (despite Py_ssize_t and long being identical C types).

I don't know terribly much about Go, but at least for primitives, does it make sense to expose typedef primitive X as type aliases? Would that even solve the problem? Where do I start hacking :)
@asottile Are you including both of those headers in cgo comments in the same Go source file? If so, at least one of C.long or C.Py_ssize_t should work; if it does not, that seems like a separate bug.
I have a single call (using C.Py_ssize_t(i) for the position argument). This compiles fine with cpython, but not with pypy. If I change it to C.long(i), it compiles fine with pypy, but not with cpython. So despite Py_ssize_t and long being identical C types in the same source module, I can't write Go source that satisfies both implementations of the header.
Ah, I see. That would presumably be fixed by a solution to this issue, but seems like a simpler problem to solve on its own since it does not cross package boundaries. Mind filing a separate issue for it?
As a workaround, you can probably add one declaration or the other explicitly to your cgo preamble, or else define a static wrapper function in the cgo preamble.
Yep I'll try and write up a separate issue for this!
I've opened #21809
Change https://golang.org/cl/63277 mentions this issue: cmd/cgo: use type aliases for primitive types
Change https://golang.org/cl/63276 mentions this issue: misc/cgo/errors: port test.bash to Go
Change https://golang.org/cl/63692 mentions this issue: errors_test: fix erroneous regexp detection
Change https://golang.org/cl/63730 mentions this issue: misc/cgo/errors: test that the Go rune type is not identical to C.int
I think I have a partial solution to this for struct and union types. As expected, Go type aliases are the key.
Caveats:
We start by defining each converted type as an alias for its underlying Go struct type. Now the same types are identical, but too many types are identical: types with the same layout but different C tags are erroneously aliased to the same type.
To fix that problem, we can use Go struct field tags to encode the C struct tags! Because struct tags still count for type identity (#16085 notwithstanding), if we apply a Go tag containing the C tag to the first field on each struct, the two Go types will be mutually convertible but not identical. If the Go struct type does not have any fields, we add a zero-size field named _ and apply the tag to that.
https://play.golang.org/p/Dq9icy_BlH illustrates the general approach.
A simple solution for primitives would be to add some package to the standard library containing declarations for all of the C types:

```go
package ctypes

// #cgo CGO_NOALIAS=1
import "C"

type (
	Int  = C.int
	Uint = C.uint
	…
)
```
Then cgo would be able to rewrite all of the local types to be aliases for that:

```go
package usercode

import "ctypes"

type (
	_Ctype_int = ctypes.Int
	...
)
```
For certain C types with sizes defined by the standard (e.g., int32_t), we would instead emit typedefs directly to the corresponding Go types (e.g., int32).
@bcmills Do you think there would be a workaround for the "all fields must start with a capital letter" requirement? Passing around third-party C structs where you can't rename the fields could be very useful. Maybe have a tag to tell cgo to capitalize the first letter in Go?
> Do you think there would be a workaround for the "all fields must start with a capital letter" requirement?
The only alternative I can see, short of renaming fields, would be a language change to allow lower-case names to be exported anyway, which would potentially require changes in associated tooling (godoc, the ast package, and likely others).
I doubt that this use-case is compelling enough to justify such a change.
> Maybe have a tag to tell cgo to capitalize the first letter in Go?
That could work, or we could add a prefix (such as "C_") to each field. That wouldn't be source-compatible with existing cgo files, but it could be viable as an explicit option (e.g. specified in the cgo preamble).
I'm unclear about what problem people are talking about solving at this point.
> I'm unclear about what problem people are talking about solving at this point.
The same one this issue has always been about: making identical C types (in a cgo-using Go program) identical Go types across Go packages.
Comment 329947340 addresses the subproblem of translating identical numeric C types to identical Go types. Comment 329946826 addresses the subproblem of translating identical struct and union C types to identical Go types. The solution proposed in that comment requires that the names of the C members start with a capital letter (so that they become exported fields of the Go struct type). Workarounds for that requirement are discussed in comments 329969062 and 330593346.
I have no interest in solving the general problem, nor in the associated complexity. Packages should not be exporting, say, *C.FILE in their APIs. If two different packages export *C.FILE and those are different types, that's OK.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).
Oh, I see what you're saying now. I tried to address that question in comment 168719378, but apparently I was not convincing enough.
> I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).
*C.char is honestly one of the least problematic types, because it already loosely corresponds to at least three idiomatic Go types ([]byte, *byte, or unsafe.Pointer, depending on usage). C.long is a better example for a primitive, because there is no Go type to which it portably corresponds.
> Packages should not be exporting, say, *C.FILE in their APIs.
Agreed. As I noted previously, “The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions.”
To give some concrete examples:
- If a support library works with time.Time, it may need a way to obtain one from a *C.struct_tm.
- If it works with *os.File, it may need a way to obtain one from a *C.FILE.
- If it works with string, it may need a way to obtain one from a *C.wchar_t.
- If it works with the x/text libraries, it may need a way to convert those types to and from a *C.struct_lconv.
- If it works with proto.Message, it may need to obtain one from a *C.ProtobufCMessage.
...and so on. Most of these conversions involve struct types and require non-trivial boilerplate, and some are quite subtle to implement correctly.
At the moment, either each package must implement its own copy of these conversions (inefficient and error-prone), or the exported API of the conversion helper-package must rely on error-prone unsafe.Pointer conversions.
@joegrasse Honestly, I think that example only undermines my point. It isn't at all obvious why your cp package needs to accept a parameter of type *C.char instead of the idiomatic Go []byte or string type, considering that you can easily construct the former from the latter (as illustrated in https://golang.org/cl/56530):
```go
package cp

// static void use(char *p) {}  // hypothetical stand-in for the real C callee
import "C"

import "unsafe"

func CTest(name string) {
	b := make([]byte, len(name)+1)
	copy(b, name)
	p := (*C.char)(unsafe.Pointer(&b[0]))
	C.use(p)
}
```
You can apply the transformation in the reverse direction using the workaround library described in https://github.com/golang/go/issues/13656#issuecomment-303216308, or perhaps its eventual replacement described in #19367.
To reiterate: I really don't think *C.char is a compelling example for this issue at all.
@bcmills That was a very contrived example, just to demonstrate the problem. I could have chosen any basic C type to show it.
@joegrasse, part of Russ's point is that this problem is not worth solving if it only affects contrived examples. I think we all understand the nature of the problem: what we need to understand is its importance. (See https://blog.golang.org/toward-go2 for a much more in-depth discussion on this point.)
In case the relevance of this issue is actually unclear and someone would benefit from a real world example, let me help out. Otherwise please ignore this comment, I have nothing technical to add to the discussion.
There are a lot of C libraries that will give you an instance of a non-basic C type, which you then use in a different C library. Last time I ran into this was while using Vulkan (the low level OpenGL "successor"), so here we go:
If you want to do GPU accelerated graphics stuff, you would typically use a library like Glfw to handle the OS dependent details, like window creation and input. There are Go bindings for that, which is nice. Glfw will do the OS specific incantations for you and return an instance of VkSurfaceKHR, a Vulkan type.
Now you want to draw something into your window, so you need to pass the VkSurfaceKHR to Vulkan and do something with it.
But since the Vulkan and Glfw bindings are in different packages, you can't just get the C.VkSurfaceKHR from Glfw and use it in a Vulkan function call.
You can't put both bindings into one package, because Glfw supports m graphics APIs and there are n platform abstraction libraries that support Vulkan. So you would end up with m*n Go packages.
This is a very real problem I encounter every couple of weeks in different contexts.
@MaVo159
Was just about to describe a very similar problem and you beat me to it.
A related issue is one large library that has a number of "modules" defined by optional header files.
It's natural to want to make these true separate packages in Go. This is doable with an internal/ package that implements everything and is used by the packages that expose the actual API.
But this means that the implementation of every optional module is included in the build artifact regardless of what actually gets imported, which depending on the library/module can be rather large.
It's not a show-stopping issue, generally, but it sounds like this would fix it.
https://golang.org/cmd/cgo/ says:
While that's a convenient workaround for allowing access to struct fields and other names that would otherwise be inaccessible as Go identifiers, it greatly complicates the process of writing Go APIs for export to C callers. The Go code to produce and/or manipulate C values must be essentially confined to a single package.
It would be nice to remove that restriction: instead of treating C types as unexported local types, we should treat them as exported types in the "C" package (and similarly export the lower-case names that would otherwise be unexported).