mdempsky opened 8 years ago
This means making cgo part of the compiler?
One of the reasons cmd/cgo is separate is that other Go compilers could potentially benefit from it to support cgo, provided it supports enough of cgo's magic comments.
My idea is to keep cmd/cgo behaving the same as it does today, but to extract much of its functionality as a new package like "internal/cgo" that can be used by cmd/cgo, cmd/compile/internal/gc, and go/types.
Like I said above (downside 1), we need cmd/cgo for gccgo (because gccgo is written in C++ and can't rely on a Go library).
If much of the functionality is extracted into another package, would we still use cmd/cgo when building cgo programs?
If not, cmd/cgo might bit rot eventually.
I expect by default we would not execute cmd/cgo when building cgo programs. I think we could make that an option though (e.g., cgomode=internal/external), or at least run the current cgo regress tests both with and without cmd/cgo, if we're concerned about bit rot.
Better position information. Currently when using golang.org/x/tools/go/loader to analyze cgo-using source, the column location and byte offset information is wrong because it applies go/types to the post-transformation Go source files. Thanks to //line directives, we can at least identify the original line number, but it's troublesome/imprecise to write type-sensitive refactoring tools that rely on column information and want to support cgo-using packages.
It is indeed troublesome to write refactoring tools for cgo-using packages, but I don't see how the proposal solves the problem. Refactoring tools based on go/types must consume the output of cgo preprocessing, not the cgo source code itself, which means any file surgery they do will be applied to the generated file at best; //line directives further confuse matters. In this regard, cgo differs little from other code generation tools such as the protocol compiler: its input language is not Go, even though it may look like Go.
@alandonovan The proposal is that instead of generating intermediary Go source files, cmd/compile and go/types would consume the original .go source files directly. The refactored cmd/cgo package would provide the logic for processing the import "C" preambles such that (e.g.) go/types could directly construct an appropriate go/types.Package.
It's true that there are other tools that generate Go source code, but those produce .go files that the user is responsible for managing themselves. Cgo is distinguished from them because it's transparently handled by cmd/go, not to mention having explicit runtime and toolchain support.
I see. While developing the loader package, I tried a similar approach, modifying go/types to accept source files in the cgo dialect, but gri convinced me that it was feature creep and that cgo should be treated as a black box. Here are some of the problems that came up:
- go/types normally reports an error for a reference like C.f where f is unexported. It would have to make an exception for package "C".
- Each C function C.f behaves as if it was overloaded with two different types, for example x := C.f() and x, err := C.f(). This requires a hack to the type-checking rule for function calls, and results in ast.AssignStmts whose types break the spec's assignability rules, confusing clients.

Also, we should consider that future extensions to cgo might not be possible to support at all in go/types.
Thanks for the list of problems. My thoughts:
- The unexported-name check could be relaxed from if !exp.Exported() to something like if !exp.Exported() && !pkg.imported.cgo.
- It seems unlikely to me that we'd add functionality to cmd/cgo that go/types could not support at all. The source cgo files will need to remain representable by go/ast so that cmd/gofmt can format them. If it ever becomes necessary to process additional inputs beyond the .go files for cgo processing, we can extend go/types.Checker's API to allow callers to provide them.
Lastly, for completeness: to support cgo in go/types, users will need to provide an AST with the C preamble comments, but by default go/parser discards comments. There are a couple ways to resolve this:
Anyone have more comments for @mdempsky?
LGTM.
I prefer option 3(ii) over 3(i). We are likely to discover several awkward corner cases as we implement this and use it across the tools, but the result will still be an improvement on the status quo.
Having ParseCgo as the default would be nice, but it's fiddly because you have to select the scanner disposition towards comments before you begin. Two approaches to consider are: (i) restarting the scan from the beginning if you find an import "C". This is possible because the input is a byte slice not an io.Reader, but it might be hard to avoid duplicate error reporting. (ii) using ad-hoc logic in the parser to find the comment preceding the import "C" when you encounter the import. This is probably easier to implement.
This may be interesting but it seems lower priority than just about anything else we could be doing. And doing this is going to break lots of existing cgo code in mysterious ways that will require a lot of debugging.
Is there something that just can't cope with the current way cgo works? It sounds like the answer is no.
I'm also generally skeptical about making cmd/compile include cmd/cgo. That's a layering violation: cgo is a build system-issue, not a compiling Go code issue. I don't want the compiler to turn into the build system.
Does that mean we should put SWIG (a C++ program) into cmd/compile too?
And doing this is going to break lots of existing cgo code in mysterious way that will require a lot of debugging.
I'm curious why you think this. My expectation is that cgo code will compile identically as before.
Is there something that just can't cope with the current way cgo works?
My motivating use case is it's currently impossible to write robust type-sensitive refactoring tools that work on Go source files that use cgo, because you have to apply go/types to the post-processed source.
I'm also generally skeptical about making cmd/compile include cmd/cgo. That's a layering violation: cgo is a build system-issue, not a compiling Go code issue. I don't want the compiler to turn into the build system.
I don't expect cmd/compile and go/types to take on any more responsibility for compiling C code than cmd/cgo already does internally. And x/tools/go/loader already has to run cmd/cgo.
Does that mean we should put SWIG (a C++ program) into cmd/compile too?
Point taken. Though I'll note the source for SWIG bindings is .swig files, not .go files, so I feel like there's a lower expectation that refactoring tools should, or even need to, support them. Certainly it would be nice if gorename could rename an identifier exported by C++ and have it also update the entire C++ code base, but I don't think anyone expects that.
However, I do think it would be nice if gorename would work for renaming symbols in files that happen to also use cgo. Currently, it replaces the file with the cgo-processed source. (I.e., you end up with a source file that says "Created by cgo - DO NOT EDIT" at the top.)
For refactoring tools, the tool just needs to know the type of each cgo symbol, right? In that case, all it needs is the symbol-mangling rule of cmd/cgo, and then it can figure out the necessary information from the cgo output.
Currently, [gorename] replaces the file with the cgo-processed source. (I.e., you end up with a source file that says "Created by cgo - DO NOT EDIT" at the top.)
Wow, I was not aware of that. I've filed https://github.com/golang/go/issues/17839.
@minux cgo does more invasive AST rewrites than just mangling C.bar to _Cfoo_bar. For example, it also rewrites C function calls into function literal invocations to instrument _cgoCheckPointer calls.
Even if the solution is to just reconstruct the pre-cgo AST, there should be a standard library or x/tools package for handling that. Otherwise, every time we change cgo's implementation details, we end up breaking third-party tools.
But for a refactor tool, the cgo rewrite shouldn't matter (it's cgo implementation detail.) I imagine that refactor only cares that C.bar(1) is a function of type func (int32) (int, error), and then it should be able to handle C.bar as a special kind of function call and carry on (as the refactor tool should only care about the pre-cgo Go source files, right?)
@minux go/types doesn't currently support that. That's the point of this proposal.
To be clear, the primary expected benefit is not the three in the list but instead that go/types be modified to be able to more easily analyze packages that use cgo.
There's a lot of configuration that goes into the invocations of cgo. Today that configuration is partly in the environment and partly baked into the toolchain, specifically the go command. If cgo moves into a package, then that configuration has to be baked into the package too. That would mean you have to rebuild your, say, godoc just to start viewing a package with a different baked-in way to build arm binaries (say).
It seems like it would be better to introduce an easy way for the go command to provide the generated files, and then have go/types invoke the go command. Then you can substitute a different go command easily (via PATH changes) without having to rebuild everything that uses go/types when this kind of thing changes.
Also:
I'm curious why you think this. My expectation is that cgo code will compile identically as before.
Obviously that's the goal. My experience is that this is basically impossible for any significant change to cgo. Something always breaks. Many things, usually.
To be clear, the primary expected benefit is not the three in the list but instead that go/types be modified to be able to more easily analyze packages that use cgo.
That's my motivating case, yes. I meant for that to be covered by items 2 and 3. Sorry for the ambiguity.
There's a lot of configuration that goes into the invocations of cgo. Today that configuration is partly in the environment and partly baked into the toolchain, specifically the go command.
As far as I can tell, the information baked into cmd/go is just defaultCC, defaultCXX, and defaultPkgConfig. We could fetch them via go env if we wanted to avoid baking them into go/types tools. (I'm not convinced we need to, though; go/build already bakes in its own cgo information rather than querying go tool dist list -json.)
It seems like it would be better to introduce an easy way for the go command to provide the generated files, and then have go/types invoke the go command.
The problem I'm proposing to address here isn't that invoking cgo is tricky (though it is): x/tools/go/loader already supports invoking cgo as needed.
The problem is when writing type-sensitive AST refactoring tools, we want to work with the original AST, not the rewritten AST. Otherwise you get issues like #17839 or mdempsky/unconvert#3.
Another point: if we create this library then the next request will be a way to use it in other tools instead of invoking cmd/cgo. So then we have to design the library well. Cgo, for better or worse, is already designed and works.
I think part of the problem with this proposal is that it specifies mechanism. It sounds like the problem you want to solve is how to make go/parser and go/types work on cgo programs in a way that allows, for example, gorename or gorefactor to edit cgo programs. I think it might make sense to shift this proposal to talk about ways to solve that problem. It may not involve cgo as a library at all. It still seems to me that cmd/cgo can be used in a solution to that problem, much more quickly than this refactoring.
I thought some more about this on my way home. Apologies for the high latency of this conversation overall. I will try to do better.
It wasn't clear to me until your last comment that the fundamental problem being addressed here is the problem of not being able to apply code rewriting tools to Go source files that import "C". Rereading the initial post I see that listed as benefit 2, but I did not appreciate that the problem was the lack of access to the original source code in order to write the source code back out.
The core of this proposal is really to eliminate source-to-source translation as the implementation mechanism for defining what import "C" means in cgo inputs. Whether cgo is a library is a separate implementation detail - a binary is just a library with a command line as its API - as is the choice of which other program calls into the code in whatever form it takes. Let's put those details aside and focus on what additional functionality would have to be added to cgo to avoid a source-to-source rewrite.
It seems to me that if cgo can't rewrite the code, it would instead need to write out a definition of the "C" package that answers whatever questions the compiler might have; the compiler would consult that definition as needed. Cgo would still need to write out _cgo_gotypes.go with the wrappers and type definitions; it just wouldn't need to write out modified copies of the original sources (x.cgo1.go for x.go). For example, cgo might write out a file, say, _cgo_defs.txt. Then the go command could invoke the compiler with -cgodefs _cgo_defs.txt to pass the cgo definitions in. Similarly, whatever x/tools/go/loader does to invoke cgo today it could still roughly do but expect to read _cgo_defs.txt instead of loading modified source files.
The next question is what _cgo_defs.txt looks like. A simple but extensible choice would be to define a Go struct Def with the info cgo needs to convey and write out a JSON map[string]*Def. To make this concrete I looked at how cgo translates package net, and it looks like the only information you really need to get to the compiler is the mapping from C.xxx to the Go name used in _cgo_gotypes.go. That's almost trivial enough not to bother with JSON, but functions have two names (depending on whether they are invoked as x, err := C.f() to get errno), so might as well make things minimally extensible. It looks like this would suffice (with a few more symbols to cover all of package net, of course):
{
"AF_INET": {"GoName": "_Cconst_AF_INET"},
"char": {"GoName": "_Ctype_char"},
"getaddrinfo": {"GoName": "_Cfunc_getaddrinfo", "Go2Name": "_C2func_getaddrinfo"},
"struct_addrinfo": {"GoName": "_Ctype_struct_addrinfo"}
}
With that extra information, the compiler and go/types could both work on the original source files. Cgo could keep writing the *.cgo1.go files for a while, to let older tools or build systems continue to use them.
What I've described handles everything in cgo except the new pointer check calls that @ianlancetaylor added. Those would need to be recreated in the compiler, but of course the compiler is in a better position to write them, and go/types need not care.
If you want to move forward with this, my suggestion would be:
The key part of the original proposal was the ability to have cmd/compile and go/types load unmodified source files and do the "translation" themselves. The _cgo_defs.txt file accomplishes that without significant changes to cgo itself or to the build process. The specific format of _cgo_defs.txt must be defined, but that API would have had to be defined for "cgo as a library" too (along with a lot of other work that _cgo_defs.txt avoids).
In fact, you might actually get away with not having _cgo_defs.txt at all. You could instead just hard-code in the compiler the idea that after import "C", C.name is resolved by looking for _Cconst_name or _Ctype_name or _Cfunc_name or _C2func_name. That seems fine to me, especially for a quick prototype. Then the only change to the build process is to invoke the compiler on the original x.go instead of x.cgo1.go. Another possibility would be some kind of special comment on the definitions already in _cgo_gotypes.go.
//go:cgo_cname C.uint
type _Ctype_uint uint32
//go:cgo_cname C.getaddrinfo
//go:cgo_unsafe_args
func _C2func_getaddrinfo(p0 *_Ctype_char, p1 *_Ctype_char, p2 *_Ctype_struct_addrinfo, p3 **_Ctype_struct_addrinfo) (r1 _Ctype_int, r2 error) {
Whether that makes more sense depends a bit on what is best for x/tools/go/loader. Cgo could always write the comments for the compiler and the _cgo_defs.txt file for go/loader, I suppose.
The most important point is that we should focus on what additional functionality is needed from cgo instead of whether it is wrapped up in a library or a separate binary. The functionality can be added and used either way. (And if we leave it as a separate binary, that probably creates less work overall.)
That's an interesting approach. I just want to clarify that the name translation is in fact context-specific, because of course C.f may be called both to return a single value and to return two values, and we need to know the context to know which translated name to use. We also translate C.f differently if it is used in an expression other than a call.
Also, an example of a case where we need to know more than just the translated name is union types; in order to generate the cgo checks correctly to address issue #15942, we need to know whether the union type can contain a pointer.
These notes don't invalidate the basic idea, but they do complicate it.
Similarly, whatever x/tools/go/loader does to invoke cgo today it could still roughly do but expect to read _cgo_defs.txt instead of loading modified source files.
To be clear, after invoking cgo, all that x/tools/go/loader currently does with the modified source files is:
If we simply parse the original files instead of the cgo-rewritten files, that would seem to require that go/types needs to process the _cgo_defs.txt file instead. So unless I'm misunderstanding your proposal, it seems to require one or more of:
I've been advocating for 1 (as an internal library instead, but as you point out this is an implementation detail to be considered separately).
2 seems acceptable too, but means committing to a public API between cmd/cgo and go/types. And if we're going to do that, it seems simpler to just make go/types invoke cmd/cgo instead (i.e., approach 1).
3 seems undesirable to me for maintainability.
Do you support any of these three approaches? Or did you have any alternatives in mind?
The core of this proposal is really to eliminate source-to-source translation as the implementation mechanism for defining what import "C" means in cgo inputs.
That would likely also make #13467 more feasible to address, which may be relevant to the cost/benefit analysis here.
@mdempsky, agreed, that seems like yet another argument against _cgo_defs.txt. I think it's fine to start by hard-coding in both the compiler and go/types that to resolve C.foo you usually look for _Cconst_foo, _Cfunc_foo, or _Ctype_foo (in that order), and in the case x,y := C.foo() you instead look for _C2func_foo. That's a very tiny amount of code, OK to just write twice (once in go/types and once in the compiler). That doesn't mean changing anything about where cgo is invoked: x/tools/go/loader would still invoke cgo.
Today, given a package with x.go importing "C" and y.go not importing "C", x/tools/go/loader runs cgo on x.go and then passes x.cgo1.go, y.go, and _cgo_gotypes.go to go/types. That would change so that x/tools/go/loader would run cgo on x.go and pass x.go, y.go, and _cgo_gotypes.go to go/types. That is, the only change in the loader is to stop converting the name x.go to x.cgo1.go when preparing the list of files for go/types.
@ianlancetaylor, yes it seems like we'd need another //go:cgo_has_pointers or something for the type definition of a union or maybe anything including a union. Whatever is needed for the pointer check is fine. :-)
@mdempsky, another possibility is to adjust cgo's name mangling algorithm so that C.foo always means _Ccgo_foo or _Ccgo2_foo (for the two-result function call). You would probably have to use the list of names xxx that appear in C.xxx and use the Ccgo prefix for those names (only), with the old prefixes for all the other names that appear just to explain the originals. For example, this program works today:
package main
/*
typedef struct { int z; } X;
typedef struct { X *x; } Y;
#define X 10
Y* f(int x) { return 0; }
*/
import "C"
func main() {
	println(C.f(C.X))
}
because _Cconst_X (needed as C.X) and _Ctype_X (needed to explain C.f but not itself available as a C.name) are different identifiers. That would need to keep working. In this case _Cconst_X would become _Ccgo_X but _Ctype_X would stay _Ctype_X.
@mdempsky Perhaps this falls into the second of your three categories, but you could make go/types call through an interface up into the application whenever it sees an import "C" or a selection C.x (with sufficient context information to deal with function overloading, of course), and provide an external package that satisfies this interface by running cgo and parsing _cgo_defs.txt. Think of it as a special kind of types.Importer. This would entail only a small number of changes to go/types and would avoid go/types depending on the implementation of cgo.
I think it's fine to start by hard-coding in both the compiler and go/types that to resolve C.foo you usually look for _Cconst_foo, _Cfunc_foo, or _Ctype_foo (in that order), and in the case x,y := C.foo() you instead look for _C2func_foo.
Okay, I think I'm reasonably sold on this approach (i.e., making cmd/compile and go/types aware of cmd/cgo's C.foo => _Cbar_foo mangling convention, and having them process the original Go source files + _cgo_gotypes.go) as the least-intrusive solution to the issues I pointed out.
another possibility is to adjust cgo's name mangling algorithm so that C.foo always means _Ccgo_foo or _Ccgo2_foo (for the two-result function call).
I was considering that too. Seems desirable to me to simplify compiler integration, but not critical.
It'll change runtime names though (e.g., stack traces and reflect type info). Is that a problem? cmd/cgo's documentation declares its mangling convention (albeit not under Go 1 compat), and some misc/cgo tests assume it too.
you could make go/types call through an interface up into the application whenever it sees an import "C" or a selection C.x (with sufficient context information to deal with function overloading, of course), and provide an external package that satisfies this interface by running cgo and parsing cgodefs.txt. Think of it as a special kind of types.Importer.
I'm not opposed to an extensible interface per se, but I'm worried about having to design and commit to one. Your comparison to types.Importer seems particularly worrying since I feel like that's been one of our more limiting interfaces as tools and use cases evolve.
@mdempsky That's rather pessimistic. types.Importer had to change because of a new feature (vendoring) in the build system. In general, new features in the language and the package system will require API changes to the go/* packages, possibly incompatible changes (as we saw with aliases). The design of cgo has been relatively stable; I think it's worth at least sketching out this approach.
@mdempsky, this seems like a definite win. Let's do this for Go 1.9 and figure out whether to simplify the cgo output at the same time to make the job easier. (We're committing to an API, essentially, if we want go/types from one release to work with as many others as possible.)
I'm going to work on cleaning up my POC CLs to be submittable. @griesemer and I just discussed the go/types public API and believe we need a small extension:

- Add a UsesCgo bool field to enable rewriting import "C" to cgo-mangled names, and document that the user needs to provide the _cgo_gotypes.go file generated by cmd/cgo.
- Disallow setting FakeImportC and UsesCgo together.

Rationale: import "C" isn't reserved by the Go language spec; its special semantics are only introduced by the cmd/go tool. It's possible to manually use cmd/compile to build an actual Go package named "C", so go/types should support this too, and that should continue to be the default behavior; cgo handling is opted into via UsesCgo. Since FakeImportC and UsesCgo have contradictory semantics, we'll explicitly disallow them together.

CL https://golang.org/cl/33677 mentions this issue.
@mdempsky, if this isn't happening for Go 1.9, can you retarget for Go 1.10?
I updated CL 33677 last night. go/types adds the UsesCgo flag mentioned above and srcimporter uses it instead of FakeImportC when build.Context.OpenFile == nil. (cmd/cgo, gcc, etc. don't support using OpenFile; I tried copying source files to a temporary directory, but gcc can potentially #include arbitrary files from the source package directory.)
As a test case, it's able to successfully type check misc/cgo/test. (Also notably, but less impressively, it type checks the standard library with cgo enabled.)
One nit is that most of the time C.bar is rewritten to _Cfoo_bar (sometimes with fluff), but variables are rewritten to *_Cvar_bar, where _Cvar_bar is a pointer variable initialized like var _Cvar_bar = (*T)(unsafe.Pointer(&__cgo_xxx)), and __cgo_xxx has type byte. So at least without changing cmd/cgo, there's no object that C.bar can resolve to with the same type as the expression C.bar itself needs to have.

For the CL, I bound C.bar to _Cvar_bar and added an implicit pointer dereference.

I also discovered there are quite a few more _Cfoo_ prefixes than initially discussed. Thankfully, cmd/cgo is already strict about not allowing C.bar to resolve to different C definition kinds in different Go source files, so the order they're tested in doesn't matter.
Thankfully cmd/cgo is already strict about not allowing C.bar to resolve to different C definition kinds in different Go source files, so the order they're tested in doesn't matter.
If I understand correctly, the intention is to potentially revisit that restriction (see #13467).
From @rsc: https://github.com/golang/go/issues/13467#issuecomment-265161883
@joegrasse, probably not, but if we do #16623 (let compiler know more about cgo) then the compiler would be in a position to resolve this, if we wanted to.
Oh, I probably spoke too early. #13467 is about allowing C.bar to refer to the same C type across different Go packages, so it is not at odds with the restriction you pointed out.
Change https://golang.org/cl/77153 mentions this issue: cmd/cgo: modify source as text, not as AST
@mdempsky Just curious if you think this will make it into Go1.11?
@joegrasse Sorry, no. CL 33677 is still unsubmitted.
@mdempsky Is there a plan to submit it for Go1.12?
Change https://golang.org/cl/231459 mentions this issue: go/types: add UsesCgo config to support _cgo_gotypes.go
Change https://golang.org/cl/234526 mentions this issue: go/types: replace Config.UsesCgo with Checker.CgoFiles
Change https://golang.org/cl/237417 mentions this issue: go/types: rename UsesCgo to go115UsesCgo
Change https://golang.org/cl/237423 mentions this issue: go/packages: move TypecheckCgo to packagesinternal
Change https://golang.org/cl/237422 mentions this issue: internal/typesinternal: remove fallback check for UsesCgo
Sorry for digging up this old issue... I just wanted to note that its priority may be underestimated.
Google is developing Carbon precisely because of interoperability with an existing language, which confirms how important this is. At the same time, Go's C bindings are, after many years, still painful to use, partly due to https://github.com/golang/go/issues/975, which has been open for years and is waiting on this very issue to be implemented.
Migration of existing applications to Go could really use more support, so it might be worth reconsidering the priority of this issue.
Currently cgo support is implemented with cmd/cgo as a separate tool that analyzes cgo-using Go code and transforms it into standard Go code. This proposal is to extract the analysis logic into a separate internal package that can be reused by cmd/compile and go/types directly without needing source rewrites. I.e., turn them into native cgo compilers, rather than needing a separate "cgofront" preprocessor step.
The expected benefits are:
Potential downsides and counter-arguments:
Alternative ideas that might achieve similar benefits while generalizing to tools other than cmd/cgo:
/cc @ianlancetaylor @alandonovan @griesemer