golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.03k stars 17.67k forks source link

cmd/cgo: godefs shouldnt try to guess which struct field's identifier prefixes preceding `_` is okay to strip. #5256

Open gopherbot opened 11 years ago

gopherbot commented 11 years ago

by Mortdeus@gocos2d.org:

Godef may produce structs with multiple declarations of the same field identifiers if
its allowed to automatically strip the prefix of a C struct's field identifier. For
example,

When this C struct

struct drm_drawable_info {
    unsigned int num_rects;
        struct drm_clip_rect *rects;
};

is run through godefs it produces the following Go struct.

DrawableInfo struct {
    Rects uint32
    Pad_cgo_0 [4]byte
    Rects     *ClipRect
}

Which obviously isnt correct. 

The problem with this approach is...

A. I have to open the C source code to find out what the old identifier name used to be.
B. Obviously the tool isnt smart enough to know what prefixes are okay and which should
be stripped. 
C. This kind of "magic" implicit functionality is much better implemented as
explicit command line flag with the developer in control.
ianlancetaylor commented 11 years ago

Comment 1:

The cgo -godefs option is used during the Go bootstrap and is not documented for other
uses.  We aren't going to change its default behaviour.  But I'm fine with changing it
to, e.g., not strip a prefix if it causes duplicate field names.

Labels changed: added priority-later, removed priority-triage.

rsc commented 11 years ago

Comment 2:

Labels changed: added priority-someday, removed priority-later.

Status changed to Accepted.

rsc commented 10 years ago

Comment 3:

Labels changed: added repo-main.

rsc commented 10 years ago

Comment 4:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

mdempsky commented 10 years ago

Comment 5:

Arguably the bug here is that cgo's "fieldPrefix" function skips over fields that don't
contain a "_" character when trying to come up with a "common" prefix.  E.g., in the
example struct, "rects" does not have an underscore, so fieldPrefix simply skips over
it; then the only field it actually considers is "num_rects" and (erroneously) concludes
"num_" is a common prefix.
It would be easy to tweak fieldPrefix to realize that if a struct has fields without any
underscores then it doesn't have a 'common prefix'.  Unfortunately, that would cause at
least some field names in package syscall to change if they were regenerated; e.g., the
"Unit" field in Sysinfo_t would become "Mem_unit".
However, now that package syscall is frozen (and presumably won't be regenerated any
further), maybe that's not an issue?  Or maybe we still wouldn't want to change the
struct field names even in the new go.sys package?
fritzr commented 1 year ago

A similar issue occurs when one or more fields have a common numeric suffix following the underscore:

struct test {
  int type;
  int pad_1;
  int size;
  int pad_2;
  int etc;
};

Through cgo -godefs:

type Test struct {
        Type    int32
        1       int32 //  syntax error: unexpected literal 1, expected field name or embedded type
        Size    int32
        2       int32 //  syntax error: unexpected literal 2, expected field name or embedded type
        Etc     int32
}

In this example the 'pad_' prefix is stripped and the suffix is not a valid identifier so the output can't compile. Like with the OP's example, it's difficult to recover from this in an automated way since the original field names are lost in the output. It would be nice to at least have a command-line flag to disable the underscore stripping behavior.

ianlancetaylor commented 1 year ago

Do you want to send a patch?

jclark commented 10 months ago

I was hoping to add support to the unix package for linux/ptp_clock.h, which includes the following struct declarations:

struct ptp_sys_offset {
    unsigned int n_samples; /* Desired number of measurements. */
    unsigned int rsv[3];    /* Reserved for future use. */
    /*
     * Array of interleaved system/phc time stamps. The kernel
     * will provide 2*n_samples + 1 time stamps, with the last
     * one as a system time stamp.
     */
    struct ptp_clock_time ts[2 * PTP_MAX_SAMPLES + 1];
};

struct ptp_sys_offset_extended {
    unsigned int n_samples; /* Desired number of measurements. */
    unsigned int rsv[3];    /* Reserved for future use. */
    /*
     * Array of [system, phc, system] time stamps. The kernel will provide
     * 3*n_samples time stamps.
     */
    struct ptp_clock_time ts[PTP_MAX_SAMPLES][3];
};

struct ptp_sys_offset_precise {
    struct ptp_clock_time device;
    struct ptp_clock_time sys_realtime;
    struct ptp_clock_time sys_monoraw;
    unsigned int rsv[4];    /* Reserved for future use. */
};

cgo -godefs discards the n_ and sys_ prefixes on the field names, which seems highly unintuitive. Removing the sys_ prefix isn't so bad, but removing the n_ really isn't good.

Code is here

I wonder if we could avoid backwards compatibility problems using a heuristic such as that the prefix will only be removed if the number of fields with the prefix is greater than the number of fields with no prefix (ignoring fields starting with orig_ or _). Or should it be a flag? I am happy to work on a patch.