MobilityData / gtfs-realtime-bindings

Language bindings generated from the GTFS Realtime protocol buffer spec for popular languages.
Apache License 2.0
373 stars 129 forks source link

Deserializing feed: missing optional fields return 0 value in node.js #53

Open nselikoff opened 5 years ago

nselikoff commented 5 years ago

Given a feed that does NOT have an optional field (e.g. vehicle speed), the value returned when getting this value is 0. So there's no way to differentiate "not included" from an actual value of zero.

const feed = realtime.transit_realtime.FeedMessage.decode(protobuf);
feed.entity.forEach((entity) => {
  if (entity.vehicle) {
    const speed = entity.vehicle.position.speed; // speed is 0
  }
});

This may be similar to #52

barbeau commented 5 years ago

@nselikoff Thanks for the report! In Java bindings there is a hasSpeed() method that you can call to determine if the field has been set, but I'm not as familiar with the Node.js bindings. I'm assuming there isn't an equivalent?

nselikoff commented 5 years ago

@barbeau good question - just looked and there is not a similar method. However, I found a workaround using realtime.transit_realtime.Position.toObject:

const feed = realtime.transit_realtime.FeedMessage.decode(protobuf);
feed.entity.forEach((entity) => {
  if (entity.vehicle) {
    // using `toObject` only adds properties that exist in the protobuf
    const position = realtime.transit_realtime.Position.toObject(entity.vehicle.position);
    // so speed will be undefined if not included in the protobuf
    const speed = position.speed; 
    // maybe you want it null
    // const speed = position.speed || null;
  }
});
tmdkamins commented 2 months ago

Same issue. All of the optional fields are somewhat captured in the interface definitions as ? optional as well as |null (arguably the wrong way and in conflict with itself, but at least captured), e.g.:

/** Properties of a VehiclePosition. */
interface IVehiclePosition {
    trip?: (transit_realtime.ITripDescriptor|null);
    vehicle?: (transit_realtime.IVehicleDescriptor|null);
    position?: (transit_realtime.IPosition|null);
    currentStopSequence?: (number|null);
    stopId?: (string|null);
    currentStatus?: (transit_realtime.VehiclePosition.VehicleStopStatus|null);
    timestamp?: (number|Long|null);
    congestionLevel?: (transit_realtime.VehiclePosition.CongestionLevel|null);
    occupancyStatus?: (transit_realtime.VehiclePosition.OccupancyStatus|null);
    occupancyPercentage?: (number|null);
    multiCarriageDetails?: (transit_realtime.VehiclePosition.ICarriageDetails[]|null);
}

But then all the class implementations lose that optionality on primitive fields, while preserving it on other entity types, e.g.:

/** Represents a VehiclePosition. */
class VehiclePosition implements IVehiclePosition {
    constructor(properties?: transit_realtime.IVehiclePosition);
    public trip?: (transit_realtime.ITripDescriptor|null);
    public vehicle?: (transit_realtime.IVehicleDescriptor|null);
    public position?: (transit_realtime.IPosition|null);
    public currentStopSequence: number;
    public stopId: string;
    public currentStatus: transit_realtime.VehiclePosition.VehicleStopStatus;
    public timestamp: (number|Long);
    public congestionLevel: transit_realtime.VehiclePosition.CongestionLevel;
    public occupancyStatus: transit_realtime.VehiclePosition.OccupancyStatus;
    public occupancyPercentage: number;
    public multiCarriageDetails: transit_realtime.VehiclePosition.ICarriageDetails[];
    // ...
    public static toObject(message: transit_realtime.VehiclePosition, options?: $protobuf.IConversionOptions): { [k: string]: any };
}

The workaround suggested (using toObject) does work, as does toJSON, so clearly the data is being decoded properly. This suggests whatever tool was used to auto-generate these Node/TypeScript bindings simply had deficiencies around optional fields. Why would it it stuff 0 values into a class object in the first place instead of allowing them to be null as the interface suggests?

Related, the use of null for optional fields is completely atypical for Node/JavaScript/TypeScript ecosystem, and in fact is in conflict with the ? optional syntax, which implies undefined as the value for "missing".

So here we are 5 years later... I don't know what tool was used or how these bindings were generated, but I would imagine the tooling has matured over this time and can better and more natively handle optional fields. It would be a great update to this library to properly support optional fields!

This update should ideally come as an intentional breaking change and major version update (or fork), as undefined is the proper way to represent missing data in the JavaScript ecosystem, and null does not even exist in Protocol Buffers, so its inclusion is redundant and unnecessarily complicating.

I'm happy to look into some of this myself, but I have no experience with generating PB bindings, so there are likely others more knowledgable and well suited for this task. (Although feel free to drop some suggestions here in comments)

tmdkamins commented 2 months ago

Following up after some initial research including this project's NodeJS README...

It appears this project uses a slightly older version of ProtoBuf.js (7.1.2 vs 7.4.0 latest). ProtoBuf is a very popular package (9.8k GH stars).

An alternative ts-proto (2.1k GH stars) includes these notes:

  • ts-proto transforms your .proto files into strongly-typed, idiomatic TypeScript files!
  • The 2.x release of ts-proto migrated the low-level Protobuf serializing that its encode and decode method use from the venerable, but aging & stagnant, protobufjs package
  • A poor man's attempt at "please give us back optional types" The canonical protobuf wrapper types, i.e. google.protobuf.StringValue, are mapped as optional values, i.e. string | undefined, which means for primitives we can kind of pretend the protobuf type system has optional types. (Update: ts-proto now also supports the proto3 optional keyword.)

Another alternative protobuf-ts (1k GH stars) was written to address deficiencies of ts-proto, but those have since been fixed in ts-proto itself.

This PR in this project here has been open for over a year, and there are even several new versions of ProtoBuf.js since, so even if ProtoBuf.js fixed how it handles optional fields (unlikely given same 7.* major version number), it seems there are no resources or will in this project to migrate to something newer on that line anyway, which is fine, but worth noting:

I expect the best path forward for developers who care about more natural TypeScript bindings (especially re default values) is to wrap the GTFS gtfs-realtime.proto file up in one of these other more modern TypeScript / Protocol Buffer bridges.

tmdkamins commented 2 months ago

Well I finished a deep dive, and... it ain't pretty. The entire concept of "optional" primitive fields in Protocol Buffers is riddled with complexity and problems, though it works reasonably well for optional messages/objects as well as optional wrapped types (eg google.protobuf.Int32Value). But GTFS-RT is using PB2 with optional primitives. PB3 actually removed the optional/required distinction entirely! (Although apparently they are planning to add it back in again). Also the inclusion of 0 by GTFS-RT (sometimes explicit, and sometimes forced by ancient decisions from GTFS-Static) as valid values in some of these enums only exacerbates everything.

I banged on protobuf-ts as well as ts-proto quite a bit. It's universally problematic, with assumptions of 0 and mixing up of 0, null, and undefined in generated code everywhere. And the workaround above, while functional, loses typing and incurs unnecessary overhead in extra object creation. So I'm sticking with this library, and I wrote a helpful utility class, which hopefully will be useful for any other TypeScript / PB devs stumbling on this page:

// ===========================================================================
// `PBUtil` Static Utility Class
// ===========================================================================

/**
 * Utility class for working with `protobuf.js` decoded messages.
 * 
 * This class primarily provides methods to handle issues related to 
 * optional fields, particularly in Protocol Buffers version 2 syntax, 
 * addressing the following issues:
 * 
 * - Distinguishing between primitive fields (`int32`, `string`, etc) that
 *   were not included in encoded message vs fields that were included
 *   but set to their default values (`0`, `""`, etc).
 * 
 * - Normalizing `null` to `undefined` for consistency and adherence to
 *   standards and expectations within the JavaScript ecosystem.
 * 
 * - Maintaining full TypeScript type safety on field names as well as
 *   value types.
 * 
 * @example
 * ```ts
 * const decodedMessage = protobufBinding.decode(wireMsg); // (eg decode)
 * for (const item of decodedMessage.some.things ?? []) {
 *     console.log(`Offer: ${PBUtil.get(item, 'offer') ?? "(N/A)"}`);
 *     if (PBUtil.has(item, 'rewardPoints')) { 
 *         balance += PBUtil.has(item, 'rewardPoints'); // (type preserved)
 *     }
 * }
 * ```
 */
export abstract class PBUtil
{
    /**
     * Did the encoded message that was decoded by `protobufjs` into `message`
     * actually have the field specified by `field` (and not just use the
     * field's default value)?
     * 
     * @param message - A decoded `protobufjs` message.
     * @param field - Name of the field on `message` to check.
     * @returns `true` if field was included in encoded message,
     *          `false` if not included.
     * 
     * @see get - Get the value of the field (or `undefined` if not included).
     */
    public static has<M extends object, F extends keyof M>(
        message: M,
        field:   F
    )
        : boolean
    {
        return message.hasOwnProperty(field);
    }

    /**
     * Retrieve the raw value of a field from a decoded `protobufjs` message,
     * returning `undefined` if the field was not actually included in the
     * encoded message.
     * 
     * @param message - A decoded `protobufjs` message.
     * @param field - Name of the field on `message` to retrieve.
     * @returns The raw value of the field if it was included in the encoded 
     *          message, or `undefined` if not included.
     * 
     * @see get - Like this, but with coercion of `null` to `undefined`.
     * @see has - Boolean test of whether the field was included.
     */
    public static getRaw<M extends object, F extends keyof M>(
        message: M,
        field:   F
    )
        : M[F] | undefined
    {
        return PBUtil.has(message, field) ? message[field] : undefined;
    }

    /**
     * Retrieve the value of a field from a decoded `protobufjs` message,
     * returning `undefined` if the field was not included in the encoded
     * message (or if its value was `null`).
     * 
     * @param message - A decoded `protobufjs` message.
     * @param field - Name of the field on `message` to retrieve.
     * @returns The value of the field if it was included in the encoded 
     *          message (and not `null`), or `undefined` otherwise.
     * 
     * @see getRaw - Like this, but without coercion of `null` to `undefined`.
     * @see has - Boolean test of whether the field was included.
     */
    public static get<M extends object, F extends keyof M>(
        message: M,
        field:   F
    )
        : Exclude<M[F], null> | undefined
    {
        const value = PBUtil.getRaw(message, field);
        return (value === null ? undefined : value) as Exclude<M[F], null> | undefined;
    }
}

For the original code at the top of this page, you would use it like this (in TS). You can see the value type (number) is preserved, we get rid of that pesky null, and your IDE will offer autocomplete and flag any typos on the field name ('speed'):

const feed = realtime.transit_realtime.FeedMessage.decode(protobuf);
feed.entity.forEach((entity) => {
  if (entity.vehicle) {
    // The speed will be either a number or `undefined` if not provided:
    const speed: number|undefined = PBUtil.get(entity.vehicle.position, 'speed');
  }
});