gcanti / io-ts

Runtime type system for IO decoding/encoding
https://gcanti.github.io/io-ts/
MIT License
6.71k stars 329 forks source link

non-enumerable records with extra keys do not pass `io-ts.record.is`, contrary to TypeScript types #704

Closed tgfisher4 closed 12 months ago

tgfisher4 commented 1 year ago

🐛 Bug report

Current Behavior

io-ts.record type does not agree with the TypeScript's Record type, which AFAICT is its express goal. The discrepancy arises specifically for "non-enumerable" records, as they are called in the io-ts source. In the following example (source code is also copy-pasted under Reproducible example) — https://stackblitz.com/edit/node-tqqi5y?file=index.ts — this behavior is unexpected:

NonEnumerableRecord.is(hasExtraNonEnumerable)
> false
E.isRight(NonEnumerableRecord.decode(hasExtraNonEnumerable))
> false

Expected behavior

In the above example, I expect

NonEnumerableRecord.is(hasExtraNonEnumerable)
> true
E.isRight(NonEnumerableRecord.decode(hasExtraNonEnumerable))
> true

since hasExtraNonEnumerable is assignable to NonEnumerableRecord at the type-level.

The latter expectation assumes that it is a general principle that .decode should succeed whenever .is does: is this correct?

Reproducible example

https://stackblitz.com/edit/node-tqqi5y?file=index.ts

import * as t from 'io-ts';
import * as E from 'fp-ts/Either';

// io-ts and TypeScript agree that enumerable records can have extra keys
const EnumerableRecord = t.record(
  t.keyof({ a: null, b: null }),
  t.literal('value')
);
type EnumerableRecord = t.TypeOf<typeof EnumerableRecord>;
type EnumerableWithExtra = { a: 'value'; b: 'value'; c: 'value' };
const hasExtraEnumerable = { a: 'value', b: 'value', c: 'value' } as const satisfies EnumerableWithExtra;
// Typescript considers an EnumerableWithExtra to be a valid EnumerableRecord
const extraExtendsEnumerable: EnumerableWithExtra extends EnumerableRecord
  ? true
  : false = true;
const enumerable: EnumerableRecord = hasExtraEnumerable;
// so does io-ts
console.log(
  `${JSON.stringify(hasExtraEnumerable)} ${
    EnumerableRecord.is(hasExtraEnumerable) ? 'is' : 'is not'
  } ${EnumerableRecord.name}`
);
console.log(
  `${JSON.stringify(hasExtraEnumerable)} ${
    E.isRight(EnumerableRecord.decode(hasExtraEnumerable)) ? 'is' : 'is not'
  } ${EnumerableRecord.name}`
);

// TypeScript allows non-enumerable records to have extra keys, too, but io-ts does not
const Prefixed = t.refinement( // I use refinement to avoid needing to decode all my string literal keys below
  t.string,
  (s): s is `prefix:${string}` => s.startsWith('prefix:'),
  '`prefix:${string}`'
);
const NonEnumerableRecord = t.record(Prefixed, t.literal('value'));
type NonEnumerableRecord = t.TypeOf<typeof NonEnumerableRecord>;
type NonEnumerableWithExtra = { 'prefix:foo': 'value'; bar: 'value' };
const hasExtraNonEnumerable = { 'prefix:foo': 'value', bar: 'value' } as const satisfies NonEnumerableWithExtra;
// Typescript considers NonEnumerableWithExtra to be a valid NonEnumerableRecord
// Indeed, this is consistent with the enumerable record behavior seen above
const extraExtendsNonEnumerable: NonEnumerableWithExtra extends NonEnumerableRecord
  ? true
  : false = true;
const nonEnumerable: NonEnumerableRecord = hasExtraNonEnumerable;
// However, io-ts does _not_ consider a NonEnumerableWithExtra to be a valid NonEnumerableRecord
console.log(
  `${JSON.stringify(hasExtraNonEnumerable)} ${
    NonEnumerableRecord.is(hasExtraNonEnumerable) ? 'is' : 'is not'
  } ${NonEnumerableRecord.name}`
);
console.log(
  `${JSON.stringify(hasExtraNonEnumerable)} ${
    E.isRight(NonEnumerableRecord.decode(hasExtraNonEnumerable))
      ? 'is'
      : 'is not'
  } ${NonEnumerableRecord.name}`
);
// control: io-ts accepts this record without the extra key
const noExtraNonEnumerable = { 'prefix:foo': 'value'};
console.log(
  `${JSON.stringify(noExtraNonEnumerable)} ${
    NonEnumerableRecord.is(noExtraNonEnumerable) ? 'is' : 'is not'
  } ${NonEnumerableRecord.name}`
);

Suggested solution(s)

This code — https://github.com/gcanti/io-ts/blob/master/src/index.ts#L401 — should check only that any domain key k of the unknown record u which satisfies domain.is(k) also satisfies codomain.is(u[k]), rather than requiring this for every key:

return Object.keys(u).every((k) => !domain.is(k) || codomain.is(u[k]))

Happy to make a PR if this is indeed the correct behavior.

Additional context

Your environment

Which versions of io-ts are affected by this issue? Did this work in previous versions of io-ts?

Exact versions are also available in the linked stackblitz environment. Looking through the blame on this code, I don't think this example has ever respected TypeScript types, as I've demonstrated them.

Software Version(s)
io-ts 2.2.20
fp-ts 2.16.1
TypeScript 5.3.2
gcanti commented 1 year ago

@tgfisher4 yeah, this is definitely a bug, both is and decode must be fixed

tgfisher4 commented 1 year ago

@gcanti cool, I'm working on a PR to address this and want to discuss the ideal nonEnumerableRecord.{en,de}code behavior.

Currently, enumerableRecord.decode acts like exact.decode in that it strips keys outside the domain. Personally, I find this unexpected since

  1. stripping keys outside the domain is not necessary to ensure the decodeing satisfies the corresponding Record type, since an object with keys outside the domain still satisfies the Record type, as demonstrated in my original post,
  2. type.decode preserves keys outside the props, and record is similar in that it directly corresponds to a representable TypeScript type (vs exact, whose distinction from type is not representable at the TypeScript type-level), and
  3. enumerableRecord.encode preserves keys outside the domain iff codomain.encode === identity (since, then, enumerableRecord.encode === identity).

In general, I'd expect the same philosophy as type — do the minimum work needed to ensure the {en,de}codeings match the expected types, i.e.:

So, should nonEnumerableRecord.{en,de}code strip extraneous keys for consistency with this (IMO unexpected) behavior of enumerableRecord.{en,de}code, or perhaps should these both ignore extraneous keys for consistency with type?

In case it wasn't clear, I'd prefer the latter, perhaps with the addition of supporting exact(record(...)) to strip keys outside the domain when {en,de}codeing to emulate the current behavior of enumerableRecord and exact(type(...)). What do you think?

gcanti commented 1 year ago

I wouldn't alter the current behavior except in response to the bug you identified, this is because the default of removing unexpected keys while decoding is sensible (the behavior of type was initially a mistake later corrected by strict/exact)

tgfisher4 commented 1 year ago

@gcanti I see, so the general philosophy is instead that a decodeing should match A as closely as possible, with type being an historical exception. That clarifies the decode behavior, thanks!

Does the same apply for encode, i.e., an encodeing should match O as closely as possible, in particular by stripping keys outside the domain for record types? Currently, the behavior is inconsistent in that enumerableRecord.encode passes through extraneous keys iff codomain.encode === identity, since in this case enumerableRecord.encode === identity, while nonEnumerableRecord.encode always passes through extraneous keys (while, perhaps interestingly, still passing them through domain.encode if !(domain.encode === identity && codomain.encode === identity)).

gcanti commented 1 year ago

Regarding encoding and those checks on identity, those are optimization attempts, but I think they should be removed if they compromise correctness

tgfisher4 commented 1 year ago

Gotcha, that's what I figured, thanks @gcanti.

705 addresses this issue and is ready for review.

alecgibson commented 12 months ago

👋🏼 @gcanti @tgfisher4

We were apparently relying on this (broken?) behaviour. Here's a StackBlitz.

Given this module:

const isUUID = (u) => typeof u === 'string' && uuid.validate(u);
const UUIDType = new t.Type(
  'uuid',
  isUUID,
  (u, c) => (isUUID(u) ? t.success(u) : t.failure(u, c)),
  t.identity
);

const ThingsByUUID = t.record(UUIDType, t.boolean);

...we then write these tests:

describe('ThingsByUUID', () => {
  it('allows UUIDs as keys', () => {
    const obj = {
      [uuid.v4()]: true,
    };

    expect(ThingsByUUID.is(obj)).to.be.true;
  });

  it('disallows non-UUIDs as keys', () => {
    const obj = {
      foo: true,
    };

    expect(ThingsByUUID.is(obj)).to.be.false;
  });
});

This passes in io-ts@=2.2.20, but fails in io-ts@=2.2.21.

I understand the argument for this change, but wanted to flag that it has broken some (old) code, and also wanted to ask if there's a built-in way to achieve this without rolling a whole new Type definition class?

alecgibson commented 12 months ago

Actually a case where this change has actively broken similarity between TypeScript and io-ts is in another type we use, which makes use of template literal types:

export function stringValidatorType(name: string, predicate: (u: string) => boolean): Type<string> {
  const is = (u: unknown): u is string => typeof u === 'string' && predicate(u);
  return new Type<string>(
    name,
    is,
    (u, c) => is(u) ? success(u) : failure(u, c),
    identity,
  );
}

export function prefixedStringType<T extends string>(prefix: T): Type<`${T}${string}`> {
  return stringValidatorType('PrefixedString', (u) => u.startsWith(prefix)) as Type<`${T}${string}`>;
};

Then a failing test case, that should actually pass:

it('builds a record', () => {
  const PrefixRecord = t.record(t.prefixedString('foo'), t.boolean);
  type IPrefixedRecord = t.TypeOf<typeof PrefixRecord>;

  const goodRecord: IPrefixedRecord = {
    foo_a: true,
  };
  expect(PrefixRecord.is(goodRecord)).to.be.true;

  const badRecord: IPrefixedRecord = {
    // @ts-expect-error :: TypeScript doesn't allow this bad prefix
    bar_a: true,
  };
  expect(PrefixRecord.is(badRecord)).to.be.false; // This assertion fails: io-ts disagrees with TypeScript
});
gcanti commented 12 months ago

@alecgibson that is "excess property checking" in action, it doesn't mean that TypeScript doesn't allow other props in general:

const badRecord = {
  bar_a: true
}

const x: IPrefixedRecord = badRecord // no error
alecgibson commented 12 months ago

🤯 Okay TypeScript continues to surprise me every day!

In any case, is there a recommended out-of-the-box way to achieve what I was previously doing (incorrectly) with t.record? Or I need to build a custom codec?

tgfisher4 commented 12 months ago

@alecgibson not sure about @gcanti's general philosophy on breaking changes, but you can emulate the previous behavior of this check in a one-off function using decode:

const isExactlyThingsByUUID = (u: unknown) =>
  pipe(
    u,
    ThingsByUUID.decode,
    E.fold(
      (_) => false,
      (r) => r === u
    )
  );

which yields

const good = { [uuid.v4()]: true };
const bad1 = { foo: true, bar: true, [uuid.v4()]: true };
const bad2 = { foo: true, bar: true, [uuid.v4()]: 'string' };

console.log(
  `${JSON.stringify(good)} ${
    isExactlyThingsByUUID(good) ? 'is' : 'is not'
  } exactly ${ThingsByUUID.name}`
);
// > {"2117c213-ab40-409a-be3c-78d0571b0d3d":true} is exactly { [K in uuid]: boolean }
console.log(
  `${JSON.stringify(bad1)} ${
    isExactlyThingsByUUID(bad1) ? 'is' : 'is not'
  } exactly ${ThingsByUUID.name}`
);
// > {"foo":true,"bar":true,"25b4ab4b-f3ef-4e3c-b29a-3e3b8ae4332d":true} is not exactly { [K in uuid]: boolean }
console.log(
  `${JSON.stringify(bad2)} ${
    isExactlyThingsByUUID(bad2) ? 'is' : 'is not'
  } exactly ${ThingsByUUID.name}`
);
// > {"foo":true,"bar":true,"c8f50acc-78cd-4ea6-b3bd-224a420e4607":"string"} is not exactly { [K in uuid]: boolean }

This works because io-ts returns the same reference when it's able, so if (a) no keys outside the domain are stripped, and (b) the record domain and codomain do not transform values during validate, the input and output objects should really be strictly equal. This works for this case since domain.validate and codomain.validate do not transform their values, but is not generally applicable.

Otherwise, I think you'll have to roll your own codec, here's my shot at a draft:

function excessRejectingRecord<D extends t.Mixed, C extends t.Mixed>(
  domain: D,
  codomain: C,
  name?: string
): t.RecordC<D, C> {
  const baseCodec = t.record(domain, codomain);
  return new t.DictionaryType(
    name ?? `Exact<${baseCodec.name}>`,
    (u): u is t.TypeOf<typeof baseCodec> =>
      pipe(
        u,
        baseCodec.decode,
        E.fold(
          (_) => false,
          (decoded) =>
            t.UnknownRecord.is(u) &&
            Object.keys(u).every((key) => key in decoded)
        )
      ),
    (u, c) =>
      pipe(
        t.UnknownRecord.validate(u, c),
        E.chain(
          flow(
            Object.keys,
            Arr.filter(not(domain.is)),
            // can play with the ValidationError/context here to adjust error messages
            Arr.map((key) => t.failure(key, t.appendContext(c, key, domain))),
            Arr.filterMap(flow(E.swap, O.fromEither)), // it is silly that t.failure returns t.Validation rather than Left<Errors>
            Arr.flatten,
            NEArr.fromArray,
            // there's probably a cleaner way to combine these errors,
            // e.g. by mapping the Option to an Either and constructing a validation-like semigroup that
            // returns the concatenation of all errors, or if none are preset, the first Right,
            // but I couldn't figure out a nice built-in way to generate such a semigroup
            // (Alt short-circuits on Right, but I only want to return Right if _both_ are Right)
            O.match(
              () => baseCodec.validate(u, c),
              (excessKeyErrors) =>
                pipe(
                  baseCodec.validate(u, c),
                  E.match(
                    (baseCodecErrors) =>
                      E.left(
                        Arr.getSemigroup<t.ValidationError>().concat(
                          baseCodecErrors,
                          excessKeyErrors
                        )
                      ),
                    () => E.left(excessKeyErrors)
                  )
                )
            )
          )
        )
      ),
    baseCodec.encode,
    domain,
    codomain
  );
}

which produces

const excessRejectingThingsByUUID = excessRejectingRecord(UUIDType, t.boolean);
console.log(
  `${JSON.stringify(good)} ${
    excessRejectingThingsByUUID.is(good) ? 'is' : 'is not'
  } ${excessRejectingThingsByUUID.name}`
);
// > {"18c71af5-0f72-4577-a60d-3138f093ba71":true} is Exact<{ [K in uuid]: boolean }>
console.log(
  `${JSON.stringify(bad1)} ${
    excessRejectingThingsByUUID.is(bad1) ? 'is' : 'is not'
  } ${excessRejectingThingsByUUID.name}`
);
// > {"foo":true,"bar":true,"0454c99d-97be-498f-b16c-3c9bcfc90fb2":true} is not Exact<{ [K in uuid]: boolean }>
console.log(
  `${excessRejectingThingsByUUID.name}.decode(${JSON.stringify(
    bad1
  )}) = ${PathReporter.report(excessRejectingThingsByUUID.decode(bad1))}`
);
// > Exact<{ [K in uuid]: boolean }>.decode({"foo":true,"bar":true,"0454c99d-97be-498f-b16c-3c9bcfc90fb2":true}) = Invalid value "foo" supplied to : Exact<{ [K in uuid]: boolean }>/foo: uuid,Invalid value "bar" supplied to : Exact<{ [K in uuid]: boolean }>/bar: uuid
console.log(
  `${JSON.stringify(bad2)} ${
    excessRejectingThingsByUUID.is(bad2) ? 'is' : 'is not'
  } ${excessRejectingThingsByUUID.name}`
);
// > {"foo":true,"bar":true,"c8f50acc-78cd-4ea6-b3bd-224a420e4607":"string"} is not Exact<{ [K in uuid]: boolean }>
console.log(
  `${excessRejectingThingsByUUID.name}.decode(${JSON.stringify(
    bad2
  )}) = ${PathReporter.report(excessRejectingThingsByUUID.decode(bad2))}`
);
// > Exact<{ [K in uuid]: boolean }>.decode({"foo":true,"bar":true,"c8f50acc-78cd-4ea6-b3bd-224a420e4607":"string"}) = Invalid value "string" supplied to : Exact<{ [K in uuid]: boolean }>/c8f50acc-78cd-4ea6-b3bd-224a420e4607: boolean,Invalid value "foo" supplied to : Exact<{ [K in uuid]: boolean }>/foo: uuid,Invalid value "bar" supplied to : Exact<{ [K in uuid]: boolean }>/bar: uuid

There are probably some opportunities to clean up/optimize that, but that's my idea.

This is all available in this this StackBlitz.

As @gcanti alluded to, it's worth noting that this guarantee (that a record has no keys outside its domain) is stronger than what TypeScript's structural type system can provide/describe.

alecgibson commented 12 months ago

@tgfisher4 thanks for the very thorough reply! 🙏🏼

Since yesterday I did a bit more rummaging in the codebase and discovered t.refinement, which I really should have been using instead of our own hand-rolled wrapper.

It also got me to thinking that this behaviour really is sort of like a "record refinement" (if I understand the concept of "refinement" correctly: something that validates in a stricter way than TS might allow for? Which is what this whole conversation is about).

So I wonder if the most semantic, simple solution is something like:

export function refinedRecordType<D extends t.RefinementC<t.StringC>, C extends t.Mixed>(
  domain: D,
  codomain: C,
  name?: string,
): t.RefinementC<t.RecordC<D, C>> {
  return t.refinement(
    // Deliberately use t.string instead of domain to avoid io-ts stripping non-domain keys
    // before our predicate is run
    t.record(t.string, codomain, name),
    (a) => Object.keys(a).every(domain.is),
  ) as any;
}

It's a little bit ugly (it has some casting), and the errors are awful (since we're just relying on true/false logic), but I think it seems to do the job whilst staying vaguely understandable.