facebookincubator / cinder

Cinder is Meta's internal performance-oriented production version of CPython.
https://trycinder.com

questions about CheckedDict and unknown types #71

Open bennn opened 2 years ago

bennn commented 2 years ago

IIUC (based on issue 59), types like CheckedDict[str, Set[str]] where SP doesn't understand part of the type should:

  1. be allowed, and
  2. turn into CheckedDict[str, object]

But neither seems to work. (I'm running SP commit 1b01cd886.)

First, this type doesn't work except in a constructor call. Other type annotations give an error:

from __static__ import CheckedDict
from typing import Set

def f(dd: CheckedDict[int, Set[str]]):
  return

#     def f(dd: CheckedDict[int, Set[str]]):
# TypeError: expected type or Optional[T] for generic argument

Second, the constructor call gives a chkdict[str, dynamic] instead of replacing the unknown type with object:

from __static__ import CheckedDict
from typing import Set

d0: CheckedDict[str, dynamic] = CheckedDict[str, Set[str]]({})
# OK
d1: CheckedDict[str, object] = CheckedDict[str, Set[str]]({})
# compiler.errors.TypedSyntaxError: type mismatch: chkdict[str, dynamic] cannot be assigned to chkdict[str, object]

Are these problems?

carljm commented 2 years ago

Hi Ben,

At first glance these do look like bugs. A CheckedDict with a dynamic key or value should translate to object at runtime (though not in the compiler, where object and dynamic are quite different in terms of what we'll allow you to do with values pulled out). I'll hopefully get a fix up for this soon. Thanks as always for the report!

carljm commented 2 years ago

Ok, this turns out to be kind of a nasty issue. The "expected type or Optional[T]" error comes from the runtime construction of a CheckedDict: at runtime it gets a non-type (Set[str], which is a typing._GenericAlias at runtime) that the classloader doesn't understand. This error will happen whenever a CheckedDict type with a generic type we don't understand is used in an annotation (without from __future__ import annotations, which causes annotations not to be evaluated at runtime) or in a constructor call in nonstatic code. In the similar case where a nonstatic type is used as the key or value type in a CheckedDict, this runtime error won't occur, but the CheckedDict constructed by the runtime constructor call will actually use the real nonstatic type as its type parameter instead of object, and so won't match what static code expects.
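
To illustrate the runtime side of this in plain CPython (no Cinder needed), here is a minimal sketch. StrictGeneric is a hypothetical stand-in, not Cinder's CheckedDict or its classloader; it only assumes the runtime rejects any type argument that is not a real class, and it skips Optional[T] handling entirely.

```python
from typing import Set


class StrictGeneric:
    """Hypothetical stand-in for a runtime-validated generic (not Cinder's CheckedDict)."""

    def __class_getitem__(cls, args):
        if not isinstance(args, tuple):
            args = (args,)
        for arg in args:
            # Simplification: accept only real classes; Optional[T] handling is omitted.
            if not isinstance(arg, type):
                raise TypeError("expected type or Optional[T] for generic argument")
        return (cls, args)


def try_subscript(*args):
    """Return the validated result, or the error message if validation fails."""
    try:
        return StrictGeneric[args]
    except TypeError as exc:
        return str(exc)


# Set[str] is a typing alias object at runtime, not a class.
print(isinstance(Set[str], type))
print(try_subscript(int, str))
print(try_subscript(int, Set[str]))
```

The point is only that Set[str] fails an "is this a type?" check at runtime, which is why the subscript itself raises before any dict is built.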

This is a hard problem to resolve, particularly for the nonstatic code case, because there's no way nonstatic code at runtime can know which "types" are resolved to dynamic by the static compiler.

We have two ideas of broad directions for fixing this. The ideal fix would require the runtime classloader and the static compiler to share a source of truth for which types are known by the static compiler and which are resolved to dynamic (and thus to object at runtime). This will be a bit annoying, and a big chunk of C work, but it should be doable. Then a runtime creation of a CheckedDict type can consult this shared source of truth to ensure it constructs the same type of CheckedDict the static compiler will expect.
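
A rough sketch of the "shared source of truth" idea, in plain Python. Everything here is hypothetical: KNOWN_STATIC_TYPES, resolve_type_arg, compiler_view and runtime_view are made-up names, and the real version would live in C and cover far more types. It also skips validation of nonsense arguments. The only thing it shows is that when both sides consult the same table, they agree on the resulting type parameters.

```python
from typing import Set

# Assumption: a toy registry standing in for "types the static compiler knows".
KNOWN_STATIC_TYPES = frozenset({int, str, float, bool, bytes, object})


def resolve_type_arg(arg):
    """Map a type argument to what it becomes at runtime: itself if known, else object."""
    if isinstance(arg, type) and arg in KNOWN_STATIC_TYPES:
        return arg
    return object


def compiler_view(key, value):
    """What a (toy) static compiler would expect the runtime type parameters to be."""
    return (resolve_type_arg(key), resolve_type_arg(value))


def runtime_view(key, value):
    """What a (toy) runtime constructor would actually build."""
    return (resolve_type_arg(key), resolve_type_arg(value))


class Plain:
    """A nonstatic class the toy compiler knows nothing about."""


print(compiler_view(int, Set[str]) == runtime_view(int, Set[str]))
print(runtime_view(str, Plain))
```

Because both functions defer to resolve_type_arg, an unknown generic like Set[str] and a nonstatic class like Plain both end up as object on either side.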

A more partial "fix" (that still wouldn't bring great UX) could look something like this:

First we fix the static compiler to actually emit LOAD_CONST object instead of the bytecode for loading Set[str] or whatever, for any type argument of a CheckedDict that resolves to dynamic type. This will fix the error for the "CheckedDict annotation without from __future__ import annotations" case, and will also allow defining type aliases like MyDict = CheckedDict[int, Set[str]] in a static module, where at runtime MyDict is actually a CheckedDict[int, object].

Second, we require creation of checked dicts in nonstatic code to use a type alias like that one, imported from a static module. This will ensure the static compiler is always in charge of creating CheckedDict types, and can ensure they are consistent. The exact method for disallowing nonstatic modules from creating new CheckedDict generic types remains TBD: could use compiler tricks to ensure that from __static__ import CheckedDict only works in static code, or...
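
As a source-level analogy for the first step (the real change would be in the bytecode the static compiler emits, not an AST rewrite), here is a sketch that replaces any CheckedDict type argument it does not recognize with object. KNOWN_TYPE_NAMES, ReplaceDynamicTypeArgs and rewrite_type_args are hypothetical names, and the sketch assumes Python 3.9+ for ast.unparse and the simplified Subscript node layout.

```python
import ast

# Assumption: names the toy "compiler" treats as statically known types.
KNOWN_TYPE_NAMES = {"int", "str", "float", "bool", "bytes", "object"}


class ReplaceDynamicTypeArgs(ast.NodeTransformer):
    """Rewrite CheckedDict[K, V] so that unknown K or V becomes `object`."""

    def visit_Subscript(self, node):
        is_checked_dict = (
            isinstance(node.value, ast.Name) and node.value.id == "CheckedDict"
        )
        if not (is_checked_dict and isinstance(node.slice, ast.Tuple)):
            return self.generic_visit(node)
        node.slice.elts = [
            elt
            if isinstance(elt, ast.Name) and elt.id in KNOWN_TYPE_NAMES
            else ast.Name(id="object", ctx=ast.Load())
            for elt in node.slice.elts
        ]
        return node


def rewrite_type_args(source):
    """Return source with unknown CheckedDict type arguments replaced by object."""
    tree = ReplaceDynamicTypeArgs().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(rewrite_type_args("MyDict = CheckedDict[int, Set[str]]"))
```

With a rewrite like this applied in the static module, an alias such as MyDict never evaluates Set[str] at runtime, so nonstatic code importing the alias gets the same type the compiler expects.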

The shared source of truth option is definitely better if we can make it work.

Since this is a UX issue, not a soundness issue, and it's not a UX issue that affects low-touch conversions of non-static code, we likely aren't going to prioritize addressing it immediately.

carljm commented 2 years ago

One other note: I'm not sure we'd consider the second part of the issue (d1: CheckedDict[str, object] = CheckedDict[str, Set[str]]({})) a bug. In the compiler, dynamic and object are quite different (since an object will allow very limited methods, whereas dynamic will allow anything), and in general CheckedDicts are invariant.

I think in this specific case we could technically relax the invariance and allow the assignment, since both object and dynamic cover the same set of possible runtime values (anything but primitives) so there is no risk of unsound mutation. But we'd probably need to encounter a strong motivating use case to bother.
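
A toy illustration of why invariance matters in general, and why the object/dynamic case is the exception. ToyCheckedDict is a hypothetical pure-Python stand-in that checks types on write; it is not how Cinder implements CheckedDict.

```python
class ToyCheckedDict(dict):
    """Hypothetical dict that enforces key and value types on every write."""

    def __init__(self, key_type, value_type):
        super().__init__()
        self.key_type = key_type
        self.value_type = value_type

    def __setitem__(self, key, value):
        if not isinstance(key, self.key_type):
            raise TypeError(f"bad key type: {type(key).__name__}")
        if not isinstance(value, self.value_type):
            raise TypeError(f"bad value type: {type(value).__name__}")
        super().__setitem__(key, value)


def write_as_object_dict(d):
    """Code that believes it holds a [str, object] dict may store any object."""
    d["b"] = "not an int"


# If a [str, int] dict could be passed where [str, object] is expected,
# this write would be type-correct statically but wrong at runtime.
ints = ToyCheckedDict(str, int)
ints["a"] = 1
try:
    write_as_object_dict(ints)
    unsound_write_rejected = False
except TypeError:
    unsound_write_rejected = True

# dynamic and object both become `object` at runtime, so the same write is fine.
anything = ToyCheckedDict(str, object)
write_as_object_dict(anything)
```

The [str, int] case shows the mutation that invariance is there to prevent; the [str, object] case shows that no such write exists when both sides accept any object.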

bennn commented 2 years ago

For the first part, does the runtime need to throw an error for expected type or Optional[T] when making a CheckedDict? I was thinking it should default to object if that check fails.

(That definitely won't fix the issue with nonstatic types getting used as keys/values, but I thought it'd be enough for Set[str].)

carljm commented 2 years ago

> does the runtime need to throw an error

I guess that's a judgment call, but I think it would be kind of confusing to allow just anything at all (e.g. CheckedDict["foo", 1]) and resolve it to object, so I do think the runtime should validate these type args.

It's a good point though that short of fully fixing the bug, we could easily whack-a-mole a few specific common cases by explicitly resolving them to object for a noticeable practical improvement.
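
A sketch of what that whack-a-mole approach might look like, using only the standard typing module. COMMON_GENERIC_ORIGINS and resolve_type_arg are hypothetical names, the allowlist is an arbitrary guess at "common cases", and Optional[T] (which the real runtime understands) is not handled here.

```python
from typing import List, Set, get_origin

# Assumption: a hand-picked allowlist of generic origins to resolve to object.
COMMON_GENERIC_ORIGINS = frozenset({set, list, dict, tuple, frozenset})


def resolve_type_arg(arg):
    """Resolve a type argument: common typing generics become object, junk is rejected."""
    # get_origin returns None for plain classes and for non-type values.
    if get_origin(arg) in COMMON_GENERIC_ORIGINS:
        return object
    if isinstance(arg, type):
        return arg
    raise TypeError("expected type or Optional[T] for generic argument")


print(resolve_type_arg(Set[str]))
print(resolve_type_arg(int))
try:
    resolve_type_arg("foo")
except TypeError as exc:
    print(exc)
```

This keeps the validation for arbitrary values like "foo" or 1 while letting Set[str] and List[int] through as object.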