Open Matrix89 opened 1 year ago
I agree that it would be convenient to have concrete types for records embedded in responses like this. The issues we run into here are that a. technically users have the ability to publish invalid records in their repository, and b. there's the possibility that the schema for a record (for example) expands over time. It should do this in such a way that it remains both forward- and backward-compatible, but there are still lots of difficult cases that can arise or the schema author may not perfectly follow that rule.
So programs ingesting raw record data should make sure they understand the semantics of the records that they're consuming when choosing how to deal with them. The way we typically handle this is by validating the record contents based on our application's version of the schema for the record. At that point the application can start to use a more specific type that it's prepared to handle, or decide to use some fallback behavior if it doesn't understand the record data it received.
technically users have the ability to publish invalid records in their repository
I don't think that's a problem, more of a consideration that should be kept in mind when implementing deserializers.
there's the possibility that the schema for a record (for example) expands over time. It should do this in such a way that it remains both forward- and backward-compatible, but there are still lots of difficult cases that can arise or the schema author may not perfectly follow that rule.
I see that, without record schema versioning it sounds impossible. I think It could be solved either by:
In summary, the issue can't be easily solved and all of the solutions I could think of have some drawbacks.
Now I have one more question, what is the meaning of unknown
? It sounds like it could refer to any lexicon type not only records
Yeah unknown
can be thought of as the type Record<string, unknown>
in typescript: it's guaranteed to be an object (i.e. not an array or primitive).
Yeah
unknown
can be thought of as the typeRecord<string, unknown>
in typescript: it's guaranteed to be an object (i.e. not an array or primitive).
This is not correct. unknown
, along with any
, is one of TypeScript's "top types" ie. it describes the set of all possible types. Therefore it's not safe to assume anything about the underlying value as it could be anything including primitives, arrays, or null
.
If the semantics that the library is aiming for is "any object" then Record<string, unknown>
is indeed the correct type.
hey @tom-sherman, it may not have come across clearly, but what @devinivy was saying was that unknown
in lexicon can be thought of as the type Record<string, unknown>
in typescript
However, you're correct in your comment here that I believe we're generated the wrong type there by generating {}
(any non-nullish value) instead of Record<string, unknown>
important to note that unknown
comes with one additional requirement that the contained object has a $type
key on it
an unknown
in lexicon is the same thing as an open union with no refs provided
Yeah sorry about that, I think I got a little mixed up between those two threads!
unknown maybe can be best represented as {$type: string} & Record<string, unknown>
then?
yup that sounds accurate 👍
@dholms wrote:
an
unknown
in lexicon is the same thing as an open union with no refs provided
This does match the specification for union
(though the spec says "similar to", not "the same as".
important to note that
unknown
comes with one additional requirement that the contained object has a$type
key on it
This does not match my reading of the specification for unknown
(emphasis added):
Indicates than any data could appear at this location, with no specific validation. Note that the data must still be valid under the data model: it can't contain unsupported things like floats.
No type-specific fields.
as well as the specification for when to use $type
, which states that $type
is only required for record
and union
(emphasis added):
Data objects sometimes include a
$type
field which indicates their Lexicon type. The general principle is that this field needs to be included any time there could be ambiguity about the content type when validating data.The specific rules are:
record
objects must always include$type
. While the type is often known from context (eg, the collection part of the path for records stored in a repository), record objects can also be passed around outside of repositories and need to be self-describingunion
variants must always include$type
, except at the top level ofsubscription
messagesNote that
blob
objects always include$type
, which allows generic processing.As a reminder,
main
types must be referenced in$type
fields as just the NSID, not including a#main
suffix.
If it is the intention that unknown
is always an object and always contains a $type
key, it would be great to add this to the specification for unknown
. It would also be useful to describe how unknown
s should be generally handled, so that implementors don't presume they can automatically parse as record
s.
If instead the unknown
spec is correct, that it can be any (data-model-valid) type, and if an object is not required to have a $type
field, then the union
documentation should be updated to clarify that a union
with no refs
is a subset of unknown
in that it only supports objects and requires $type
on them.
The reason I came across this is that I've found at least one Lexicon that violates @dholms's assertion: com.atproto.identity.getRecommendedDidCredentials
. The verificationMethods
and services
fields of its response have type unknown
, but do not include $type
fields in their objects. I encountered this because currently atrium
parses unknown
fields as record
s (because many of them are), which currently forces $type
to exist.
Unfortunately this is still an soft spot and under-specified. We have a tracking issue here: https://github.com/bluesky-social/atproto-website/issues/319
which references a discussion where other questions come up: https://github.com/bluesky-social/atproto/discussions/2615
The current "vibes" (not a spec!) are:
unknown
data should probably be either null
or an object, not (eg) a string. not finalized though!unknown
objects don't need to have $type
Something like a DID document, or fragment of a DID document, is basically exactly why we have unknown
in Lexicon: an object with arbitrary fields which might be hard to spec out within the Lexicon schemas and evolution rules, and lack a $type
field.
Maybe we should add an additional lexicon type for arbitrary record! which would have to be an object and would need to have $type
. that pattern shows up in a couple places and would probably be helpful.
According to feed/defs.json post view's record type is
unknown
. But the API calls which return post view, have the record$type
set toapp.bsky.feed.post
.Is this intentional or am I missing something about the implementation?
Couldn't
postView#record
be typed as{ type: "recordRef", schema: "app.bsky.feed.post" }
?Edit: Just to add, having them typed would allow for better codegen