bluesky-social / atproto

Social networking technology created by Bluesky
Other
5.81k stars 418 forks source link

Record references in lexicon typed as unknown #999

Open Matrix89 opened 1 year ago

Matrix89 commented 1 year ago

According to feed/defs.json post view's record type is unknown. But the API calls which return post view, have the record $type set to app.bsky.feed.post.

Is this intentional or am I missing something about the implementation?

Couldn't postView#record be typed as { type: "recordRef", schema: "app.bsky.feed.post" }?

Edit: Just to add, having them typed would allow for better codegen

devinivy commented 1 year ago

I agree that it would be convenient to have concrete types for records embedded in responses like this. The issues we run into here are that a. technically users have the ability to publish invalid records in their repository, and b. there's the possibility that the schema for a record (for example) expands over time. It should do this in such a way that it remains both forward- and backward-compatible, but there are still lots of difficult cases that can arise or the schema author may not perfectly follow that rule.

So programs ingesting raw record data should make sure they understand the semantics of the records that they're consuming when choosing how to deal with them. The way we typically handle this is by validating the record contents based on our application's version of the schema for the record. At that point the application can start to use a more specific type that it's prepared to handle, or decide to use some fallback behavior if it doesn't understand the record data it received.

Matrix89 commented 1 year ago

technically users have the ability to publish invalid records in their repository

I don't think that's a problem, more of a consideration that should be kept in mind when implementing deserializers.

there's the possibility that the schema for a record (for example) expands over time. It should do this in such a way that it remains both forward- and backward-compatible, but there are still lots of difficult cases that can arise or the schema author may not perfectly follow that rule.

I see that, without record schema versioning it sounds impossible. I think It could be solved either by:

In summary, the issue can't be easily solved and all of the solutions I could think of have some drawbacks.

Now I have one more question, what is the meaning of unknown? It sounds like it could refer to any lexicon type not only records

devinivy commented 1 year ago

Yeah unknown can be thought of as the type Record<string, unknown> in typescript: it's guaranteed to be an object (i.e. not an array or primitive).

tom-sherman commented 1 year ago

Yeah unknown can be thought of as the type Record<string, unknown> in typescript: it's guaranteed to be an object (i.e. not an array or primitive).

This is not correct. unknown, along with any, is one of TypeScript's "top types" ie. it describes the set of all possible types. Therefore it's not safe to assume anything about the underlying value as it could be anything including primitives, arrays, or null.

If the semantics that the library is aiming for is "any object" then Record<string, unknown> is indeed the correct type.

dholms commented 1 year ago

hey @tom-sherman, it may not have come across clearly, but what @devinivy was saying was that unknown in lexicon can be thought of as the type Record<string, unknown> in typescript

However, you're correct in your comment here that I believe we're generated the wrong type there by generating {} (any non-nullish value) instead of Record<string, unknown>

dholms commented 1 year ago

important to note that unknown comes with one additional requirement that the contained object has a $type key on it

an unknown in lexicon is the same thing as an open union with no refs provided

tom-sherman commented 1 year ago

Yeah sorry about that, I think I got a little mixed up between those two threads!

unknown maybe can be best represented as {$type: string} & Record<string, unknown> then?

dholms commented 1 year ago

yup that sounds accurate 👍

str4d commented 3 weeks ago

@dholms wrote:

an unknown in lexicon is the same thing as an open union with no refs provided

This does match the specification for union (though the spec says "similar to", not "the same as".

important to note that unknown comes with one additional requirement that the contained object has a $type key on it

This does not match my reading of the specification for unknown (emphasis added):

Indicates than any data could appear at this location, with no specific validation. Note that the data must still be valid under the data model: it can't contain unsupported things like floats.

No type-specific fields.

as well as the specification for when to use $type, which states that $type is only required for record and union (emphasis added):

Data objects sometimes include a $type field which indicates their Lexicon type. The general principle is that this field needs to be included any time there could be ambiguity about the content type when validating data.

The specific rules are:

  • record objects must always include $type. While the type is often known from context (eg, the collection part of the path for records stored in a repository), record objects can also be passed around outside of repositories and need to be self-describing
  • union variants must always include $type, except at the top level of subscription messages

Note that blob objects always include $type, which allows generic processing.

As a reminder, main types must be referenced in $type fields as just the NSID, not including a #main suffix.

If it is the intention that unknown is always an object and always contains a $type key, it would be great to add this to the specification for unknown. It would also be useful to describe how unknowns should be generally handled, so that implementors don't presume they can automatically parse as records.

If instead the unknown spec is correct, that it can be any (data-model-valid) type, and if an object is not required to have a $type field, then the union documentation should be updated to clarify that a union with no refs is a subset of unknown in that it only supports objects and requires $type on them.

The reason I came across this is that I've found at least one Lexicon that violates @dholms's assertion: com.atproto.identity.getRecommendedDidCredentials. The verificationMethods and services fields of its response have type unknown, but do not include $type fields in their objects. I encountered this because currently atrium parses unknown fields as records (because many of them are), which currently forces $type to exist.

bnewbold commented 3 weeks ago

Unfortunately this is still an soft spot and under-specified. We have a tracking issue here: https://github.com/bluesky-social/atproto-website/issues/319

which references a discussion where other questions come up: https://github.com/bluesky-social/atproto/discussions/2615

The current "vibes" (not a spec!) are:

Something like a DID document, or fragment of a DID document, is basically exactly why we have unknown in Lexicon: an object with arbitrary fields which might be hard to spec out within the Lexicon schemas and evolution rules, and lack a $type field.

Maybe we should add an additional lexicon type for arbitrary record! which would have to be an object and would need to have $type. that pattern shows up in a couple places and would probably be helpful.