It’s too easy to use a schema or document with errors

SimonSapin commented 1 year ago

This is breaking change that we should either make before 1.0 leaves beta, or decide not to make.

Status quo

Typical intended usage of apollo-compiler 1.0.0-beta.4 looks like this:

let schema = apollo_compiler::Schema::parse(schema_input, "schema.graphql");
schema.validate()?;
let doc = apollo_compiler::ExecutableDocument::parse(&schema, input, "doc.graphql");
doc.validate(&schema);
do_something_with(&schema, &doc)

However it’s very easy to forget validation:

let schema = apollo_compiler::Schema::parse(schema_input, "schema.graphql");
let doc = apollo_compiler::ExecutableDocument::parse(&schema, input, "doc.graphql");
do_something_with(&schema, &doc)

… and get a schema and document that look fine and contain some of the expected things, but in fact may have error and silently be missing other things. Specifically, two categories of errors can be encountered before running "full validation":

Parse errors (invalid grammar or limit exceeded)
So-called "build errors" during conversion to a high-level data structure. For example: inserting a definition in a name-indexed map can fail if the name was already used.

In either case an error is recorded and returned by .validate(). But in the second usage example they’ll be ignored and may cause unexpected results later, when they’re harder to debug.

Full validation as well is a property that may be useful to encode in the Rust type system. Maybe even ExecutableDocument should only be created with a fully valid schema?

Proposal

Change parse methods to return a Result in order to force callers to handle the error case (even if with unwrap).

This result should not always be tied to that of full validation. Federation for example needs to parse subgraph schemas that are invalid per spec because their root Query type may be implicit. Federation code fills in the blanks by programmatically mutating a Schema before calling validation.

If each different state is a different Rust type we may need to pick names for up to 4 types just for schemas:

Schema::parse(…) -> Result<Schema, SchemaWithParseOrBuildError>
Schema::validate(self) -> Result<ValidSchema, InvalidSchema>

… and similarly for executable documents.

Unless parse runs full validation but only when it encounters a build or parse error? In which case both results could share a InvalidSchema error type.

SimonSapin commented 1 year ago

Sketch of a possible API:

/// Immutable, validated. Execution would require this.
///
/// FIXME: keep this name? "Document" is made implicit
struct Executable {…}

/// Mutable, may or may not be valid
///
/// FIXME: some other name?
struct PartialExecutable {…}

/// Immutable, validated.
/// 
/// `Executable` and `PartialExecutable` require this
struct Schema {…}

/// Mutable, may or may not be valid
///
/// FIXME: some other name?
struct PartialSchema {…}

struct WithErrors<T> {
    pub errors: DiagnosticList,

    /// Some components may be omitted when `errors` has a corresponding diagnostic
    pub document: T,
}

impl Schema {
    /// If `Err`, `DiagnosticList` may contain any kind of error
    pub fn parse_and_validate(
        source_text: impl Into<String>,
        path: impl AsRef<Path>,
    ) -> Result<Self, WithErrors<PartialSchema>> {…}
}

impl PartialSchema {
    /// If `Err`, `DiagnosticList` may contain parse errors or/and build errors
    pub fn parse(
        source_text: impl Into<String>,
        path: impl AsRef<Path>,
    ) -> Result<Self, WithErrors<Self>> {…}

    /// If `Err`, `DiagnosticList` only contains validation errors
    pub fn validate(self) -> Result<Schema, WithErrors<Self>> {…}
}

impl Executable {
    /// If `Err`, `DiagnosticList` may contain any kind of error
    pub fn parse_and_validate(
        schema: &Schema,
        source_text: impl Into<String>,
        path: impl AsRef<Path>,
    ) -> Result<Self, WithErrors<PartialExecutable>> {…}
}

impl PartialExecutable {
    /// If `Err`, `DiagnosticList` may contain parse errors or/and build errors
    pub fn parse(
        schema: &Schema,
        source_text: impl Into<String>,
        path: impl AsRef<Path>,
    ) -> Result<Self, WithErrors<Self>> {…}

    /// If `Err`, `DiagnosticList` only contains validation errors
    pub fn validate(self) -> Result<Executable, WithErrors<Self>> {…}
}

impl ast::Document {
    /// If `Err`, `DiagnosticList` only contains parse errors
    pub fn parse(
        source_text: impl Into<String>,
        path: impl AsRef<Path>,
    ) -> Result<Self, WithErrors<Self>> {…}

    // FIXME: names of these four methods?

    /// If `Err`, `DiagnosticList` only contains build errors
    pub fn to_partial_schema(&self) -> Result<PartialSchema, WithErrors<PartialSchema>> {…}

    /// If `Err`, `DiagnosticList` may contains build errors and/or validation errors
    pub fn to_schema_validate(&self) -> Result<Schema, WithErrors<PartialSchema>> {…}

    /// If `Err`, `DiagnosticList` only contains build errors
    pub fn to_partial_executable(
        &self,
        schema: &Schema,
    ) -> Result<PartialExecutable, WithErrors<PartialExecutable>> {…}

    /// If `Err`, `DiagnosticList` may contains build errors and/or validation errors
    pub fn to_executable_validate(
        &self,
        schema: &Schema,
    ) -> Result<Executable, WithErrors<PartialExecutable>> {…}    
}

SimonSapin commented 1 year ago

Also:

impl Schema {
    pub fn into_partial(self) -> PartialSchema {…}
}

impl Executable {
    pub fn into_partial(self) -> PartialExecutable {…}
}

qwerdenkerXD commented 12 months ago

Having all those different types is kinda confusing to me, why don't you do sth. like the following?

Schema::parse(...) -> Result<Self, DiagnosticList>  // parsing, including validation
Schema::parse_unchecked(...) -> Self  // only parsing, as is the current behaviour of parse(...)

and same thing for ExecutableDocument

SimonSapin commented 11 months ago

It’s possible, but would sacrifice encoding in the type system "this schema/document is valid" and "this function can only be called with a valid schema". Would things be easier to understand with a wrapper type?

/// Mutable, may or may not be valid
struct Executable {…}

/// Mutable, may or may not be valid
struct Schema {…}

/// Immutable (`Arc::make_mut` not exposed), can only be obtained from `validate`, implements Deref
struct Valid<T>(Arc<T>);

struct WithErrors<T> {
    pub errors: DiagnosticList,
    pub document: T,
}

impl Schema {
    fn parse_and_validate(self) -> Result<Valid<Self>, WithError<Self>>
    fn parse(…) -> Result<Self, WithError<Self>>
    fn validate(self) -> Result<Valid<Self>, WithError<Self>>
}

qwerdenkerXD commented 11 months ago

Yes, the Valid-wrapper is way more intuitive. How are validation warnings handled?

SimonSapin commented 11 months ago

Hmm good point! I kinda forgot about those :sweat_smile:

Like now, if validation finds both errors and warnings they’d end up in the same DiagnosticList in Result::Err. But for the only-warnings case they should be part of Result::Ok somehow. I’d rather not make it part of Valid<T>: after you either print, log, or ignore warnings they don’t need to be kept around while the document continues being used. What if Ok contained a tuple? Then by symmetry Err could too:

fn validate(self) -> Result<(Valid<Self>, DiagnosticList), (Self, DiagnosticList)>

… but part of the motivation for the WithError type was to make it implement Debug such that validate().unwrap() prints (only) diagnostics. Which would leave:

fn validate(self) -> Result<(Valid<Self>, DiagnosticList), WithErrors<Self>>

Or, equivalent but maybe feels less awkward, rename WithErrors to WithDiagnostics and:

fn validate(self) -> Result<WithDiagnostics<Valid<Self>>, WithDiagnostics<Self>>

SimonSapin commented 11 months ago

At this point I wonder: should warnings be part of validate in the first place? What if we declare that validate only returns validation errors, and there’s a separate method like fn lint(&self) -> DiagnosticList for warnings? Would anyone call it? If not, is it worth having non-error diagnostics at all in apollo-compiler when there’s only two?

goto-bus-stop commented 11 months ago

I think there is a good argument for splitting them up as you currently have to filter out warnings and "advice" in the most common use cases.

qwerdenkerXD commented 11 months ago

I agree with splitting them up. In the case of the mentioned WithDiagnostics-wrapper it would also be a little bit contradictory if you had an empty list.

qwerdenkerXD commented 11 months ago

Concerning the Valid-wrapper, I think that it doesn't really solve the issue as you're still able to easily forget validation when using the unvalidated parse function because you have access to all functions of the Schema or Executable. Maybe the first sketch of the API is really the better, as the partial/unvalidated variant enables modification only and evolves via successful validation.

SimonSapin commented 11 months ago

Sorry this was unclear, but the idea is that ExecutableDocument::parse would take a &Valid<Schema> parameter instead of &Schema. Similarly, execution and introspection APIs would take &Valid<ExecutableDocument>

qwerdenkerXD commented 11 months ago

ah then it's fine, didn't know you're working on an execution api, very cool

SimonSapin commented 11 months ago

By the way, the PR linked above contains a full execution engine but only to support introspection. The current plan is to keep it private at first, and later if someone is interested consider polishing it for other use cases. Some execution-related things like input variable coercion will likely be public though.

apollographql / apollo-rs

It’s too easy to use a schema or document with errors #709

Status quo

Proposal