Refactor errors to include position, span (offset + len), and optional miette integration

chanced commented 1 week ago

This pull request:

Adds Span structure which contains len & offset
Adds Spanned error wrapper struct that contains a Span and a source error. It implements miette::Diagnostic by acting as a pass-through for all methods except labels (used to mark spans). For labels, it provides a blank (None) label with the span
Adds Positioned error wrapper struct that contains a Spanned error and a position. It also implements miette::Diagnostic by deferring to Spanned
Changes all errors to tuple variants, using either Positioned or Spanned, depending on context.

I'm just getting started with it so it will not compile right now. I should hopefully have it done today.

chanced commented 1 week ago

@asmello incase you happen to have a minute - can you please spot this and give me a bit of feedback on how you feel about both the degree of breaking change this is going to introduce as well as the general design? I only spot checked one crate which uses miette and it was kinda convoluted. I think this approach is fairly clean but it does break pretty much all error handling for consumers.

Here's a short summary, told through types:

pub struct Span {
    pub offset: usize,
    pub len: usize,
}

pub struct Spanned<E> {
    span: Span,
    source: E,
}

#[cfg(feature = "miette")]
impl<E> miette::Diagnostic for Spanned<E>
where
    E: 'static + miette::Diagnostic,
{
    fn labels(&self) -> Option<Box<dyn Iterator<Item = miette::LabeledSpan> + '_>> {
        Some(Box::new(once(miette::LabeledSpan::new(
            None,      // no label - this would require an allocation (String) + doesnt seem like we can add much
            self.offset(), // self.span.offset
            self.len(),    // self.span.len
        ))))
    }
  // all other methods are passthrough to `self::source`
}

// implements miette::Diagnostic as passthrough to Spanned<E>
pub struct Positioned<E> {
    position: usize,
    source: Spanned<E>,
}

usage is:

pub enum Error {
    FailedToParseIndex(Positioned<ParseIndexError>),
    OutOfBounds(Positioned<OutOfBoundsError>),
}

asmello commented 1 week ago

I haven't used miette like this (my uses have been through the dynamic macros or thiserror integration), but will give a look.

chanced commented 1 week ago

I will need to use the miette's Diagnostic derive macro on the errors themselves (e.g. ParseIndexError)

asmello commented 1 week ago

Oh I think I understand, each Error will implement Diagnostic on its own, but we'll leave the labels to a wrapper since that part is common to most (all?) of them. I think this makes sense. We may want some wrapper for the source code part too, for a similar benefit. After all the labels are pointless without the source code.

I think it'll be worth playing around with a few errors and seeing how it goes. Biggest risk I see is this ends up being equivalent work to just implementing these directly for each error type.

chanced commented 1 week ago

Oh, right! I forgot about source!

So the reason I opted for the wrappers is that most of the errors themselves do not have knowledge of their surroundings. I can change that though with wrappers specific to each.

/// Indicates that an `Index` is not within the given bounds.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBoundsError {
    pub length: usize,
    pub index: usize,
}

to

#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBoundsErrorWrapper {
    pub source: OutOfBoundsError,
    pub position: usize,
    pub span: Span,
}

or something like it

chanced commented 1 week ago

Or I guess I could go the other route and refactor the error into a struct:

// assign::Error
pub struct Error {
     kind: Kind,
     position: usize,
     span: Span,
}

pub enum Kind {
    FailedToParseIndex(ParseIndexError),
    OutOfBounds(OutOfBoundsError),
}

chanced commented 1 week ago

I'm going to flush this design out to see how it works - I haven't figured out what to do about source yet (the convoluted nature of pest's errors now make a lot more sense) so it may be all for not.

chanced commented 1 week ago

I have an idea for how to do this - just running into borrowck issues. I'm hoping the ergonomics are going to shake out but trait resolution may thwart my plans.

The way it is anticipated to work is:

let result = PointerBuf::parse_with_report("/example");
let err_report: Report<'a, ReplaceError, Pointer> = buf
    .report_err()
    .replace(1, "oops")
    .unwrap_err();

src/error.rs

// crate::error::Reporter
pub trait Reporter<T> {
    type Reporter<'e>
    where
        Self: 'e;

    // TODO: naming, alts: report, include_err_report, with_err_report, ???
    fn report_err(&'_ self) -> Self::Reporter<'_>;
}

// crate::error::MutReporter
pub trait MutReporter<T> {
    type Reporter<'e>
    where
        Self: 'e;

    // TODO: naming, alts: report, include_err_report, with_err_report, ???
    fn report_err(&'_ mut self) -> Self::Reporter<'_>;
}

src/pointer.rs

pub struct MutReporter<'p> {
    ptr: &'p mut PointerBuf,
}

impl<'p> MutReporter<'p> {
    pub fn replace<'t>(
        mut self,
        index: usize,
        token: impl Into<Token<'t>>,
    ) -> Result<Option<Token<'t>>, Report<'p, ReplaceError, Pointer>>
    where
        'p: 't,
    {
        // borrowck: can't reborrow - need to see if i can use polonius-the-crab
        match self.ptr.replace(index, token) {
            Ok(res) => Ok(res),
            Err(source) => Err(Report {
                source,
                source_code: Cow::Borrowed(&*self.ptr),
            }),
        }
    }
    // rest of methods which can mutate a pointer
}

pub struct Reporter<'p> {
    ptr: &'p Pointer,
}
// impl methods such as `assign`, `resolve`, etc

impl error::MutReporter<PointerBuf> for PointerBuf {
    fn report_err(&'_ mut self) -> Self::Reporter<'_> {
        MutReporter { ptr: self }
    }
}

For parsing, there would need to be a ParseReporter for PointerBuf and Pointer (String and &str respectively). I'm going to see if i can use polonius-the-crab to solve the issues I'm running into.

I'm trying to avoid having the user re-supply the SourceCode as I think that would be potentially error prone and weird ergonomics. At the same time, I want errors with lifetimes to be opt-in.

We don't need all of the refactoring to errors - this can be done keeping the errors with struct variants. We'd just need to add Span to most variants. Refactoring, while it introduces a good bit of churn, may still be worth it as the final iteration on errors.

edit: not sure why I had ohmyfish on my mind

asmello commented 1 week ago

I don't follow what role Reporter is playing here, but generally errors are intended to have static lifetimes, because they're expected to outlive their sources. This is why Error::source has 'static in the return signature. So I think allocating-on-error is fine, actually. But I'm probably misunderstanding what you're trying to achieve.

chanced commented 1 week ago

Yea, I should have explained that better. Sorry.

The reason Reporter is a trait is that I intend to implement it for serde_json::Value and toml::Value. All it does is provide a means to create a wrapper around a &str, String, &Pointer, &mut PointerBuf, &Value, or &mut Value.

Each Reporter wrapper re-exposes methods it is capable of handling which then maps the error side of a Result into Report, using the &Pointer, &str, or String as the source_code and the error as source.

pub struct Report<'a, E, T: ToOwned + ?Sized> {
    pub source: E, // the error
    pub source_code: Cow<'a, T>, // str or Pointer
}

re: allocation - do you mean all errors could/should allocate or that Cow in the above is unnecessary and all reports should be owned?

asmello commented 1 week ago

do you mean all errors could/should allocate

Oh hell no, that would be an awful constraint. But they likely should have static lifetimes. So, for example, an error containing a static string is fine. In other cases the input/source can be owned. Allocate is a sort of last resort, but what I'm saying is that borrowing probably shouldn't even be attempted.

This doesn't preclude the usage of Cow, but I imagine we'd need Cow<'static, T> there (not unheard of). Unsure if it's worth it, but could be.

asmello commented 1 week ago

As for the rest, thanks for the clarification. This sort of wrapping:

pub struct Report<'a, E, T: ToOwned + ?Sized> {
    pub source: E, // the error
    pub source_code: Cow<'a, T>, // str or Pointer
}

makes sense to me, although not sure how generalisable it is. Another thing to experiment with, I suppose.

chanced commented 1 week ago

I'm game for dumping the Cow and making it an owned String or PointerBuf. I think it'll simplify borrowing constraints.

chanced commented 1 week ago

@asmello i dont think this is going to work :(

I have a very little playground here where 2 flavors (Mutable and Immutable) are supposed to be in play but it doesn't seem to work how I want.

I was really hoping resolution wouldn't matter - that calling report_err() on multiple implementations would return which ever you ended up using (based on the methods used). That does not seem to be the case.

I'll toy around with it a bit more but I'm out of ideas if this doesn't end up working. I guess we just let them create the report by passing the error and String, Pointer, or PointerBuf.

use jsonptr::{error::Report, resolve, PointerBuf, ReplaceError};

fn mutable() {
    use jsonptr::error::{ReportErr, ReportErrMut};
    let mut ptr = PointerBuf::parse("/example").unwrap();
    let err: Report<ReplaceError> = ptr.report_err().replace(2, "invalid").unwrap_err();
    // 👍
}

fn immutable() {
    use jsonptr::error::{ReportErr, ReportErrMut};
    let ptr = PointerBuf::parse("/example").unwrap();
    let err: Report<resolve::Error> = ptr
        .report_err()
        .resolve(&serde_json::Value::Null)
        .unwrap_err();
    // 👍
}

fn both() {
    use jsonptr::error::{ReportErr, ReportErrMut};

    let mut ptr = PointerBuf::parse("/example").unwrap();
    let err: Report<resolve::Error> = ptr
        .report_err()
        .resolve(&serde_json::Value::Null)
        .unwrap_err();
    // 👍

    let err: Report<ReplaceError> = ptr.report_err().replace(2, "invalid").unwrap_err();
    // ❌ no method named `replace` found for struct `jsonptr::reporter::Immutable` in the current scope
}

https://github.com/chanced/spike-jsonptr-reports

`Mutable`:

impl<'p> Mutable<'p, ReplaceError> {
    pub fn replace<'t>(
        self,
        index: usize,
        token: impl Into<Token<'t>>,
    ) -> Result<Option<Token<'t>>, Report<ReplaceError>>
    where
        'p: 't, { }
}

`Immutable`:

impl<'p> Immutable<'p, resolve::Error> {
    pub fn resolve<R: crate::Resolve<Error = resolve::Error>>(
        self,
        value: &'_ R,
    ) -> Result<&'_ R::Value, Report<resolve::Error>> {}
}

chanced commented 1 week ago

I don't think this is worth it. I think we just provide a Report type that has a constructor that you pass an error and a PointerBuf and a string. The errors themselves contain the span.

chanced commented 3 days ago

@asmello I refactored the errors, although I haven't gotten to the Report yet. I think it'll be pretty simple though.

This includes breaking changes that I have commented out for ParseError. Right now, I have complete_offset, source_offset, and pointer_offset commented out. I don't think we need them or the data needed to make them possible.

I can write up some top-level methods that we can provide to emulate them though. It'll just require the source string & the error.

I hate to break things like that though - I'm kinda banking on people having not had a need to use them, which is pretty weak.

chanced commented 3 days ago

I should mention that I've got a lot cleaning up to do on it - I just let it sit idle for too long so I figured I'd push what I have. I plan on working on it over the weekend.

The more I think about just dropping the offset methods, the more reluctant I become to potentially break code without a clean transitional solution. I'll see if I can come up with something. Perhaps I should make the fields private and just keep it around for now, with the methods deprecated.

If you'd rather keep them the way they are - as enum structs, I totally get that. This may be just too much.

chanced commented 1 day ago

im going to start over, trying to minimize breaking changes.

chanced / jsonptr