chanced / jsonptr

JSON Pointer (RFC 6901) implementation for Rust
Apache License 2.0
44 stars 4 forks source link

Refactor errors to include position, span (offset + len), and optional miette integration #95

Closed chanced closed 1 day ago

chanced commented 1 week ago

This pull request:

I'm just getting started with it so it will not compile right now. I should hopefully have it done today.

chanced commented 1 week ago

@asmello incase you happen to have a minute - can you please spot this and give me a bit of feedback on how you feel about both the degree of breaking change this is going to introduce as well as the general design? I only spot checked one crate which uses miette and it was kinda convoluted. I think this approach is fairly clean but it does break pretty much all error handling for consumers.

Here's a short summary, told through types:

pub struct Span {
    pub offset: usize,
    pub len: usize,
}

pub struct Spanned<E> {
    span: Span,
    source: E,
}

#[cfg(feature = "miette")]
impl<E> miette::Diagnostic for Spanned<E>
where
    E: 'static + miette::Diagnostic,
{
    fn labels(&self) -> Option<Box<dyn Iterator<Item = miette::LabeledSpan> + '_>> {
        Some(Box::new(once(miette::LabeledSpan::new(
            None,      // no label - this would require an allocation (String) + doesnt seem like we can add much
            self.offset(), // self.span.offset
            self.len(),    // self.span.len
        ))))
    }
  // all other methods are passthrough to `self::source`
}

// implements miette::Diagnostic as passthrough to Spanned<E>
pub struct Positioned<E> {
    position: usize,
    source: Spanned<E>,
}

usage is:

pub enum Error {
    FailedToParseIndex(Positioned<ParseIndexError>),
    OutOfBounds(Positioned<OutOfBoundsError>),
}
asmello commented 1 week ago

I haven't used miette like this (my uses have been through the dynamic macros or thiserror integration), but will give a look.

chanced commented 1 week ago

I will need to use the miette's Diagnostic derive macro on the errors themselves (e.g. ParseIndexError)

asmello commented 1 week ago

Oh I think I understand, each Error will implement Diagnostic on its own, but we'll leave the labels to a wrapper since that part is common to most (all?) of them. I think this makes sense. We may want some wrapper for the source code part too, for a similar benefit. After all the labels are pointless without the source code.

I think it'll be worth playing around with a few errors and seeing how it goes. Biggest risk I see is this ends up being equivalent work to just implementing these directly for each error type.

chanced commented 1 week ago

Oh, right! I forgot about source!

So the reason I opted for the wrappers is that most of the errors themselves do not have knowledge of their surroundings. I can change that though with wrappers specific to each.

/// Indicates that an `Index` is not within the given bounds.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBoundsError {
    pub length: usize,
    pub index: usize,
}

to

#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBoundsErrorWrapper {
    pub source: OutOfBoundsError,
    pub position: usize,
    pub span: Span,
}

or something like it

chanced commented 1 week ago

Or I guess I could go the other route and refactor the error into a struct:

// assign::Error
pub struct Error {
     kind: Kind,
     position: usize,
     span: Span,
}

pub enum Kind {
    FailedToParseIndex(ParseIndexError),
    OutOfBounds(OutOfBoundsError),
}
chanced commented 1 week ago

I'm going to flush this design out to see how it works - I haven't figured out what to do about source yet (the convoluted nature of pest's errors now make a lot more sense) so it may be all for not.

chanced commented 1 week ago

I have an idea for how to do this - just running into borrowck issues. I'm hoping the ergonomics are going to shake out but trait resolution may thwart my plans.

The way it is anticipated to work is:

let result = PointerBuf::parse_with_report("/example");
let err_report: Report<'a, ReplaceError, Pointer> = buf
    .report_err()
    .replace(1, "oops")
    .unwrap_err();

src/error.rs

// crate::error::Reporter
pub trait Reporter<T> {
    type Reporter<'e>
    where
        Self: 'e;

    // TODO: naming, alts: report, include_err_report, with_err_report, ???
    fn report_err(&'_ self) -> Self::Reporter<'_>;
}

// crate::error::MutReporter
pub trait MutReporter<T> {
    type Reporter<'e>
    where
        Self: 'e;

    // TODO: naming, alts: report, include_err_report, with_err_report, ???
    fn report_err(&'_ mut self) -> Self::Reporter<'_>;
}

src/pointer.rs

pub struct MutReporter<'p> {
    ptr: &'p mut PointerBuf,
}

impl<'p> MutReporter<'p> {
    pub fn replace<'t>(
        mut self,
        index: usize,
        token: impl Into<Token<'t>>,
    ) -> Result<Option<Token<'t>>, Report<'p, ReplaceError, Pointer>>
    where
        'p: 't,
    {
        // borrowck: can't reborrow - need to see if i can use polonius-the-crab
        match self.ptr.replace(index, token) {
            Ok(res) => Ok(res),
            Err(source) => Err(Report {
                source,
                source_code: Cow::Borrowed(&*self.ptr),
            }),
        }
    }
    // rest of methods which can mutate a pointer
}

pub struct Reporter<'p> {
    ptr: &'p Pointer,
}
// impl methods such as `assign`, `resolve`, etc

impl error::MutReporter<PointerBuf> for PointerBuf {
    fn report_err(&'_ mut self) -> Self::Reporter<'_> {
        MutReporter { ptr: self }
    }
}

For parsing, there would need to be a ParseReporter for PointerBuf and Pointer (String and &str respectively). I'm going to see if i can use polonius-the-crab to solve the issues I'm running into.

I'm trying to avoid having the user re-supply the SourceCode as I think that would be potentially error prone and weird ergonomics. At the same time, I want errors with lifetimes to be opt-in.

We don't need all of the refactoring to errors - this can be done keeping the errors with struct variants. We'd just need to add Span to most variants. Refactoring, while it introduces a good bit of churn, may still be worth it as the final iteration on errors.

edit: not sure why I had ohmyfish on my mind

asmello commented 1 week ago

I don't follow what role Reporter is playing here, but generally errors are intended to have static lifetimes, because they're expected to outlive their sources. This is why Error::source has 'static in the return signature. So I think allocating-on-error is fine, actually. But I'm probably misunderstanding what you're trying to achieve.

chanced commented 1 week ago

Yea, I should have explained that better. Sorry.

The reason Reporter is a trait is that I intend to implement it for serde_json::Value and toml::Value. All it does is provide a means to create a wrapper around a &str, String, &Pointer, &mut PointerBuf, &Value, or &mut Value.

Each Reporter wrapper re-exposes methods it is capable of handling which then maps the error side of a Result into Report, using the &Pointer, &str, or String as the source_code and the error as source.

pub struct Report<'a, E, T: ToOwned + ?Sized> {
    pub source: E, // the error
    pub source_code: Cow<'a, T>, // str or Pointer
}

re: allocation - do you mean all errors could/should allocate or that Cow in the above is unnecessary and all reports should be owned?

asmello commented 1 week ago

do you mean all errors could/should allocate

Oh hell no, that would be an awful constraint. But they likely should have static lifetimes. So, for example, an error containing a static string is fine. In other cases the input/source can be owned. Allocate is a sort of last resort, but what I'm saying is that borrowing probably shouldn't even be attempted.

This doesn't preclude the usage of Cow, but I imagine we'd need Cow<'static, T> there (not unheard of). Unsure if it's worth it, but could be.

asmello commented 1 week ago

As for the rest, thanks for the clarification. This sort of wrapping:

pub struct Report<'a, E, T: ToOwned + ?Sized> {
    pub source: E, // the error
    pub source_code: Cow<'a, T>, // str or Pointer
}

makes sense to me, although not sure how generalisable it is. Another thing to experiment with, I suppose.

chanced commented 1 week ago

I'm game for dumping the Cow and making it an owned String or PointerBuf. I think it'll simplify borrowing constraints.

chanced commented 1 week ago

@asmello i dont think this is going to work :(

I have a very little playground here where 2 flavors (Mutable and Immutable) are supposed to be in play but it doesn't seem to work how I want.

I was really hoping resolution wouldn't matter - that calling report_err() on multiple implementations would return which ever you ended up using (based on the methods used). That does not seem to be the case.

I'll toy around with it a bit more but I'm out of ideas if this doesn't end up working. I guess we just let them create the report by passing the error and String, Pointer, or PointerBuf.

use jsonptr::{error::Report, resolve, PointerBuf, ReplaceError};

fn mutable() {
    use jsonptr::error::{ReportErr, ReportErrMut};
    let mut ptr = PointerBuf::parse("/example").unwrap();
    let err: Report<ReplaceError> = ptr.report_err().replace(2, "invalid").unwrap_err();
    // 👍
}

fn immutable() {
    use jsonptr::error::{ReportErr, ReportErrMut};
    let ptr = PointerBuf::parse("/example").unwrap();
    let err: Report<resolve::Error> = ptr
        .report_err()
        .resolve(&serde_json::Value::Null)
        .unwrap_err();
    // 👍
}

fn both() {
    use jsonptr::error::{ReportErr, ReportErrMut};

    let mut ptr = PointerBuf::parse("/example").unwrap();
    let err: Report<resolve::Error> = ptr
        .report_err()
        .resolve(&serde_json::Value::Null)
        .unwrap_err();
    // 👍

    let err: Report<ReplaceError> = ptr.report_err().replace(2, "invalid").unwrap_err();
    // ❌ no method named `replace` found for struct `jsonptr::reporter::Immutable` in the current scope
}

https://github.com/chanced/spike-jsonptr-reports

Mutable:

impl<'p> Mutable<'p, ReplaceError> {
    pub fn replace<'t>(
        self,
        index: usize,
        token: impl Into<Token<'t>>,
    ) -> Result<Option<Token<'t>>, Report<ReplaceError>>
    where
        'p: 't, { }
}

Immutable:

impl<'p> Immutable<'p, resolve::Error> {
    pub fn resolve<R: crate::Resolve<Error = resolve::Error>>(
        self,
        value: &'_ R,
    ) -> Result<&'_ R::Value, Report<resolve::Error>> {}
}
chanced commented 1 week ago

I don't think this is worth it. I think we just provide a Report type that has a constructor that you pass an error and a PointerBuf and a string. The errors themselves contain the span.

chanced commented 3 days ago

@asmello I refactored the errors, although I haven't gotten to the Report yet. I think it'll be pretty simple though.

This includes breaking changes that I have commented out for ParseError. Right now, I have complete_offset, source_offset, and pointer_offset commented out. I don't think we need them or the data needed to make them possible.

I can write up some top-level methods that we can provide to emulate them though. It'll just require the source string & the error.

I hate to break things like that though - I'm kinda banking on people having not had a need to use them, which is pretty weak.

chanced commented 3 days ago

I should mention that I've got a lot cleaning up to do on it - I just let it sit idle for too long so I figured I'd push what I have. I plan on working on it over the weekend.

The more I think about just dropping the offset methods, the more reluctant I become to potentially break code without a clean transitional solution. I'll see if I can come up with something. Perhaps I should make the fields private and just keep it around for now, with the methods deprecated.

If you'd rather keep them the way they are - as enum structs, I totally get that. This may be just too much.

chanced commented 1 day ago

im going to start over, trying to minimize breaking changes.