chanced / jsonptr

JSON Pointer (RFC 6901) implementation for Rust
Apache License 2.0
44 stars 4 forks source link

`Token::as_index` OOB behavior #35

Closed asmello closed 2 months ago

asmello commented 2 months ago

More of a discussion than a issue, perhaps, but I find it unexpected that this is true:

assert_eq!(Token::new("-").as_index(1), Ok(1));
assert_eq!(Token::new("1").as_index(1), Ok(1));
assert_eq!(Token::new("2").as_index(1), Err(IndexError::OutOfBounds));

I can see the utility of the first case, but I'd expect that out-of-bounds numerical indices would always produce an error. It's quite error prone that they don't (it even contradicts the docs, as they're currently written).

If we go by the RFC literally, it defines - as an error condition in section 7, but doesn't define how to handle the error. In practice we know from sibling standards like RFC 6902 that using - is only an actual error condition in some contexts, so that puts it into a bit of a grey area.

If we were to prioritize semantic correctness, we should probably produce an error for - (but a separate error from what would result from a numerical out-of-bounds index). Another option would be to have the Ok variant carry an enum value indicating whether the index is internal or refers to the next element:

enum Index {
    Internal(usize),
    Next(usize)
}
fn as_index(...) -> Result<Index, IndexError>

Regardless, I think numerical indices that exceed the given array length should always be considered an error, or never.

chanced commented 2 months ago

Going to break these down by test-case:

assert_eq!(Token::new("-").as_index(1), Ok(1));

The way the spec introduces "-" is a bit confusing.

If the currently referenced value is a JSON array, the reference token MUST contain either:

  • characters comprised of digits (see ABNF below; note that leading zeros are not allowed) that represent an unsigned base-10 integer value, making the new referenced value the array element with the zero-based index identified by the token, or
  • exactly the single character "-", making the new referenced value the (nonexistent) member after the last array element.

The second bullet point lays out the mechanics of the "-" token. The ambiguity arises with:

Note that the use of the "-" character to index an array will always result in such an error condition because by definition it refers to a nonexistent array element. Thus, applications of JSON Pointer need to specify how that character is to be handled, if it is to be useful.

My reading of this is that it is specifically referring to resolution. In that context, "-" would never be resolvable as it is always out of bounds. This crate should error if the user attempts to resolve a "-" token.

As this crate also handles modification of data via pointer, utility exists in being able to specify a push operation, essentially. As the specification provides a specific means for doing so, it seems prudent to allow it.

assert_eq!(Token::new("1").as_index(1), Ok(1));

This is similar to above, in that in the case of mutation, the user needs to be able to reference the next would-be element in an array.

The user may wish to keep track of the index themselves rather than relying on "-" to indicate the desire to add an item.

Resolution would still fail in this case, as the index is out of bounds.

assert_eq!(Token::new("2").as_index(1), Err(IndexError::OutOfBounds));

Unlike above, in no situation would this be valid.

edit: I originally responded to the last case thinking the numbers were as I have edited them to be as above. I'm thinking you may have swapped them in your head too. Swapped, the test would fail (as it does currently written, but for other reasons).

chanced commented 2 months ago

I can see where the confusion is stemming though. For someone looking to take a token and turn it into a valid index of an existing array, this behavior would probably be unexpected.

Hmm.

chanced commented 2 months ago

This is what I'm thinking:

assert_eq!(Token::new("1").as_index(..1), Err(_));
assert_eq!(Token::new("1").as_index(...1), Ok(1));

Maybe? Maybe there's a better way to specify inclusivity?

I haven't worked with ranges yet. I don't know if its possible to narrow it down to only upper bounds, for example.

edit

Forgot about "-". It'd ruin your day if you believe you're solid on range bounds and it slips through.

chanced commented 2 months ago

What are your thoughts on this API?

Token::new("1").index();  // Ok(Index(Some(1)))
Token::new("-").index();  // Ok(Index(None))
Token::new("a").index();  // Err(ParseIntError)

Token::new("0").index()?.to(1);  // Ok(1)
Token::new("1").index()?.to(1);  // Err(OutOfBounds)
Token::new("-").index()?.to(1);  // Err(OutOfBounds)

Token::new("1").index()?.inclusive(1);  // Ok(1)
Token::new("-").index()?.inclusive(1);  // Ok(1)
Token::new("2").index()?.inclusive(1);  // Err(OutOfBounds)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Index {
    value: Option<usize>,
}

impl Index {
    fn parse(s: &str) -> Result<Self, ParseIntError> {
        if s == "-" {
            Ok(Self { value: None })
        } else {
            Ok(Self {
                value: Some(s.parse()?),
            })
        }
    }

    /// Returns the unbounded value, if possible.
    ///
    /// - If the `Index` stems from a non-negative integer, returns `Some(value)`.
    /// - If the `Index` stems from `"-"`, returns `None`.
    pub fn value(&self) -> Option<usize> {
        self.value
    }

    /// Checks the current index value against the bound and returns a `usize`
    /// representation if the index value falls within the range (`0..bound`).
    ///
    /// - For non-negative numerical values less than bound, returns index
    ///   value.
    /// - For the case of `"-"`, returns an error.
    ///
    /// # Errors
    /// Returns `OutOfBoundsError` if:
    ///  - the index is equal to or greater than the bound.
    ///  - the index is `"-"`.
    pub fn to(&self, bound: usize) -> Result<usize, OutOfBoundsError> {
        let Some(value) = self.value else {
            return Err(OutOfBoundsError {
                index: *self,
                bound,
            });
        };
        if value >= bound {
            return Err(OutOfBoundsError {
                index: *self,
                bound,
            });
        }
        Ok(value)
    }

    /// Checks the current index value against the **inclusive** bound and
    /// returns a `usize` representation if the index value falls within the
    /// inclusive range (`0..=bound`).
    ///
    /// - For non-negative numerical values less than or equal to bound, returns
    ///   index value.
    /// - For the case of `"-"`, returns the bound.
    ///
    /// # Errors
    /// Returns `OutOfBoundsError` if the index is greater than the bound.
    pub fn inclusive(&self, bound: usize) -> Result<usize, OutOfBoundsError> {
        let Some(value) = self.value else {
            return Ok(bound);
        };
        if value > bound {
            return Err(OutOfBoundsError {
                index: *self,
                bound,
            });
        }
        Ok(value)
    }
}

impl fmt::Display for Index {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(value) = self.value {
            value.fmt(f)
        } else {
            write!(f, "-")
        }
    }
}
impl<'t> TryFrom<&'t Token> for Index {
    type Error = ParseIntError;

    fn try_from(value: &Token) -> Result<Self, Self::Error> {
        Index::parse(value.as_str())
    }
}

impl From<usize> for Index {
    fn from(value: usize) -> Self {
        Self { value: Some(value) }
    }
}

edit: Maybe an enum instead of a struct:

pub enum Index {
    Usize(usize), // 0..
    Next,         // '-'
}

Hmm. Yea. I like the enum more.

chanced commented 2 months ago
assert_eq!(Token::new("1").index(), Ok(Index::Usize(1)));
assert_eq!(Token::new("-").index(), Ok(Index::Next));
assert_eq!(Token::new("a").index(), Err(ParseIntError));

assert_eq!(Token::new("0").index()?.to(1), Ok(1));
assert_eq!(Token::new("1").index()?.to(1), Err(OutOfBounds)); 
assert_eq!(Token::new("-").index()?.to(1), Err(OutOfBounds)); 

assert_eq!(Token::new("1").index()?.inclusive(1), Ok(1));
assert_eq!(Token::new("-").index()?.inclusive(1), Ok(1));
assert_eq!(Token::new("2").index()?.inclusive(1), Err(OutOfBounds));
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Index {
    /// A non-negative integer value
    Usize(usize),
    /// The `"-"` token which indicates the next would-be value in an array.
    Next,
}

impl Index {
    fn parse(s: &str) -> Result<Self, ParseIntError> {
        if s == "-" {
            return Ok(Self::Next);
        }
        let u = s.parse()?;
        Ok(Self::Usize(u))
    }

    /// Returns the unbounded value, if possible.
    ///
    /// - If the `Index` stems from a non-negative integer, returns `Some(value)`.
    /// - If the `Index` stems from `"-"`, returns `None`.
    pub fn as_usize(self) -> Option<usize> {
        let Self::Usize(u) = self else { return None };
        Some(u)
    }

    pub fn map_usize<F: Fn(usize) -> R, R>(self, f: F) -> Option<R> {
        match self {
            Self::Usize(u) => Some(f(u)),
            Self::Next => None,
        }
    }

    /// Checks the current index value against the bound and returns a `usize`
    /// representation if the index value falls within the range (`0..bound`).
    ///
    /// - For non-negative numerical values less than bound, returns index
    ///   value.
    /// - For the case of `"-"`, returns an error.
    ///
    /// # Errors
    /// Returns `OutOfBoundsError` if:
    ///  - the index is equal to or greater than the bound.
    ///  - the index is `"-"`.
    pub fn to(self, bound: usize) -> Result<usize, OutOfBoundsError> {
        let Some(value) = self.as_usize() else {
            return Err(OutOfBoundsError { index: self, bound });
        };
        if value >= bound {
            return Err(OutOfBoundsError { index: self, bound });
        }
        Ok(value)
    }

    /// Checks the current index value against the **inclusive** bound and
    /// returns a `usize` representation if the index value falls within the
    /// inclusive range (`0..=bound`).
    ///
    /// - For non-negative numerical values less than or equal to bound, returns
    ///   index value.
    /// - For the case of `"-"`, returns the bound.
    ///
    /// # Errors
    /// Returns `OutOfBoundsError` if the index is greater than the bound.
    pub fn inclusive(self, bound: usize) -> Result<usize, OutOfBoundsError> {
        match self {
            Index::Usize(u) => {
                if u > bound {
                    Err(OutOfBoundsError {
                        index: Self::Usize(u),
                        bound,
                    })
                } else {
                    Ok(u)
                }
            }
            Index::Next => Ok(bound),
        }
    }

    pub fn get_from<T>(self, slice: &[T]) -> Option<&T> {
        match self {
            Self::Usize(u) => slice.get(u),
            Self::Next => None,
        }
    }
    pub fn get_mut_from<T>(self, slice: &mut [T]) -> Option<&mut T> {
        match self {
            Self::Usize(u) => slice.get_mut(u),
            Self::Next => None,
        }
    }
    /// Returns the index value if it is within the bounds of the slice.
    pub fn of<T>(self, slice: &[T]) -> Option<usize> {
        match self {
            Self::Usize(u) => {
                if u < slice.len() {
                    Some(u)
                } else {
                    None
                }
            }
            Self::Next => None,
        }
    }

    /// Returns the index value if it is either the length of the slice (i.e.
    /// pointing to the next, non-existent index) or within the bounds of the
    /// slice.
    pub fn of_or_next<T>(self, slice: &[T]) -> Option<usize> {
        match self {
            Self::Usize(u) => {
                if u <= slice.as_ref().len() {
                    Some(u)
                } else {
                    None
                }
            }
            Self::Next => Some(slice.as_ref().len()),
        }
    }
}

impl fmt::Display for Index {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Index::Usize(u) => fmt::Display::fmt(&u, f),
            Index::Next => write!(f, "-"),
        }
    }
}
impl TryFrom<Token> for Index {
    type Error = ParseIntError;

    fn try_from(value: Token) -> Result<Self, Self::Error> {
        Index::parse(value.as_str())
    }
}

impl From<usize> for Index {
    fn from(value: usize) -> Self {
        Self::Usize(value)
    }
}
asmello commented 2 months ago

I originally responded to the last case thinking the numbers were as I have edited them to be as above. I'm thinking you may have swapped them in your head too. Swapped, the test would fail (as it does currently written, but for other reasons).

Yup, yeah the order there is a bit confusing too. Passing the length as an argument makes sense conceptually but it makes the API a bit weird to use. I updated my original comment.

I think you're onto something with the Index enum usage. I propose a slightly different naming convention:

assert_eq!(Token::new("1").to_index(), Ok(Index::Num(1)));
assert_eq!(Token::new("-").to_index(), Ok(Index::Next));
assert_eq!(Token::new("a").to_index(), Err(ParseIntError));

assert_eq!(Token::new("0").to_index()?.inside(1), Ok(1));
assert_eq!(Token::new("1").to_index()?.inside(1), Err(OutOfBounds)); 
assert_eq!(Token::new("-").to_index()?.inside(1), Err(OutOfBounds)); 

assert_eq!(Token::new("1").to_index()?.inside_inclusive(1), Ok(1));
assert_eq!(Token::new("-").to_index()?.inside_inclusive(1), Ok(1));
assert_eq!(Token::new("2").to_index()?.inside_inclusive(1), Err(OutOfBounds));

I'd also leave most of those public methods out, until they prove useful. Always easier to add stuff than remove from an API. The trait implementations look reasonable, though.

chanced commented 2 months ago

I was going back and forth over to_index, as_index and index. as_index was written off as it doesn't fit convention. index was just shorter and the name of the type to method aligned. I'm cool with to_index though as its more idiomatic.

inside_inclusive seems like a bit like an oxymoron. Hmm. I really like the change to Num from Usize.

I'm cool with leaving off most of the methods. Especially the get_* & of.

asmello commented 2 months ago

Linguistically I think any of the other options for the index conversion method would be reasonable, but thankfully we don't need to think too hard, since the Rust API Guidelines provide a nice summary of how to pick one.

I don't like how verbose inside_inclusive is, either. I always bias towards clarity, but maybe we can come up with a more concise alternative. Main reason I'm pushing back on inclusive is that the inside part is actually more important, it describes what the method does, while inclusive is just a qualifier, so it can't stand on its own.

chanced commented 2 months ago

Ah, i forgot that page existed! to_index it is.

Great point about inclusive. inside_or_next? That's kind of reading a bit clumsily. Hmm.

chanced commented 2 months ago

I can't think of any examples of fallible to_* methods and the Guidelines doesn't make mention of that, so far as I can tell from scanning it.

Would try_to be more idiomatic? Or is fallibility not a factor with to_*, unlike into_* / try_into_*?

asmello commented 2 months ago

I think in general fallibility is just implied by the return type. try_into/try_from probably exist only to distinguish them from the infallible alternatives. I think this probably only comes up in trait methods, because it doesn't generally make sense to have the same type implement both fallible and infallible variants of the same method...

How about just inside_or?

chanced commented 2 months ago

inside_or would be perfect if it weren't for the possibility of it erroring out by exceeding the bound. The convention of *_or_* is that the provided parameters become the default in the event that the first condition is not met.

let o: Option<&str> = None;
println!("{}", o.unwrap_or("example"));

I'd personally expect the following if I encountered it:

assert_eq!(Index::Num(0).inside_or(3), 3);
asmello commented 2 months ago

I was thinking of Option::ok_or read as "as the Ok variant, or the provided value", being a union operation of sorts, but you're right that it's more indicative of a fallback, an either/or situation. The example you give is a valid interpretation, so we should avoid this path.

We could just shorten it: inside_incl is not too bad I think.

chanced commented 2 months ago

Yep, inside_incl is perfect. It doesn't immediately trigger my brain to reprocess the meaning.