fabian-hiller / valibot

The modular and type safe schema library for validating structural data 🤖
https://valibot.dev
MIT License
6.33k stars 204 forks source link

Adding labels to values in issue paths #807

Open jamiebuilds opened 2 months ago

jamiebuilds commented 2 months ago

When validating really large datasets it would be nice if there were a way to describe the path for specific issues.

For example, if you have an issue with a path like this:

components:
  71:
    licenses: Invalid type: Expected Array but received undefined

The index 71 there is difficult to track down. But if I could write something like:

Note: This is just a rough draft of an idea, I could see this being a method or an action, see other api ideas below

// function label(label: (input: unknown) => string): LabelAction

let ComponentSchema = v.pipe(
  v.object({
    id: v.string(),
    licenses: v.array(LicenseSchema),
  }),
  v.label(input => {
    // Not sure if this should 
    if (typeof input === "object" && input != null && typeof input.id == "string") {
      return `Component(${label.id})`
    } else {
      return `Component`
    }
  })
)

The issue path could then contain a label which is describing the input at the current path index.

    {
      kind: 'schema',
    message: 'Invalid type: Expected Array but received undefined',
      ...
      path: [
        { type: 'object', origin: 'value', key: 'components', ... },
        {
        type: 'array',
        origin: 'value',
        input: [Array],
        key: 71,
        value: [Object]
+       label: "Component(npm-package-id)"
      },
        { type: 'object', origin: 'value', key: 'licenses', ... }
      ],
      ...
    }

That would allow you to print the issue with enough information to identify the invalid data more quickly:

components:
  71: Component(npm-package-id)
    licenses: Invalid type: Expected Array but received undefined

Alternative APIs

Similar to partialCheck:

// function label(pathList: PathKeys, label: (input: Selection) => string): LabelAction
let ComponentSchema = v.pipe(
  v.object({
    id: v.string(),
    licenses: v.array(LicenseSchema),
  }),
  v.label([["id"]], input => {
    return `Component(${label.id})`
  }),
)

As a method:

// function label(schema: Schema, label: (input: unknown) => string): SchemaWithLabel

let ComponentSchema = v.label(
  v.object({
    id: v.string(),
    licenses: v.array(LicenseSchema),
  }),
  input => {
    // Not sure if this should 
    if (typeof input === "object" && input != null && typeof input.id == "string") {
      return `Component(${label.id})`
    } else {
      return `Component`
    }
  })
)

Something more opinionated:

I'm not sure this is a good idea, but as an option maybe:

// function label(name: string, primaryKey: string[], schema: ObjectSchema): SchemaWithLabel

let ComponentSchema = v.label("Component", ["id"], v.object({
  id: v.string(),
  licenses: v.array(LicenseSchema),
}))
fabian-hiller commented 2 months ago

Thanks for creating this PR. In simple words, you want to be able to label path items to generate a stack-trace like output with a better overview through these labels?

sihu commented 5 hours ago

I agree with the fact, that more context would make it much easier to debug an issue (especially if you have a lot of schemas and a field has a common name used in more than one schema). Then it would make it very helpful, if the ValiError would have some context. I would prefer not defining a label for each schema. I could imagine that it would be enough to just show the user the whole context of the schema, meaning in the given example:

licenses: Invalid type: Expected Array but received undefined

in

object({
  id: v.string(),
  licenses: v.array(LicenseSchema),
})

In theory it should be possible to track that from the stacktrace of the error where you can follow the code and see the schema used. But in practice you often have a compilation step and it's much harder to debug in a compiled chunk. That's also why I would appreciate the idea.

jamiebuilds commented 44 minutes ago

No, I don't think that would be anymore useful, nor could you actually get names like "LicenseScheme" to print back out. I specifically want to be able to label records so that when I have an array of hundreds or thousands of items, I don't have to count which index in the array the schema error is talking about

[...thousands of items..., { id: "foo" }, ...thousands of items...]
components:
  3041:
    licenses: Invalid type: Expected Array but received undefined

I need to now write separate code to find out which one of these items is missing licenses to grab the "id".