Adzz / data_schema

Declarative schemas for data transformations.
Apache License 2.0

Captures raises in casting functions and re-raises #40

Closed: Adzz closed this 2 years ago.

Adzz commented 2 years ago

Problem: when a casting function raises unexpectedly, it can be hard to tell which field caused the failure. The stack trace doesn't help, because cast fns are often modules that implement a `cast/1` function, so the trace points at that module rather than at the schema field that used it.

This commit wraps every cast fn call in a `try` block. If the function fails unexpectedly, we raise a `DataSchema.CastFunctionError` that includes enough information to see which field caused the problem.
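A minimal sketch of the wrapping idea, assuming a cast fn is either a 1-arity function or a module exporting `cast/1` (as the `AggType.cast/1` error further down suggests). `Sketch.CastFunctionError` and `Sketch.SafeCast.call/3` are hypothetical names, not DataSchema's actual implementation; only the `wrapped_error` field mirrors the struct matched on in this PR. Note that `rescue` only intercepts raised exceptions, so throws and exits would pass through unwrapped.

```elixir
defmodule Sketch.CastFunctionError do
  # Hypothetical stand-in for DataSchema.CastFunctionError. Only the
  # `wrapped_error` field name comes from the PR; the others are assumed.
  defexception [:field, :value, :wrapped_error]

  @impl true
  def message(%__MODULE__{field: field, value: value, wrapped_error: error}) do
    """
    Unexpected error when casting value #{inspect(value)}
    for field #{inspect(field)}

    The casting function raised the following error:

    #{Exception.format_banner(:error, error)}
    """
  end
end

defmodule Sketch.SafeCast do
  # Hypothetical helper: runs a cast fn and, if it raises, re-raises a
  # Sketch.CastFunctionError carrying the field name, the input value and
  # the original exception.
  def call(cast_fn, field, value) do
    try do
      apply_cast(cast_fn, value)
    rescue
      error ->
        # Keep the original stacktrace so the failing cast fn stays visible.
        reraise Sketch.CastFunctionError,
                [field: field, value: value, wrapped_error: error],
                __STACKTRACE__
    end
  end

  # Assumed shapes of a cast fn: an anonymous 1-arity function or a module
  # that exports cast/1.
  defp apply_cast(fun, value) when is_function(fun, 1), do: fun.(value)
  defp apply_cast(module, value) when is_atom(module), do: module.cast(value)
end
```

Because `reraise/3` is given the original `__STACKTRACE__`, the trace still points at the cast fn that failed, while the message adds the field name that the trace alone cannot provide.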

There were basically three things we could have done:

  1. Use macros for fields and try to engineer better stack traces by having the cast fn be called from a specific module (the one where the schema is defined).
  2. Catch the original error and re-raise it, prepending or appending to it the extra information that tells the user which field failed.
  3. Raise our own error, with the captured error wrapped inside it.

The problem with option 1 is that it massively changes how the library works, and it doesn't necessarily even work for runtime schemas, which can be generated on the fly. I like that schemas are just lists of data; introducing a macro would hide what they really are. That would let us hide details, but as a user of the lib I probably don't want that.

Option 2 is probably a bit weird and unexpected, so we went with option 3.

I don't love that we capture all exceptions and wrap them, but I think it's okay. Users who want to catch a specific exception can match on the `wrapped_error` field of the struct, for example:

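```elixir
require Logger

try do
  DataSchema.to_struct(my_input, MySchema)
rescue
  # rescue clauses can only match on exception modules, so inspect the
  # wrapped error inside the clause body.
  error in DataSchema.CastFunctionError ->
    case error.wrapped_error do
      %RuntimeError{} -> Logger.error("Expected RuntimeError")
      _ -> reraise error, __STACKTRACE__
    end
end
```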
Adzz commented 2 years ago

Some example error messages:

```
** (DataSchema.CastFunctionError)

Unexpected error when casting value "raise"
for field :post_datetime in this part of the schema:

list_of: {:post_datetime, "comments", DataSchema.RaiseString},

Full path to field was:

      Field  :post_datetime
Under Field  :post_datetime

The casting function raised the following error:

** (RuntimeError) no m8
```

and


```
Unexpected error when casting value %{date: ~D[2022-01-01]}
for field :test in this part of the schema:

@aggregate_fields [
  field: {:date, "date", DataSchema.DateCast},
]
aggregate: {:test, @aggregate_fields, AggType},

Full path to field was:

      Field  :test
Under Field  :post_datetime

The casting function raised the following error:

** (UndefinedFunctionError) function AggType.cast/1 is undefined (module AggType is not available)
```
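The "Full path to field was" section lists the failing field first and then each enclosing field. A small sketch of rendering that layout, assuming the path is kept as a non-empty list of field names ordered innermost first; `Sketch.FieldPath.format/1` is a hypothetical helper, not part of DataSchema.

```elixir
defmodule Sketch.FieldPath do
  # Hypothetical helper: renders a field path (innermost field first) in the
  # "Field" / "Under Field" layout seen in the error messages above.
  # Expects a non-empty list; an empty list raises FunctionClauseError.
  def format([innermost | parents]) do
    # Pad "Field" with six spaces so it lines up with "Under Field".
    first = "      Field  #{inspect(innermost)}"
    rest = Enum.map(parents, fn parent -> "Under Field  #{inspect(parent)}" end)
    Enum.join([first | rest], "\n")
  end
end
```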