Catching invalid format for CBOR.decode

grzuy commented 4 years ago

CBOR.decode can raise several error types when fed with invalid cbor input.

require "cbor"
require 'securerandom'

errors = {}

1_000_000.times do |i|
  begin
    CBOR.decode(SecureRandom.random_bytes(64))
  rescue => ex
    errors[ex.class.name] = ex
  end
end

pp errors

returns

{"CBOR::MalformedFormatError"=>
  #<CBOR::MalformedFormatError: extra bytes follow after a deserialized object>,
 "EOFError"=>#<EOFError: end of buffer reached>,
 "TypeError"=>#<TypeError: can't convert CBOR::Simple into an exact number>,
 "URI::InvalidURIError"=>#<URI::InvalidURIError: bad URI(is not URI?): 2>,
 "FloatDomainError"=>#<FloatDomainError: NaN>,
 "RegexpError"=>
  #<RegexpError: invalid multibyte character: /\x1Er\x13w\xBCx0N:n\xD3aw{\x98\xB1Q\x1D\xC0\u06FC5\xF0/>}

Do you think it's possible for the CBOR decoder to either provide some sort fo CBOR.valid?(input) method, or for CBOR.decode(input) to always return the same exception when input is detected to be invalid CBOR?

This was raised in https://github.com/cedarcode/cose-ruby/issues/40.

Thank you in advance!

grzuy commented 4 years ago

or for CBOR.decode(input) to always return the same exception when input is detected to be invalid CBOR?

I guess already used CBOR::MalformedFormatError could be a possible candidate?

grzuy commented 4 years ago

For what is worth, these are the occurrences...

{"EOFError"=>100332,
 "CBOR::MalformedFormatError"=>895855,
 "TypeError"=>3734,
 "URI::InvalidURIError"=>29,
 "FloatDomainError"=>1,
 "RegexpError"=>3}

Script:

require "cbor"
require 'securerandom'

errors = {}

1_000_000.times do |i|
  begin
    CBOR.decode(SecureRandom.random_bytes(64))
  rescue => ex
    errors[ex.class.name] ||= 0
    errors[ex.class.name] += 1
  end
end

pp errors

cabo commented 4 years ago

I believe it is useful to know whether the CBOR item ended prematurely (EOFError), some other kind of malformedness occurred, or a type-specific error occurred (obviously on a Tag, e.g., URI::InvalidURIError). What is your motivation for wanting to turn this into Kernighan's car?

cabo commented 4 years ago

BTW, errors = Hash.new(0) saves having to do the errors[ex.class.name] ||= 0.

cabo commented 4 years ago

But the number of exceptions you found actually points to a genuine bug: An invalid tag (e.g., a URI tag with an invalid URI) is not presented to the application as such, but causes an exception that denies the application access to the entire piece of CBOR. This is great for purists, but not the best strategy for interoperability.

grzuy commented 4 years ago

What is your motivation

Hi @cabo,

Just trying to understand what could be the best way to gracefully handle wrong/corrupt cbor data on at the application level when data fed by the user agent.

I guess that by doing

begin
  CBOR.decode(input)
rescue CBOR::MalformedFormatError, EOFError, TypeError
  # handle error
end

one would already catch > 99% of the cases, so that may be enough.

The thing is that from the perspective of the caller, EOFError and TypeError are quite generic and they don't even let you know if the error is coming from CBOR or from someplace else.

It crossed my mind that it might be desirable from the perspective of the caller to be able to do something like:

begin
  CBOR.decode(input)
rescue CBOR::MalformedFormatError
  # handle error
end

or

begin
  CBOR.decode(input)
rescue CBOR::DecodingError
  # handle error
end

or in case it makes more sense to have several different possible decoding errors, all those could inherit from a CBOR-generic CBOR::Error and you could just:

begin
  CBOR.decode(input)
rescue CBOR::Error
  # handle error
end

cabo / cbor-ruby

Catching invalid format for CBOR.decode #14