go-audio / midi

The MIDI package is a high level MIDI library to consume and generate MIDI files.
Apache License 2.0

Some statistics from using this package #6

Open AlexanderMatveev opened 1 year ago

AlexanderMatveev commented 1 year ago

I have parsed about 200K MIDI files with this package and want to share some statistics: about 35% of the files failed to parse.

I'm writing this because all files were pre-checked with a MIME-type check based on the file's header. I can attach examples of files that could not be parsed for one reason or another. Manual sampling showed that they are normal files: they load and play in music software.
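For context, a header-based check usually inspects only the first few bytes, so a file can pass it and still be truncated or malformed further in. The sketch below reads the 14-byte header chunk of a Standard MIDI File. `readHeader` and `headerInfo` are hypothetical names for illustration, not part of go-audio/midi, and this is not necessarily how the pre-check mentioned above was implemented.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// headerInfo holds the fields of a Standard MIDI File header chunk.
type headerInfo struct {
	Length   uint32 // declared length of the header data, normally 6
	Format   uint16 // 0, 1 or 2
	Tracks   uint16 // number of track chunks that should follow
	Division uint16 // timing information
}

// readHeader is a hypothetical helper (not part of go-audio/midi).
// It validates the "MThd" magic and decodes the big-endian header fields.
func readHeader(b []byte) (headerInfo, error) {
	var h headerInfo
	if len(b) < 14 {
		return h, fmt.Errorf("need 14 bytes for the header chunk, got %d", len(b))
	}
	if !bytes.Equal(b[:4], []byte("MThd")) {
		return h, fmt.Errorf("bad chunk ID %q, expected MThd", b[:4])
	}
	h.Length = binary.BigEndian.Uint32(b[4:8])
	h.Format = binary.BigEndian.Uint16(b[8:10])
	h.Tracks = binary.BigEndian.Uint16(b[10:12])
	h.Division = binary.BigEndian.Uint16(b[12:14])
	return h, nil
}

func main() {
	// A hand-written header: length 6, format 1, 2 tracks, 480 ticks per quarter note.
	raw := []byte("MThd\x00\x00\x00\x06\x00\x01\x00\x02\x01\xe0")
	h, err := readHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```

A check like this says nothing about the track chunks that follow, which may be why files that pass it can still produce errors such as `unexpected EOF` during a full parse.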

Here are the statistics on parsing errors. An empty status means there were no errors, i.e. the parse succeeded.

 count  |                          status                          
--------+-------------------------------------------------------------
 128420 | 
  69659 | unexpected EOF
   7017 | unexpected data content - Expected track chunk ID MTrk, got (...)
    122 | error parsing TimeSignature - unexpected data content (2)
     86 | runtime error: index out of range [1] with length 0
     52 | couldn't read full var length text
      9 | MIDI Channel Prefix event error - unexpected data content
      4 | Time Signature length not 2 as expected but 0
      4 | format not supported - expected header size to be 6, was 10
      3 | Time Signature length not 2 as expected but 3
      2 | Set Tempo event error - unexpected data content
      2 | Time Signature length not 2 as expected but 4
      2 | error parsing TimeSignature - unexpected data content (10)
      2 | error parsing TimeSignature - unexpected data content (5)
      2 | runtime error: integer divide by zero
      1 | error parsing TimeSignature - unexpected data content (9)
      1 | error parsing SMPTE Offset - unexpected data content (84)
      1 | error parsing SMPTE Offset - unexpected data content (2371)
      1 | EOF
      1 | error parsing TimeSignature - unexpected data content (0)
      1 | Time Signature length not 2 as expected but 3141
(21 rows)

I would also take a look at two specific cases. First, a panic in the runtime (caught by recovery):

86 | runtime error: index out of range [1] with length 0

There was also a case where Decode() hung indefinitely. I worked around it by wrapping the call in a timeout using channels, but I can't reproduce it on the current MIDI file database, so there are no statistics for it:

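import (
    "errors"
    "fmt"
    "time"

    md "github.com/go-audio/midi"
)

// DecodeWithTimeout runs d.Decode() in a goroutine and gives up after 5 seconds.
// Panics inside Decode are recovered and returned as errors.
func DecodeWithTimeout(d *md.Decoder) error {
    // Buffered so the goroutine can always send and exit, even after a timeout.
    result := make(chan error, 1)
    go func() {
        defer func() {
            if r := recover(); r != nil {
                err, ok := r.(error)
                if !ok { // recovered value is not always an error
                    err = fmt.Errorf("%v", r)
                }
                result <- err
            }
        }()
        result <- d.Decode()
    }()
    select {
    case <-time.After(5 * time.Second):
        // A Decode call that never returns keeps its goroutine alive.
        return errors.New("timed out")
    case err := <-result:
        return err
    }
}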

mattetti commented 2 weeks ago

Having some examples of files that fail to parse would be super useful, so we can understand what causes the issues.