benwatson528 / intellij-avro-parquet-plugin

A Tool Window plugin for IntelliJ that displays Avro and Parquet files and their schemas in JSON.
Apache License 2.0

Avro files crashing plugin in IntelliJ #97

Closed kevinashaw closed 2 years ago

kevinashaw commented 2 years ago

We are long-time users of this excellent Avro browser for IntelliJ and PyCharm. I have recently found some Avro files that crash IntelliJ and the plugin. Some of the files read correctly, but others cause a crash. When I look at the builder object in debug mode (just prior to writing), the data is correct and all attributes have been populated. This is a company project; is there a way to share data examples without making them public? The source for populating the record is written in Java (Version 1.11.0) and the Avro compiler is the latest (1.11.0). Below is the .avsc schema file. Of course, I can keep deleting lines in the schema until I find the problem, but I thought I would report the issue so the plugin can handle and report the error. Thanks for the help!

{
  "namespace" : "com.snip",
  "type"      : "record",
  "name"      : "AvroRecord",
  "doc"       : "",
  "fields"    : [
    {"name" : "AvroSchemaVersion" , "type" : "int"   , "default" : 102, "doc" : ""},
    {"name" : "Field1"            , "type" : "string",                  "doc" : ""},
    {"name" : "Field2"            , "type" : "string",                  "doc" : ""},
    {"name" : "Field3"            , "type" : "long"  ,                  "doc" : ""},
    {"name" : "Field4"            , "type" : "boolean","default" :false,"doc" : ""},
    {"name" : "Field5"            , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field6"            , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field7"            , "type" : "int"   , "default" :   0, "doc" : ""},
    {"name" : "Field8"            , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field9"            , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field10"           , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field11"           , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field12"           , "type" : "long"  , "default" :   0, "doc" : ""},
    {"name" : "Field13"           , "type" : "long"  , "default" :   0, "doc" : ""},
    {"name" : "Field14"           , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field15"           , "type" : {"type": "array", "items":  "long"}, "doc" : ""},
    {"name" : "Field16"           , "type" : {"type": "array", "items":  "long"}, "doc" : ""},
    {"name" : "Field17"           , "type" : "string", "default" :  "", "doc" : ""},
    {"name" : "Field18"           , "type" : {"type": "array", "items": "bytes"}, "doc" : ""}
  ]
}

benwatson528 commented 2 years ago

Hi Kevin,

You can check out the repo and then add a unit test - there's a class of unit tests that accept Avro files and then assert that they're parsed correctly. This will let you debug the code locally against whichever files you want to throw at it.

Otherwise I'll take a look when I'm back, but I'd need to know the specific error you're seeing.

Ben

kevinashaw commented 2 years ago

I will do this and post my findings. Thank you. -Kevin

kevinashaw commented 2 years ago

I'm pleased to say that I discovered the problem. I was not resetting the output stream between files, so the data from earlier files was being concatenated into later ones. Thank you for the help!
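
For anyone who lands here with the same symptom, here is a minimal sketch of that failure mode. It is not the actual project code: it assumes the shared stream is a `java.io.ByteArrayOutputStream` (the thread does not say which stream type was used), and it uses plain strings as stand-ins for Avro-encoded files. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; strings stand in for serialized Avro files.
final class StreamResetSketch {

    // Buggy variant: one buffer is reused for every file and never cleared,
    // so each "file" also contains the bytes of all the files before it.
    static List<byte[]> writeWithoutReset(List<String> payloads) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<byte[]> files = new ArrayList<>();
        for (String payload : payloads) {
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            buffer.write(bytes, 0, bytes.length);
            files.add(buffer.toByteArray());
        }
        return files;
    }

    // Fixed variant: reset() discards the accumulated bytes before each file.
    static List<byte[]> writeWithReset(List<String> payloads) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<byte[]> files = new ArrayList<>();
        for (String payload : payloads) {
            buffer.reset();
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            buffer.write(bytes, 0, bytes.length);
            files.add(buffer.toByteArray());
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> payloads = Arrays.asList("aa", "bbb");
        for (byte[] file : writeWithoutReset(payloads)) {
            System.out.println("without reset: " + file.length + " bytes");
        }
        for (byte[] file : writeWithReset(payloads)) {
            System.out.println("with reset: " + file.length + " bytes");
        }
    }
}
```

In the buggy variant the second output is 5 bytes instead of 3, because it still carries the first payload. With real Avro data that would mean a second container-file header appearing partway through the file, which may be why only some of the files failed to open. Creating a fresh stream per file avoids the problem just as well as calling `reset()`.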