elodina / xml-avro

Generate Avro schema and Avro binary from XSD schema and XML
http://www.stealth.ly
Apache License 2.0

generate avro "not a data file" #10

Open ty-n-42 opened 9 years ago

ty-n-42 commented 9 years ago

Hi, this seems like a great tool. Unfortunately I'm a newb, and when I create .avsc and .avro files from XSD and XML files I run into trouble trying to use the .avro file.

Both Hive and the command-line avro-tools.jar report the error "not a data file". Also, when I view the .avro file in Hue, the preview renders it like a binary file, so I don't think Hue recognises it as an Avro file either.

Is there something I need to do with the .avro files xml-avro creates before I can use them?

Here's the exception from avro-tools:

java -jar ~/Downloads/avro-tools-1.7.7.jar totext ./test.avro -
Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.avro.tool.ToTextTool.run(ToTextTool.java:67)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)

Update: I got the code running in NetBeans and changed the last part of Converter.main() to see whether I could get some textual output:

try (OutputStream stream = new FileOutputStream(opts.avroFile)) {
    DatumWriter<Object> datumWriter = new SpecificDatumWriter<>(schema);
    //datumWriter.write(datum, EncoderFactory.get().directBinaryEncoder(stream, null));
    datumWriter.write(datum, EncoderFactory.get().jsonEncoder(schema, stream));

This produced the content I was expecting. Any suggestion on what may be happening with directBinaryEncoder is appreciated.

Thanks

OzLe commented 8 years ago

Your fix is not correct; it just creates a JSON file. To create an Avro file that works in Hadoop and contains the schema, replace the last statement with:

try (OutputStream stream = new FileOutputStream(opts.avroFile)) {
    DataFileWriter<Object> fileWriter = new DataFileWriter<>(datumWriter);
    fileWriter.create(schema, stream);
    fileWriter.append(datum);

This will work.
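
Some background on why the raw encoder output is rejected, as far as I understand it: avro-tools and Hive expect an Avro object container file, which per the Avro specification starts with the four bytes 'O', 'b', 'j', 1, followed by file metadata that includes the schema. directBinaryEncoder writes only the encoded datum with no such header, which would explain "Not a data file". Below is a minimal sketch, using only the JDK, for checking whether a file starts with that header. The class name AvroMagicCheck and its methods are made up for illustration; they are not part of xml-avro or Avro.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of xml-avro or the Avro library.
class AvroMagicCheck {
    // Header bytes of an Avro object container file: 'O', 'b', 'j', 1.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // True if the given bytes begin with the container-file magic.
    static boolean startsWithMagic(byte[] head) {
        return head.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    // Reads the first four bytes of the file and compares them to the magic.
    static boolean isContainerFile(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    break; // file is shorter than the header
                }
                off += n;
            }
        }
        return off == MAGIC.length && startsWithMagic(head);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + (isContainerFile(file)
                ? ": looks like an Avro container file"
                : ": not an Avro container file"));
    }
}
```

Run it against the generated file, for example `java AvroMagicCheck ./test.avro`. One more thing to watch with the DataFileWriter change: appended records are buffered, so the writer should be closed (or at least flushed) before the stream is closed, otherwise the file may contain only the header.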

rajabhathor commented 8 years ago

I have the same issue! Hive loads the data fine but complains it's not a data file. I'm working for a marquee HDP client and would imagine this gets some attention. I don't feel comfortable messing around with the code, for obvious reasons. Any assistance is appreciated! Raj

GeethanadhP commented 8 years ago

@OzLe's code works; I have tested it on Hive as well. I have updated code with some fixes available in the fork.