Closed: tomnoah1 closed this issue 1 month ago
Thanks for reporting the issue! Could you provide the complete code to reproduce it?
Sadly I can't export it, but it is essentially what I sent. The problem occurs at line 4, where I create a new writer and the old one is discarded (the garbage collector collects the old writer because it is assigned to the same variable name). At that moment, the file was emptied.
My example code runs fine, and I can’t reproduce your issue. Could you take a look at my example code? The code is as follows:
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.io.LocalOutputFile;
public class _02_Parquet_Example {
    public static void main(String[] args) throws IOException {
        // Define the Avro schema
        Schema schema = SchemaBuilder.record("User")
                .fields()
                .name("name").type().stringType().noDefault()
                .name("age").type().intType().noDefault()
                .endRecord();

        // Create a GenericRecord
        GenericRecord genericRecord = new GenericData.Record(schema);
        genericRecord.put("name", "John Doe");
        genericRecord.put("age", 30);

        // Parquet file paths
        String localFilePath = "./output-file1.parquet";
        String localFilePath2 = "./output-file2.parquet";
        LocalOutputFile localOutputFile = new LocalOutputFile(Paths.get(localFilePath));
        LocalOutputFile localOutputFile2 = new LocalOutputFile(Paths.get(localFilePath2));

        // Write to the first Parquet file
        ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(localOutputFile)
                .withSchema(schema)
                .build();
        writer.write(genericRecord);
        writer.close();

        // Write to the second Parquet file with the same record
        writer = AvroParquetWriter
                .<GenericRecord>builder(localOutputFile2)
                .withSchema(schema)
                .build();
        writer.write(genericRecord);
        writer.close();

        System.out.println("Data written to Parquet files successfully.");
    }
}
@tomnoah1
You are writing it again. Try:
// Write to the second Parquet file with the same record
writer = AvroParquetWriter
        .<GenericRecord>builder(localOutputFile2)
        .withSchema(schema)
        .build();
Instead of:
// Write to the second Parquet file with the same record
writer = AvroParquetWriter
        .<GenericRecord>builder(localOutputFile2)
        .withSchema(schema)
        .build();
writer.write(genericRecord);
writer.close();
Taking a second look, I see that on the second initialization we actually did:
writer = AvroParquetWriter
        .<GenericRecord>builder(localOutputFile)
        .withSchema(schema)
        .build();
That means we accidentally used localOutputFile again instead of localOutputFile2, and I guess that's what caused the problem: a new writer opened on the same filename and path.
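The mechanism described above can be illustrated with plain java.io, independent of Parquet: opening a fresh output stream on an existing path truncates the file to 0 bytes, which matches the reported behavior of the second writer emptying the first file. A minimal sketch (the demo.bin path is purely illustrative, not from this thread):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TruncateDemo {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("demo.bin");
        // Write three bytes, analogous to the first writer producing data.
        Files.write(path, new byte[] {1, 2, 3});
        System.out.println("before: " + Files.size(path)); // prints "before: 3"

        // Opening a new output stream on the same path truncates the file,
        // analogous to building a second writer on the same output file.
        new FileOutputStream(path.toFile()).close();
        System.out.println("after: " + Files.size(path)); // prints "after: 0"

        Files.deleteIfExists(path);
    }
}
```

This is why the file "contains no data and weighs 0 bytes" as soon as the second writer is created, before anything is even written to it.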
@tomnoah1 If there are no other issues with this, you can close it. Thanks.
Thanks @FlechazoW for the help!
Describe the bug, including details regarding any error messages, version, and platform.
Version: 1.14.1
I got the following code:
After the third line, I can see the file with the data (genericRecord) and can read it. For some reason, after the 4th line, the file becomes empty: it contains no data and weighs 0 bytes. When trying to read it I get:
File(<some_name>) cannot be read as parquet. File matching that expression not found.
Without the 4th line, the file and its contents remain intact.
Component(s)
No response