codename-hub / php-parquet

PHP implementation for reading and writing Apache Parquet files/streams

Expected parameter of type '\codename\parquet\data\DataField', '\codename\parquet\data\Field' provided #1

Closed darthf1 closed 2 years ago

darthf1 commented 2 years ago

Hi!

In your READ example, you have

$dataFields = $parquetReader->schema->GetDataFields();

and later

foreach($dataFields as $field) {
  $columns[] = $groupReader->ReadColumn($field);
}

However, the return value of the GetDataFields() function is Field[], while the first parameter for the ReadColumn function is a DataField.


Katalystical commented 2 years ago

You're right, this is a little misleading. However, DataField extends Field, and GetDataFields() returns only DataField instances by design (although the declared return type is Field, or more precisely Field[]). I'll try to incorporate this into the next release; I don't expect it to be a breaking change, just a formal fix. Thanks!

Keep in mind if you're working with more complex fields: List, Struct and Map fields are not DataFields and have no 'adjacent' DataColumns, as per the Parquet spec, and therefore cannot be read directly by passing them to ReadColumn. Instead, they're complex wrappers around one or more DataFields (or even further StructFields or MapFields).
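
For readers who hit the same type warning before the fix, the distinction above can be sketched with an instanceof check. This is a minimal illustration only: the classes below are hypothetical stand-ins, not the library's own. The only fact taken from this thread is that DataField extends Field; the StructField stand-in, readColumn() and onlyDataFields() are assumptions made up for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins. Only "DataField extends Field" comes from the thread.
class Field
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

class DataField extends Field {}

// Assumed stand-in for a complex field type that has no column of its own.
class StructField extends Field {}

// Stand-in for a reader method whose parameter is typed as DataField.
function readColumn(DataField $field): string
{
    return 'column:' . $field->name;
}

/**
 * Keep only the entries that really are DataField instances.
 *
 * @param Field[] $fields
 * @return DataField[]
 */
function onlyDataFields(array $fields): array
{
    return array_values(array_filter(
        $fields,
        static function (Field $f): bool {
            return $f instanceof DataField;
        }
    ));
}

$fields = [new DataField('id'), new StructField('address'), new DataField('name')];

$columns = [];
foreach (onlyDataFields($fields) as $field) {
    // Safe: every $field here is a DataField, so the type hint is satisfied.
    $columns[] = readColumn($field);
}
```

With the stand-ins above, the StructField entry is skipped and only the two DataField entries reach readColumn(). A check like this narrows the Field[] type for static analysers and guards against passing a complex field where a DataField is required.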

I'm working on a method to 'convert' a full Parquet schema and its data to PHP (associative) arrays, though it is a tough job (with regard to repetition levels and definition levels). The current implementation is best suited for 'flat' Parquet files without any complex nesting.

Katalystical commented 2 years ago

I'll leave this open until fixed, sorry.

Katalystical commented 2 years ago

Just fixed.