codename-hub / php-parquet

PHP implementation for reading and writing Apache Parquet files/streams

Decoding an empty string with StringDataTypeHandler->plainDecode throws an error from CustomBinaryReader->readString #10

Closed. michael-0-1 closed this issue 1 year ago.

michael-0-1 commented 2 years ago

I'm getting an exception when StringDataTypeHandler::plainDecode receives an empty string ("") as the $encoded parameter.

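public function plainDecode(
  \codename\parquet\format\SchemaElement $tse,
  $encoded
) {
  // Only null is short-circuited; an empty string ("") falls through
  // and is handed to the binary reader.
  if ($encoded === null) return null;

  // Copy the encoded bytes into an in-memory stream and read one value from it.
  $ms = fopen('php://memory', 'r+');
  fwrite($ms, $encoded);
  $br = BinaryReader::createInstance($ms);
  $element = $this->readSingleInternal($br, $tse, -1, false);
  return $element;
}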

The current guard only checks for null, but $encoded can also be an empty string. I think the fix could be to check for empty values, not only for null.
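A minimal sketch of the reporter's suggestion, assuming an empty payload should be treated like a missing one (returning null, as the existing null branch does). The function guardedPlainDecode and its $decoder callback are hypothetical stand-ins for the real BinaryReader-based path in plainDecode; they are not php-parquet's API. Strict comparisons are used instead of empty() so that a payload such as "0" still reaches the decoder.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard (not php-parquet's actual code): return early before
 * any stream is opened when there are no bytes to decode.
 *
 * @param callable(string): mixed $decoder stands in for the real
 *        BinaryReader-based decoding path
 */
function guardedPlainDecode(?string $encoded, callable $decoder): mixed
{
    // Strict checks: only null and the zero-length string are skipped,
    // so a payload like "0" is still decoded.
    if ($encoded === null || $encoded === '') {
        return null;
    }
    return $decoder($encoded);
}

// Identity decoder used only for demonstration.
$identity = fn (string $bytes): string => $bytes;

var_dump(guardedPlainDecode(null, $identity)); // NULL
var_dump(guardedPlainDecode('', $identity));   // NULL
var_dump(guardedPlainDecode('0', $identity));  // string(1) "0"
```

Whether an empty payload should map to null or to the empty string '' is a design choice the thread does not settle; this sketch simply mirrors the existing null branch.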

Katalystical commented 2 years ago

Would you be so kind as to provide a Parquet file or a test case that reproduces the issue?

I'm convinced empty() would cause other issues. For example, empty("0") returns true, which would introduce false positives.
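To make the false-positive concern concrete, this sketch compares PHP's empty() with a strict null/'' comparison on a few string payloads. isStrictlyEmpty is a hypothetical helper name for illustration, not part of php-parquet.

```php
<?php
declare(strict_types=1);

// Hypothetical strict alternative to empty(): true only for null and "".
function isStrictlyEmpty(?string $value): bool
{
    return $value === null || $value === '';
}

// Print how each check classifies a few payloads a decoder might receive.
foreach ([null, '', '0', '00', ' ', 'a'] as $value) {
    printf(
        "%-6s empty(): %-5s strict: %s\n",
        var_export($value, true),
        var_export(empty($value), true),
        var_export(isStrictlyEmpty($value), true)
    );
}
```

Of these inputs, only '0' is classified differently: empty() reports it as empty, while the strict check lets it through to the decoder.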

Please note that the feature/datapage-v2 branch already contains a fix for a similar, probably the same, issue. I haven't had a chance to release it yet, but I'll try to get a release done soon.

michael-0-1 commented 2 years ago

I can send you the file, but I can't attach it to the issue publicly. If you can suggest another way to send it, I'll be happy to do so.

Katalystical commented 1 year ago

Should be fixed in 0.7.0. Feel free to reopen the issue if you still experience it.