Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0

initial implementation of startsWith #78

Open Liorba opened 5 years ago

Liorba commented 5 years ago

Fixes https://github.com/Netflix/iceberg/issues/49

This is an initial implementation that covers only iceberg-api expressions.

Open issues:

  1. Should we add implementations for Spark, Parquet, and Avro?
  2. Should startsWith be case sensitive?
  3. Should startsWith support Unicode characters?

rdblue commented 5 years ago

@Liorba, thanks for working on this! This is a good start.

To answer your questions:

  1. Integrating engines to use startsWith can be done in a follow-up.
  2. I think this should be case sensitive.
  3. I'm not sure what you mean.
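
To make answer 2 concrete, here is a minimal sketch of a case-sensitive prefix check. The class and method names are hypothetical and are not part of iceberg-api; the only library call is `String.startsWith`, which compares UTF-16 code units exactly, so it is case sensitive and also handles non-ASCII prefixes without extra work.

```java
// Hypothetical sketch: not Iceberg's evaluator, only the intended semantics.
final class StartsWithEvalSketch {

  /** Case-sensitive prefix test; a null value never matches. */
  static boolean startsWith(String value, String prefix) {
    return value != null && value.startsWith(prefix);
  }

  public static void main(String[] args) {
    System.out.println(startsWith("Iceberg", "Ice"));      // true
    System.out.println(startsWith("Iceberg", "ice"));      // false: case differs
    System.out.println(startsWith("\u00e9clair", "\u00e9")); // true: non-ASCII prefix
    System.out.println(startsWith(null, "Ice"));           // false: null value
  }
}
```

Treating a null value as a non-match is an assumption made for this sketch.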

This also needs to update the project methods in each transform. Otherwise, Iceberg wouldn't be able to convert STARTS_WITH data predicates to partition predicates.
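
As an illustration of what such a projection could look like, the sketch below handles a string column partitioned by a truncate transform of a given width. All names here (`StartsWithProjectionSketch`, `PartitionPredicate`, `projectStartsWith`) are hypothetical and do not reflect the actual transform classes. The idea: if the prefix is shorter than the truncation width, every matching row's partition value still starts with the prefix; otherwise the partition value must equal the truncated prefix.

```java
// Hypothetical sketch of an inclusive projection; not Iceberg's API.
final class StartsWithProjectionSketch {

  /** Minimal stand-in for a partition predicate: an operation plus a literal. */
  static final class PartitionPredicate {
    final String op;
    final String literal;

    PartitionPredicate(String op, String literal) {
      this.op = op;
      this.literal = literal;
    }
  }

  /** Keeps at most `width` Unicode code points of `value`. */
  static String truncate(String value, int width) {
    if (value.codePointCount(0, value.length()) <= width) {
      return value;
    }
    return value.substring(0, value.offsetByCodePoints(0, width));
  }

  /** Projects startsWith(col, prefix) onto a truncate[width] partition of col. */
  static PartitionPredicate projectStartsWith(String prefix, int width) {
    int prefixLength = prefix.codePointCount(0, prefix.length());
    if (prefixLength < width) {
      // Truncated values keep the whole prefix, so the prefix test carries over.
      return new PartitionPredicate("STARTS_WITH", prefix);
    }
    // Truncated values are exactly the first `width` code points of the prefix.
    return new PartitionPredicate("EQ", truncate(prefix, width));
  }
}
```

For example, with width 4 the prefix "ab" projects to STARTS_WITH "ab", while the prefix "abcdef" projects to EQ "abcd". For an identity transform the predicate would pass through unchanged.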

rdblue commented 5 years ago

@Liorba, thanks for updating, but this still needs to update the project methods in each transform so Iceberg can convert these into partition predicates.

rdblue commented 5 years ago

@Liorba, if you want to continue working on this, please re-open it in the apache/incubator-iceberg repository. That's the project's new home. Thanks!