Closed (mping closed this issue 10 years ago)
Add {:implicit-schema true} to the opts. That should prevent it from using a schema in the script.
-Matt
On Thursday, April 3, 2014 at 11:20 AM, Miguel Ping wrote:
The use case is loaders that support a schema, such as Parquet. In Pig, any of these works:

    RAW_DATA = LOAD 'parquet.gz/' USING parquet.pig.ParquetLoader();
    --or
    RAW_DATA = LOAD 'parquet.gz/' USING parquet.pig.ParquetLoader('contentHost:chararray');
    --or
    RAW_DATA = LOAD 'parquet.gz/' USING parquet.pig.ParquetLoader('contentHost:chararray') AS (contentHost:chararray);
    -- DESCRIBE RAW_DATA; -- will work properly with any since we have metadata

but an empty array

    (raw/load$ location '[] ;; these are the fields this loader returns
               storage opts)
    (raw/bind$ ...

generates this:

    load20 = LOAD '/path/to/data/' USING MyComplexStorage('name', 'address', 'phone') AS ();

Since the loader handles the schema, there's no need for the AS clause.

Reply to this email directly or view it on GitHub (https://github.com/Netflix/PigPen/issues/26).
You'll still need to tell it what fields the ParquetLoader will return, so that it can reference them in the next command. PigPen doesn't inspect the loader's code, so with an empty field list it will think that the loader isn't returning any usable fields, and the next operation won't have anything to work with.
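
To make the two points above concrete, here is a toy model in plain Clojure of the behavior described in this thread. It is not PigPen's implementation: `load-statement` and its argument names are hypothetical, and the way the AS clause is rendered (field names without types) is an assumption for illustration only. It shows that the field list is declared either way, while `:implicit-schema` only controls whether the AS clause is written into the script.

```clojure
(require '[clojure.string :as str])

;; Toy model only, NOT PigPen's code. `load-statement` is a hypothetical helper.
(defn load-statement
  "Builds a Pig LOAD statement as a string. The AS clause is emitted
  unless (:implicit-schema opts) is truthy."
  [alias location loader fields opts]
  (str alias " = LOAD '" location "' USING " loader
       ;; when-not returns nil if :implicit-schema is set; str renders nil as ""
       (when-not (:implicit-schema opts)
         (str " AS (" (str/join ", " fields) ")"))
       ";"))

;; Empty field list and no option: reproduces the empty AS clause from the report.
(println (load-statement "load20" "parquet.gz/" "parquet.pig.ParquetLoader()"
                         [] {}))

;; Fields still declared for later commands, AS clause left out of the script.
(println (load-statement "load20" "parquet.gz/" "parquet.pig.ParquetLoader()"
                         '[contentHost] {:implicit-schema true}))
```

In the second call the `contentHost` field is still passed in, which mirrors the advice above: the fields stay declared so later commands can reference them, and the loader supplies the actual schema at run time.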