apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Add support for Pig datetimes #1661

Open asfimport opened 10 years ago

asfimport commented 10 years ago

There's currently no support for conversion to/from Pig datetimes.

Reporter: Christian Rolf / @ccrolf

Note: This issue was originally created as PARQUET-137. Please see the migration documentation for further details.

asfimport commented 10 years ago

Brock Noland / @brockn: This uses the NanoTime class. We should probably fix that implementation as it's wrong. See PARQUET-114.

asfimport commented 10 years ago

Brock Noland / @brockn: Hi @ccrolf,

Long story short:

1) The NanoTime class was implemented incorrectly, as described in PARQUET-114.
2) As the package name of the class indicates, it was implemented as an example. Users are expected to implement their own class.

Brock

asfimport commented 10 years ago

Ryan Blue / @rdblue: Thanks for looking at this, Christian! Brock is right that NanoTime is for demo purposes only. In fact, I wouldn't recommend building your own copy of it either because the "timestamp" it works with is undocumented and uses an int96 without an annotation. We've been looking at this problem lately and we have defined both type annotations and specified how they should be interpreted. The next step is to implement those types in the object models like you've done here. In fact, this will be the first implementation.

The specification for date/time types is on the LogicalTypes page. If you need any help with the spec, feel free to ask questions and I'll clarify.
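
For context, here is a minimal sketch of how the unannotated int96 "timestamp" is commonly interpreted. It assumes the layout used by convention in Hive and Impala (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian); that layout is not described in this thread and is not part of the LogicalTypes spec. The class and method names are hypothetical and are not the NanoTime API. The sketch converts to and from epoch milliseconds, which is the precision a Joda-based datetime can hold.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of parquet-java. Assumes the Hive/Impala
// int96 convention: 8 bytes nanos-of-day, then 4 bytes Julian day, little-endian.
final class Int96TimestampSketch {
    // Julian day number that corresponds to 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Decodes a 12-byte int96 value into milliseconds since the Unix epoch.
    // Sub-millisecond digits are dropped.
    static long int96ToEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    // Encodes milliseconds since the Unix epoch into the same 12-byte layout.
    static byte[] epochMillisToInt96(long epochMillis) {
        long daysSinceEpoch = Math.floorDiv(epochMillis, MILLIS_PER_DAY);
        long millisOfDay = Math.floorMod(epochMillis, MILLIS_PER_DAY);
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(millisOfDay * NANOS_PER_MILLI);
        buf.putInt((int) (daysSinceEpoch + JULIAN_DAY_OF_EPOCH));
        return buf.array();
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2016-10-17T20:22:33.232Z").toEpochMilli();
        byte[] encoded = epochMillisToInt96(millis);
        // Round trip: decode the bytes again and print the resulting instant.
        System.out.println(Instant.ofEpochMilli(int96ToEpochMillis(encoded)));
    }
}
```

Because nothing in the file schema marks such a column as a timestamp, a reader has to know this convention in advance, which is the reason for moving to the annotated types.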

asfimport commented 9 years ago

Christian Rolf / @ccrolf: Thanks for the feedback! Sorry I didn't have time to look into this further for a long time. Looks like Parquet format 2 will have totally different date types. So there isn't much point in fixing this?

asfimport commented 9 years ago

Ryan Blue / @rdblue: Yeah, there's definitely value in making Pig work with the dates and times from the spec. Does Pig have date and time types as well?

asfimport commented 9 years ago

Christian Rolf / @ccrolf: OK, I will try to find time for it. Pig uses Joda-Time internally: http://pig.apache.org/docs/r0.14.0/basic.html#data-types

asfimport commented 8 years ago

Oleksiy Sayankin: Fixed the joda-time scope and version in the dependencies.

asfimport commented 8 years ago

Oleksiy Sayankin: Tested fix with Pig and Hive

STEP 1: Create parquet data in Hive

CREATE TABLE IF NOT EXISTS `test` (id int);
CREATE EXTERNAL TABLE `pig` (
  `campaignid` bigint,
  `siteid` bigint,
  `name` string,
  `lastupdated` timestamp,
  `created` timestamp,
  `active` boolean
) STORED AS PARQUET LOCATION '/user/test/pig';

Insert data.

INSERT OVERWRITE TABLE `test` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
INSERT OVERWRITE TABLE `pig`
SELECT
  1,
  2,
  'sample',
  '2016-10-17 11:22:33.232323434',
  '2016-10-17 11:22:33.232323434',
  1
FROM `test`
LIMIT 10;

STEP 2: Load the data using Pig

REGISTER /usr/pig/pig-0.16/contrib/piggybank/java/piggybank.jar;
parqData = LOAD '/user/test/pig/000000_0' USING parquet.pig.ParquetLoader('campaignid:long,siteid:long,name:chararray,lastupdated:datetime,created:datetime,active:boolean');
DUMP parqData;

EXPECTED RESULT:

(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)

Worked as expected.
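
The expected rows differ from the inserted literal in two ways: the fraction is cut from nanoseconds to milliseconds, and the hour moves from 11 to 20. A minimal sketch of one reading that reproduces this is below. It assumes the value was written as local time in a zone nine hours behind UTC and that the datetime side keeps only millisecond precision; the thread does not state the cluster's time zone, so the offset is an assumption, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; not the code path used by the patch.
final class MillisTruncationSketch {
    // Matches the literal used in the Hive INSERT (nine fractional digits).
    static final DateTimeFormatter HIVE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS");

    // Interprets the literal as local time at the assumed writer offset,
    // converts it to UTC, and drops everything below one millisecond.
    static Instant toMillisInstant(String hiveTimestamp, ZoneOffset assumedWriterOffset) {
        LocalDateTime local = LocalDateTime.parse(hiveTimestamp, HIVE_FORMAT);
        return local.toInstant(assumedWriterOffset).truncatedTo(ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        // The -09:00 offset is an assumption chosen to match the output above.
        Instant result = toMillisInstant("2016-10-17 11:22:33.232323434", ZoneOffset.ofHours(-9));
        System.out.println(result); // prints 2016-10-17T20:22:33.232Z
    }
}
```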

asfimport commented 8 years ago

Oleksiy Sayankin: Hi all!

Could anybody review the patch and apply it? Our customer is suffering...

Thanks in advance.

asfimport commented 8 years ago

Ryan Blue / @rdblue: Thanks, [~osayankin]! I didn't realize there was a patch to review here. We'll take a look.

Could you open a pull request on github for this?

asfimport commented 7 years ago

Oleksiy Sayankin: Hi @rdblue.

I am not a contributor at https://github.com/Parquet/parquet-mr, so I do not have enough permissions to create a separate branch there, and hence cannot open a pull request for merging.

So either I need to get permission to create a new branch, or I need to ask someone who has it to create the branch for me and apply the changes from the patch.

asfimport commented 7 years ago

Oleksiy Sayankin: PS: I expected the patch to be applied automatically if it was well formatted. I was expecting something like this:

PARQUET-[.][-].patch

asfimport commented 7 years ago

Ryan Blue / @rdblue: You can open a pull request from your own repository. Just push a branch to your github fork and open a PR for it from there.

You may want to make sure you forked from https://github.com/apache/parquet-mr so you don't have to select that one manually. We no longer use the old repository.

asfimport commented 7 years ago

Oleksiy Sayankin: Done: https://github.com/apache/parquet-mr/pull/387

asfimport commented 7 years ago

Viraj Bhat: [~osayankin] @rdblue, is the patch being resubmitted after the comments on GitHub? Or is this being fixed elsewhere that I do not know of? Viraj