Status: Open. asfimport opened this issue 10 years ago.
Brock Noland / @brockn: This uses the NanoTime class. We should probably fix that implementation as it's wrong. See PARQUET-114.
Brock Noland / @brockn: Hi @ccrolf,
Long story short:
1) The NanoTime class was implemented incorrectly, as described in PARQUET-114.
2) As the class's package name indicates, it was implemented as an example. Users are expected to implement their own class.
Brock
Ryan Blue / @rdblue: Thanks for looking at this, Christian! Brock is right that NanoTime is for demo purposes only. In fact, I wouldn't recommend building your own copy of it either, because the "timestamp" it works with is undocumented and is stored as an int96 without an annotation. We've been looking at this problem lately, and we have defined type annotations for both dates and times and specified how they should be interpreted. The next step is to implement those types in the object models, as you've done here. This will be the first implementation.
The specification for date/time types is on the LogicalTypes page. If you need any help with the spec, feel free to ask questions and I'll clarify.
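For context on the unannotated int96 value mentioned above: as Hive and Impala write it, the 12 bytes hold the nanoseconds within the day as a little-endian int64, followed by the Julian day number as a little-endian int32. The sketch below decodes and encodes that layout as epoch milliseconds, the precision a Pig datetime holds. It is a hypothetical helper (`Int96TimestampSketch` is not a parquet-mr class and is not the NanoTime implementation). It assumes the stored value is already in UTC and truncates sub-millisecond digits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

/**
 * Hypothetical helper (not part of parquet-mr): converts between epoch
 * milliseconds and the 12-byte INT96 timestamp layout assumed here:
 * bytes 0-7 = nanoseconds within the day (little-endian long),
 * bytes 8-11 = Julian day number (little-endian int).
 */
public final class Int96TimestampSketch {
    // Julian day number of the Unix epoch, 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    private Int96TimestampSketch() {}

    /** Decodes 12 INT96 bytes to milliseconds since the epoch, assuming UTC and truncating sub-millisecond digits. */
    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // always in [0, nanos per day)
        int julianDay = buf.getInt();
        long daysSinceEpoch = julianDay - JULIAN_DAY_OF_EPOCH;
        return daysSinceEpoch * MILLIS_PER_DAY + nanosOfDay / NANOS_PER_MILLI;
    }

    /** Encodes milliseconds since the epoch (UTC) into the same 12-byte layout. */
    static byte[] fromEpochMillis(long epochMillis) {
        // floorDiv/floorMod keep the time of day non-negative for pre-1970 instants.
        long days = Math.floorDiv(epochMillis, MILLIS_PER_DAY);
        long millisOfDay = Math.floorMod(epochMillis, MILLIS_PER_DAY);
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(millisOfDay * NANOS_PER_MILLI);
        buf.putInt((int) (days + JULIAN_DAY_OF_EPOCH));
        return buf.array();
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2016-10-17T20:22:33.232Z").toEpochMilli();
        byte[] encoded = fromEpochMillis(millis);
        // Round-trips back to the same instant: prints 2016-10-17T20:22:33.232Z
        System.out.println(Instant.ofEpochMilli(toEpochMillis(encoded)));
    }
}
```

Storing the Julian day separately from the time of day is why the value cannot be read as a plain integer, which is part of what makes an undocumented, unannotated int96 hard to interpret correctly.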
Christian Rolf / @ccrolf: Thanks for the feedback! Sorry I haven't had time to look into this further for so long. It looks like Parquet format 2 will have entirely different date types, so is there much point in fixing this?
Ryan Blue / @rdblue: Yeah, there's definitely value in making Pig work with the dates and times from the spec. Does Pig have date and time types as well?
Christian Rolf / @ccrolf: OK, I'll try to find time for it. Pig uses Joda-Time internally: http://pig.apache.org/docs/r0.14.0/basic.html#data-types
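The LogicalTypes spec annotates DATE as an int32 number of days since 1970-01-01 and TIMESTAMP_MILLIS as an int64 number of milliseconds since 1970-01-01T00:00:00 UTC, so a Pig converter for these types mostly needs epoch arithmetic. The sketch below uses `java.time` as a self-contained stand-in for Joda-Time; the class and method names are hypothetical. In an actual Pig converter the instant would instead become a Joda `DateTime`, for example via `new DateTime(millis, DateTimeZone.UTC)`.

```java
import java.time.Instant;
import java.time.LocalDate;

/**
 * Hypothetical converter sketch: maps the spec's DATE and TIMESTAMP_MILLIS
 * physical values to java.time objects (standing in for Pig's Joda DateTime).
 */
public final class LogicalTypeSketch {
    private LogicalTypeSketch() {}

    /** TIMESTAMP_MILLIS: int64 milliseconds since 1970-01-01T00:00:00Z. */
    static Instant fromTimestampMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    static long toTimestampMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    /** DATE: int32 days since 1970-01-01. */
    static LocalDate fromDate(int value) {
        return LocalDate.ofEpochDay(value);
    }

    static int toDate(LocalDate date) {
        // Throws ArithmeticException if the date falls outside the int32 day range.
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        System.out.println(fromTimestampMillis(1476735753232L));
        System.out.println(fromDate(17091));
    }
}
```

Running `main` prints `2016-10-17T20:22:33.232Z` and `2016-10-17`, the same instant and date that appear in the Pig output later in this thread.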
Oleksiy Sayankin: Fixed the joda-time scope and version in the dependencies.
Oleksiy Sayankin: Tested the fix with Pig and Hive.
STEP 1: Create Parquet data in Hive
CREATE TABLE IF NOT EXISTS `test` (id int);
CREATE EXTERNAL TABLE `pig` (
`campaignid` bigint,
`siteid` bigint,
`name` string,
`lastupdated` timestamp,
`created` timestamp,
`active` boolean
) STORED AS PARQUET LOCATION '/user/test/pig';
Insert data.
INSERT OVERWRITE TABLE `test` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
INSERT OVERWRITE TABLE `pig`
SELECT
1,
2,
'sample',
'2016-10-17 11:22:33.232323434',
'2016-10-17 11:22:33.232323434',
1
FROM `test`
LIMIT 10;
STEP 2: Load the data using Pig
REGISTER /usr/pig/pig-0.16/contrib/piggybank/java/piggybank.jar;
parqData = LOAD '/user/test/pig/000000_0' USING parquet.pig.ParquetLoader('campaignid:long,siteid:long,name:chararray,lastupdated:datetime,created:datetime,active:boolean');
DUMP parqData;
EXPECTED RESULT:
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
Worked as expected.
Oleksiy Sayankin: Hi all!
Can anybody review the patch and apply it? Our customer is suffering...
Thanks in advance.
Ryan Blue / @rdblue: Thanks, [~osayankin]! I didn't realize there was a patch to review here. We'll take a look.
Could you open a pull request on github for this?
Oleksiy Sayankin: Hi @rdblue.
I am not a contributor at https://github.com/Parquet/parquet-mr, so I cannot create a separate branch and hence a pull request for merge: I don't have enough permissions.
Either I need permission to create a new branch, or I need to ask someone who has it to create the branch for me and apply the changes from the patch.
Oleksiy Sayankin: PS: I expected that patch will be applied automatically if it is well formatted. I waited something like this
PARQUET-
Ryan Blue / @rdblue: You can open a pull request from your own repository. Just push a branch to your GitHub fork and open a PR for it from there.
Make sure you forked from https://github.com/apache/parquet-mr so you don't have to select that repository manually. We no longer use the old repository.
Viraj Bhat: [~osayankin] @rdblue, is the patch being resubmitted after the comments on GitHub? Or is this being fixed elsewhere that I don't know of?
Viraj
There's currently no support for conversion to/from Pig datetimes.
Reporter: Christian Rolf / @ccrolf
Note: This issue was originally created as PARQUET-137. Please see the migration documentation for further details.