aliyun / aliyun-odps-console

ODPS Console Source Code.
http://www.aliyun.com/product/odps
92 stars 26 forks source link

Strange ParseException: format error, DATETIME Unparseable date. #8

Open ktopcuoglu opened 5 years ago

ktopcuoglu commented 5 years ago

Hi there, I can't load 20110328035927 value to either datetime or timestamp column. (with data format pattern: "yyyyMMddHHmmss").

both tried odps.sql.type.system.odps2=true and false.

Also when i change input value, insert succeeded but query result returns input date with plus 6hours.. 20110328000000 >>>> 2011-03-28 06:00:00

Reproduce steps:

openjdk version "1.8.0_201" OpenJDK Runtime Environment (build 1.8.0_201-b09) OpenJDK 64-Bit Server VM (build 25.201-b09, mixed mode)

Aliyun ODPS Command Line Tool Version 0.29.1

echo "20110328035927" > import.csv
odps@ **my_account**>set odps.sql.type.system.odps2=false;

OK
odps@ **my_account**>tunnel upload -dfp 'yyyyMMddHHmmss' ~/import.csv tbl_datetime;
Upload session: 201903142005421875c00b0242854d
Start upload:/home/username/import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-14 15:05:42 scan block: '1'
java.util.concurrent.ExecutionException: org.apache.commons.cli.ParseException: ERROR: format error - :1, DATETIME:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.aliyun.odps.ship.upload.DshipUpload.uploadBlock(DshipUpload.java:224)
    at com.aliyun.odps.ship.upload.DshipUpload.upload(DshipUpload.java:166)
    at com.aliyun.odps.ship.DShip.runSubCommand(DShip.java:71)
    at com.aliyun.odps.ship.DShipCommand.run(DShipCommand.java:99)
    at com.aliyun.openservices.odps.console.commands.InteractiveCommand.run(InteractiveCommand.java:151)
    at com.aliyun.openservices.odps.console.commands.CompositeCommand.run(CompositeCommand.java:50)
    at com.aliyun.openservices.odps.console.ODPSConsole.main(ODPSConsole.java:63)
Caused by: org.apache.commons.cli.ParseException: ERROR: format error - :1, DATETIME:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at com.aliyun.odps.ship.upload.BlockUploader.doUpload(BlockUploader.java:166)
    at com.aliyun.odps.ship.upload.BlockUploader.upload(BlockUploader.java:98)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:211)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:208)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ERROR: TunnelException - ErrorCode=Local Error, ErrorMessage=Block ID:0 Failed.
FAILED: error occurred while running tunnel command
>odps@ **my_account**>tunnel upload -dfp 'yyyyMMddHHmmss' ~/import.csv tbl_timestamp;

Upload session: 201903142006230475c00b02422511
Start upload:/home/username/import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-14 15:06:23 scan block: '1'
java.util.concurrent.ExecutionException: org.apache.commons.cli.ParseException: ERROR: format error - :1, TIMESTAMP:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.aliyun.odps.ship.upload.DshipUpload.uploadBlock(DshipUpload.java:224)
    at com.aliyun.odps.ship.upload.DshipUpload.upload(DshipUpload.java:166)
    at com.aliyun.odps.ship.DShip.runSubCommand(DShip.java:71)
    at com.aliyun.odps.ship.DShipCommand.run(DShipCommand.java:99)
    at com.aliyun.openservices.odps.console.commands.InteractiveCommand.run(InteractiveCommand.java:151)
    at com.aliyun.openservices.odps.console.commands.CompositeCommand.run(CompositeCommand.java:50)
    at com.aliyun.openservices.odps.console.ODPSConsole.main(ODPSConsole.java:63)
Caused by: org.apache.commons.cli.ParseException: ERROR: format error - :1, TIMESTAMP:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at com.aliyun.odps.ship.upload.BlockUploader.doUpload(BlockUploader.java:166)
    at com.aliyun.odps.ship.upload.BlockUploader.upload(BlockUploader.java:98)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:211)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:208)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ERROR: TunnelException - ErrorCode=Local Error, ErrorMessage=Block ID:0 Failed.
FAILED: error occurred while running tunnel command
**odps@ **my_account**>set odps.sql.type.system.odps2=true;**
OK
**odps@ **my_account**>tunnel upload -dfp 'yyyyMMddHHmmss' ~/import.csv tbl_timestamp;**

Upload session: 201903142007161875c00b0242875e
Start upload:/home/username/import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-14 15:07:16 scan block: '1'
java.util.concurrent.ExecutionException: org.apache.commons.cli.ParseException: ERROR: format error - :1, TIMESTAMP:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.aliyun.odps.ship.upload.DshipUpload.uploadBlock(DshipUpload.java:224)
    at com.aliyun.odps.ship.upload.DshipUpload.upload(DshipUpload.java:166)
    at com.aliyun.odps.ship.DShip.runSubCommand(DShip.java:71)
    at com.aliyun.odps.ship.DShipCommand.run(DShipCommand.java:99)
    at com.aliyun.openservices.odps.console.commands.InteractiveCommand.run(InteractiveCommand.java:151)
    at com.aliyun.openservices.odps.console.commands.CompositeCommand.run(CompositeCommand.java:50)
    at com.aliyun.openservices.odps.console.ODPSConsole.main(ODPSConsole.java:63)
Caused by: org.apache.commons.cli.ParseException: ERROR: format error - :1, TIMESTAMP:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at com.aliyun.odps.ship.upload.BlockUploader.doUpload(BlockUploader.java:166)
    at com.aliyun.odps.ship.upload.BlockUploader.upload(BlockUploader.java:98)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:211)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:208)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ERROR: TunnelException - ErrorCode=Local Error, ErrorMessage=Block ID:0 Failed.
FAILED: error occurred while running tunnel command
**>odps@ **my_account**>tunnel upload -dfp 'yyyyMMddHHmmss' ~/import.csv tbl_datetime;**

Upload session: 201903142007230475c00b02422665
Start upload:/home/username/import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-14 15:07:23 scan block: '1'
java.util.concurrent.ExecutionException: org.apache.commons.cli.ParseException: ERROR: format error - :1, DATETIME:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.aliyun.odps.ship.upload.DshipUpload.uploadBlock(DshipUpload.java:224)
    at com.aliyun.odps.ship.upload.DshipUpload.upload(DshipUpload.java:166)
    at com.aliyun.odps.ship.DShip.runSubCommand(DShip.java:71)
    at com.aliyun.odps.ship.DShipCommand.run(DShipCommand.java:99)
    at com.aliyun.openservices.odps.console.commands.InteractiveCommand.run(InteractiveCommand.java:151)
    at com.aliyun.openservices.odps.console.commands.CompositeCommand.run(CompositeCommand.java:50)
    at com.aliyun.openservices.odps.console.ODPSConsole.main(ODPSConsole.java:63)
Caused by: org.apache.commons.cli.ParseException: ERROR: format error - :1, DATETIME:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
offset: 1

    at com.aliyun.odps.ship.upload.BlockUploader.doUpload(BlockUploader.java:166)
    at com.aliyun.odps.ship.upload.BlockUploader.upload(BlockUploader.java:98)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:211)
    at com.aliyun.odps.ship.upload.DshipUpload$1.call(DshipUpload.java:208)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ERROR: TunnelException - ErrorCode=Local Error, ErrorMessage=Block ID:0 Failed.
FAILED: error occurred while running tunnel command

2nd try with successful insert, but adding 6 hours..

>username@localhost:~ $ echo "20110328000000" > import.csv
>username@localhost:~ $ odpscmd
>Aliyun ODPS Command Line Tool Version 0.29.1
>@Copyright 2017 Alibaba Cloud Computing Co., Ltd. All rights reserved.
>odps@ **my_account**>tunnel upload -dfp 'yyyyMMddHHmmss' ~/import.csv tbl_datetime;

Upload session: 201903142010581875c00b02428c1d
Start upload:/home/username/import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-14 15:10:58 scan block: '1'
2019-03-14 15:10:58 scan block complete, blockid=1
2019-03-14 15:10:58 upload block: '1'
2019-03-14 15:10:58 upload block complete, blockid=1
OK
>odps@ **my_account**>tunnel upload -dfp 'yyyyMMddHHmmss' ~/import.csv tbl_timestamp;

Upload session: 201903142011050475c00b02422b3a
Start upload:/home/username/import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-14 15:11:05 scan block: '1'
2019-03-14 15:11:05 scan block complete, blockid=1
2019-03-14 15:11:05 upload block: '1'
2019-03-14 15:11:05 upload block complete, blockid=1
OK
**odps@ **my_account**>select * from tbl_datetime;**

ID = 20190314121122715gl3h4dj
Log view:
http://logview.odps.aliyun.com/logview/?h=http://service.eu-central-1.maxcompute.aliyun.com/....
Job Queueing.
+------------+
| col        |
+------------+
| 2011-03-28 06:00:00 |
+------------+
**odps@ my_account>select * from tbl_timestamp;**

ID = 20190314121133301g93nrcj
Log view:
http://logview.odps.aliyun.com/logview/?h=http://service.eu-central-1.maxcompute.aliyun.com/....
Job Queueing.
+------+
| col  |
+------+
| 2011-03-28 06:00:00 |
+------+
guozhenhong commented 5 years ago

20110328035927

Sorry I followed the steps you said, but did not reproduce this error. I successfully uploaded the data, the data is also correct.

I think the key point of this case may be the difference in time zone. What was the time zone when you running odpscmd?

ktopcuoglu commented 5 years ago

Thank you for your answer. I agreed, think the same thing with you, because it is ok with some of the other time values. But I couldn't figure it out how it is possible.

Here some details about os, with timezone. Also, i am working with EU Central 1 region with maxcompute.

kursat:~ $ date
Fri Mar 15 10:10:35 +03 2019
kursat:~ $ timedatectl
      Local time: Fri 2019-03-15 10:11:02 +03
  Universal time: Fri 2019-03-15 07:11:02 UTC
        RTC time: Fri 2019-03-15 07:11:02
       Time zone: Europe/Istanbul (+03, +0300)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a

kursat:~ $ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
kursat:~ $ cat .odpscmd/odps_config.ini
project_name=xxxxxxxxxx
access_id=yyyyyyyyyy
access_key=zzzzzzzzzz
end_point=http://service.eu-central-1.maxcompute.aliyun.com/api
tunnel_endpoint=http://dt.eu-central-1.maxcompute.aliyun.com
https_check=true
ktopcuoglu commented 5 years ago

also regarding the post: December 24, 2018: MaxCompute supports time zone configuration.

I tried following configs but nothing changed.

:/$ odpscmd
Aliyun ODPS Command Line Tool
Version 0.29.1
@Copyright 2017 Alibaba Cloud Computing Co., Ltd. All rights reserved.
odps@ poc_trendyol2019>select getDate();

ID = 20190315114043214gg7j4dj
Log view:
http://logview.odps.aliyun.com/logview/...
Job Queueing.

+------------+
| _c0        |
+------------+
| 2019-03-15 19:40:43 |
+------------+

odps@ poc_trendyol2019>set odps.sql.timezone=Asia/Tokyo;
OK

odps@ poc_trendyol2019>select getDate();

ID = 20190315114210992gm8j4dj
Log view:
http://logview.odps.aliyun.com/logview/...
Job Queueing.

+------------+
| _c0        |
+------------+
| 2019-03-15 19:42:11 |
+------------+
odps@ poc_trendyol2019>set odps2.sql.timezone=Asia/Tokyo;
OK
odps@ poc_trendyol2019>select getDate();

ID = 20190315114229546gzkprcj
Log view:
http://logview.odps.aliyun.com/logview/...
Job Queueing.

+------------+
| _c0        |
+------------+
| 2019-03-15 19:42:29 |
+------------+

odps@ poc_trendyol2019>set odps.sql.timezone=Europe/London;
OK

odps@ poc_trendyol2019>select getDate();

ID = 20190315114320355gl9j4dj
Log view:
http://logview.odps.aliyun.com/logview/...
Job Queueing.

+------------+
| _c0        |
+------------+
| 2019-03-15 19:43:20 |
+------------+
guozhenhong commented 5 years ago

Europe/Istanbul

I used Europe/Istanbul to upload 20110328035927, and the error occurred as you described.

Aliyun ODPS Command Line Tool
Version 0.29.1
@Copyright 2017 Alibaba Cloud Computing Co., Ltd. All rights reserved.
odps@ test_project>tunnel upload -dfp 'yyyyMMddHHmmss' import.csv test_date_time -tz "Europe/Istanbul";
Upload session: 201903161438498fde650a00004b93
Start upload:import.csv
Using \n to split records
Upload in strict schema mode: true
Total bytes:15   Split input to 1 blocks
2019-03-16 14:38:53 scan block: '1'
java.util.concurrent.ExecutionException: org.apache.commons.cli.ParseException: ERROR: format error - :1, DATETIME:'20110328035927'  Unparseable date: "20110328035927"content: 20110328035927
... ...

I have checked some materials, and it seems that Turkey will switch to daylight saving time on the last Sunday of march every year, so I suspect that this error is related to daylight saving time, resulting in the exception of java.util.date.

guozhenhong commented 5 years ago

regarding

I'm sorry that this setting was not working. I guess it's because the service of eu-central-1 hasn't been upgraded yet. I will later contact our operation engineer to confirm this problem.

lyman commented 5 years ago

2011-03-28 03:59:27 is indeed an invalid date for timezone Europe/Istanbul

$ export LANG=c
$ TZ=Europe/Istanbul date -d "2011-03-28 02:59:27"
Mon Mar 28 02:59:27 EET 2011
$ TZ=Europe/Istanbul date -d "2011-03-28 03:00:27"
date: invalid date '2011-03-28 03:00:27'
$ TZ=Europe/Istanbul date -d "2011-03-28 03:59:27"
date: invalid date '2011-03-28 03:59:27'
$ TZ=Europe/Istanbul date -d "2011-03-28 04:00:27"
Mon Mar 28 04:00:27 EEST 2011

due to day saving time, the whole hour of 03:00 does not exists at all

$ zdump -i Europe/Istanbul
...
2011-03-28      04      +03     EEST    1
2011-10-30      03      +02     EET
...
ktopcuoglu commented 5 years ago

@guozhenhong

I have checked some materials, and it seems that Turkey will switch to daylight saving time on the last Sunday of march every year, so I suspect that this error is related to daylight saving time, resulting in the exception of java.util.date.

In fact, at 2016 Turkish council have decided to left DST adjusting, since 2 years Turkey permanently using UTC+3 timezone. Also, as I know some of the other Europian countries also will leave DST too.

related news turkey related news eu

I'm sorry that this setting was not working. I guess it's because the service of eu-central-1 hasn't been upgraded yet. I will later contact our operation engineer to confirm this problem.

It will be fine, thank you.

ktopcuoglu commented 5 years ago

@lyman

2011-03-28 03:59:27 is indeed an invalid date for timezone Europe/Istanbul

I'm confused but yes, this timestamp really invalid because of daylight saving...

Still, I am thinking of the differences between DateTime and TimeStamp data types.

DATETIME: Date-time type, range from December 31, 999 to January 1-9, 0000 (typo?), exact to milliseconds

TIMESTAMP: It depends on the time zone and ranges from January 1st 0000 to December 31, 9999 23.59:59.999999999, and is accurate to nanosecond-level.

Source: Data types

I expect DateTime column doesn't respect which timezone using we are, unlike the TimeStamp. Am I wrong?

lyman commented 5 years ago

@ktopcuoglu Datetime and Timestamp are both treated as UTC timestamp in MaxCompute, the only difference is precision.