databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.71k stars 732 forks source link

Feature: Further Date Parsing Flexibility #16427

Open rad-pat opened 2 weeks ago

rad-pat commented 2 weeks ago

Further to Issue https://github.com/datafuselabs/databend/issues/15257, we still experience parsing dates when all tokens are not populated, for example:

select to_timestamp('2023-08-24', '%Y-%m-%d %H:%M:%S')
Query failed: QueryErrors{code=1006, message=BadArguments. Code: 1006, Text = premature end of input. while evaluating function `to_timestamp('2023-08-24', '%Y-%m-%d %H:%M:%S')` in expr `to_timestamp('2023-08-24', '%Y-%m-%d %H:%M:%S')`} cause: null

It would be great if this could output '2023-08-24' with the parse_datetime_ignore_remainder flag set

dosubot[bot] commented 2 weeks ago

I found a similar closed issue that might be relevant to your request: Feature: Add Flexible Date Parsing in to_timestamp for Better Conversion Support. This issue was addressed by implementing the parse_datetime_ignore_remainder flag, which allows the to_timestamp function to handle incomplete date strings without error [1].

The parse_datetime_ignore_remainder flag is designed to handle cases where the full timestamp is not supplied. When this flag is enabled, the function will fill in missing parts of the date with default values, allowing it to parse incomplete timestamps like '2023-08-24' without error. Here is the relevant part of the code:

if ctx.func_ctx.parse_datetime_ignore_remainder {
    let mut parsed = Parsed::new();
    if let Err(e) = parse_and_remainder(&mut parsed, timestamp, StrftimeItems::new(format)) {
        return Err(ErrorCode::BadArguments(format!("{}", e)));
    }
    // Additional checks and adjustments for parsed timestamp
    if parsed.timestamp.is_none() {
        if parsed.year.is_none() {
            parsed.year = Some(1970);
            parsed.year_div_100 = Some(19);
            parsed.year_mod_100 = Some(70);
        }
        if parsed.month.is_none() {
            parsed.month = Some(1);
        }
        if parsed.day.is_none() {
            parsed.day = Some(1);
        }
        if parsed.hour_div_12.is_none() && parsed.hour_mod_12.is_none() {
            parsed.hour_div_12 = Some(0);
            parsed.hour_mod_12 = Some(0);
        }
        if parsed.minute.is_none() {
            parsed.minute = Some(0);
        }
        if parsed.second.is_none() {
            parsed.second = Some(0);
        }
    }
    // Further processing...
}

To enable this flag, you can configure it as follows:

("parse_datetime_ignore_remainder", DefaultSettingValue {
    value: UserSettingValue::UInt64(1),
    desc: "Ignore trailing chars when parse string to datetime",
    mode: SettingMode::Both,
    range: Some(SettingRange::Numeric(0..=1)),
}),

You can set this flag to 1 (enabled) or 0 (disabled) using the following methods:

pub fn get_parse_datetime_ignore_remainder(&self) -> Result<bool> {
    Ok(self.try_get_u64("parse_datetime_ignore_remainder")? != 0)
}

pub fn set_parse_datetime_ignore_remainder(&self, val: u64) -> Result<()> {
    self.try_set_u64("parse_datetime_ignore_remainder", val)
}

This should allow the to_timestamp function to parse the date '2023-08-24' without error when the full timestamp is not supplied [2][3][4].

To continue talking to Dosu, mention @dosu.

TCeason commented 2 weeks ago

https://github.com/datafuselabs/databend/issues/15257#issuecomment-2062874370

Because of this comment it's not a good idea.