databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.71k stars 732 forks source link

bug: stage created with quoted name can not be corretly refereced with @<stage_name> #16521

Closed youngsofun closed 21 hours ago

youngsofun commented 1 day ago

Summary

create stage, like create table, accept any valid ident, when there are special characters, quote is used. however, @<stage_name> use escaping instead of quoting for '\s' etc, this does not make sense.

    #[regex(r#"@([^\s`;'"()]|\\\s|\\'|\\"|\\\\)+"#)]
     LiteralAtString,

This bug has existed for a long time, and almost no one has complained about it, so there is reason to believe that a simple stage name (plain ident) will be sufficient.

   #[regex(r#"[_a-zA-Z][_$a-zA-Z0-9]*"#)]
    Ident,
youngsofun commented 1 day ago

cc @flaneur2020 @everpcpc @wubx @Xuanwo

dosubot[bot] commented 1 day ago

Hey there, @youngsofun! 👋 I'm here to help you out with any bugs, questions, or contributions while you wait for a human maintainer. I'm a bot designed to assist you in any way I can.

To fix the bug where a stage created with a quoted name cannot be correctly referenced with @<stage_name> due to the use of escaping instead of quoting for special characters, you need to ensure that the stage name is properly quoted when it is referenced. This can be done by modifying the parsing logic to handle quoted identifiers correctly.

Here is a potential fix in the create_stage parsing logic:

let create_stage = map_res(
    rule! {
        CREATE ~ ( OR ~ ^REPLACE )? ~ STAGE ~ ( IF ~ ^NOT ~ ^EXISTS )?
        ~ ( #quoted_stage_name )
        ~ ( (URL ~ ^"=")? ~ #uri_location )?
        ~ ( #file_format_clause )?
        ~ ( ON_ERROR ~ ^"=" ~ ^#ident )?
        ~ ( SIZE_LIMIT ~ ^"=" ~ ^#literal_u64 )?
        ~ ( VALIDATION_MODE ~ ^"=" ~ ^#ident )?
        ~ ( (COMMENT | COMMENTS) ~ ^"=" ~ ^#literal_string )?
    },
    |(
        _,
        opt_or_replace,
        _,
        opt_if_not_exists,
        stage,
        url_opt,
        file_format_opt,
        on_error_opt,
        size_limit_opt,
        validation_mode_opt,
        comment_opt,
    )| {
        let create_option =
            parse_create_option(opt_or_replace.is_some(), opt_if_not_exists.is_some())?;
        Ok(Statement::CreateStage(CreateStageStmt {
            create_option,
            stage_name: stage.to_string(),
            location: url_opt.map(|(_, location)| location),
            file_format_options: file_format_opt.unwrap_or_default(),
            on_error: on_error_opt.map(|v| v.2.to_string()).unwrap_or_default(),
            size_limit: size_limit_opt.map(|v| v.2 as usize).unwrap_or_default(),
            validation_mode: validation_mode_opt
                .map(|v| v.2.to_string())
                .unwrap_or_default(),
            comments: comment_opt.map(|v| v.2).unwrap_or_default(),
        }))
    },
);

In this fix, #quoted_stage_name should be a rule that correctly parses quoted identifiers, ensuring that special characters are handled properly. This way, when the stage name is referenced with @<stage_name>, it will be correctly interpreted.

Additionally, ensure that the resolve_stage_location function correctly handles quoted stage names:

#[async_backtrace::framed]
pub async fn resolve_stage_location(
    ctx: &dyn TableContext,
    location: &str,
) -> Result<(StageInfo, String)> {
    let names: Vec<&str> = location.splitn(2, '/').filter(|v| !v.is_empty()).collect();

    let stage = if names[0] == "~" {
        StageInfo::new_user_stage(&ctx.get_current_user()?.name)
    } else {
        UserApiProvider::instance()
            .get_stage(&ctx.get_tenant(), names[0])
            .await?
    };

    let path = names.get(1).unwrap_or(&"").trim_start_matches('/');
    let path = if path.is_empty() { "/" } else { path };

    debug!("parsed stage: {stage:?}, path: {path}");
    Ok((stage, path.to_string()))
}

This ensures that the stage name is correctly parsed and resolved, even if it contains special characters.

Additionally, the StageInfo struct and related methods should be updated to handle quoted stage names properly:

/// Create a new user stage.
pub fn new_user_stage(user: &str) -> StageInfo {
    StageInfo {
        stage_name: user.to_string(),
        stage_type: StageType::User,
        ..Default::default()
    }
}

/// Update user stage with stage name.
pub fn with_stage_name(mut self, name: &str) -> StageInfo {
    self.stage_name = name.to_string();
    self
}

/// Get the prefix of stage.
///
/// Use this function to get the prefix of this stage in the data operator.
///
/// # Notes
///
/// This function should never be called on external stage because it's meaningless. Something must be wrong.
pub fn stage_prefix(&self) -> String {
    match self.stage_type {
        StageType::LegacyInternal => format!("/stage/{}/", self.stage_name),
        StageType::External => {
            unreachable!("stage_prefix should never be called on external stage, must be a bug")
        }
        StageType::Internal => format!("/stage/internal/{}/", self.stage_name),
        StageType::User => format!("/stage/user/{}/", self.stage_name),
    }
}

By ensuring that the stage name is correctly quoted and parsed at all stages of its lifecycle, you can avoid issues related to special characters in stage names.

Additionally, the ident_needs_quote and display_ident functions from quote.rs should be used to determine if a stage name needs quoting and to display it correctly:

pub fn ident_needs_quote(ident: &str) -> bool {
    if ident.is_empty() {
        return true;
    }

    // avoid quote the special identifier "~" which is an available stage name
    if ident == "~" {
        return false;
    }

    let mut chars = ident.chars();
    let first = chars.next().unwrap();
    if !first.is_ascii_alphabetic() && first != '_' {
        return true;
    }

    for c in chars {
        if !c.is_ascii_alphanumeric() && c != '_' && c != '$' {
            return true;
        }
    }

    false
}

pub fn display_ident(name: &str, quoted_ident_case_sensitive: bool, dialect: Dialect) -> String {
    // Db-s -> "Db-s" ; dbs -> dbs
    if name.chars().any(|c| c.is_ascii_uppercase()) && quoted_ident_case_sensitive
        || ident_needs_quote(name)
    {
        QuotedIdent(name, dialect.default_ident_quote()).to_string()
    } else {
        name.to_string()
    }
}

By integrating these functions, you ensure that stage names are correctly quoted and displayed, preventing issues with special characters [1][2][3][4].

To continue talking to Dosu, mention @dosu.