Open opustecnica opened 1 year ago
not yet, this is a limitation with the custom parser sqlite-xsv
has that tries to parse column names + definitions on that case. It splits by word and doesn't take into account spaces/[]
brackets/""
quotes, which the SQLite core library does.
But it should! Like you mentioned there are workaround if you use filename=
, but that's not always possible. Will look into seeing how much effort it'll be to update that naive parser.
Alex,
To the best of my understanding, columns with (unfortunate) spaces can be handled in SQLite in 3 ways: .1 'Item Type' .2 "Item Type" .3 [Item Type] (this is my preferred way)
When creating a VT like:
CREATE VIRTUAL TABLE IF NOT EXISTS temp.xsv_5m_Sales_Records_Test USING csv(filename='D:\Temp\5m-Sales-Records.csv.zst', header=false,Region TEXT, Country TEXT, [Item Type] TEXT
it fails. When replacing "[]" with single quotes, the creation goes through but the results are:
Region Country Item
------ ------- ----
Europe Poland Beverages
Europe Poland Clothes
Using alternate delimiters like '`' (backtick) allows the table creation to complete but the column will assume the generic name of 'Column1'
Region Country Column1
------ ------- -------
Europe Poland Beverages
Europe Poland Clothes
Europe Poland Cosmetics
Europe Poland Snacks
Alex, if I understand correctly the code despite my turkey induced drowsiness, the column space issue should be resolved in the sqlite-loadable-rs (vtab_argparse.rs). This is an attempt. Let me know your thoughts.
pub fn arg_is_column_declaration(arg: &str) -> Result<Option<ColumnDeclaration>, String> {
if arg.trim().is_empty() {
return Ok(None);
}
// TODO this is a bit of a hack, but it works for now
// NOTE Test if 'arg' contains characters that imply the column name has a space.
let regex_check = Regex::new(r#"[\['"]"#).unwrap();
// NOTE Assign an appropriate regex string based on the above check.
let regex_string = if regex_check.is_match(arg) {
r#"^[\['"](?<name>.*?)[\]'"]\s+(?<declared_type>.*?)\s+(?<constraints>.*)$"#
} else {
r#"^(?<name>.*?)\s+(?<declared_type>.*?)\s+(?<constraints>.*)$"#
};
// NOTE Compile the regex string into a regex object.
let regex_result = Regex::new(regex_string);
// NOTE Check if the regex compiled successfully and assign the results to variables.
match regex_result {
Ok(re) => match re.captures(arg) {
Some(caps) => {
/*
// TEST
println!(
"Found match: Name: {0} - Declared Type: {1} - Constraints: {2}",
&caps["name"].trim(),
&caps["declared_type"].trim(),
&caps["constraints"].trim()
);
*/
// TODO Verify scope.
name = &caps["name"].trim();
declared_type = &caps["declared_type"].trim();
constraints = &caps["constraints"].trim();
}
None => {
println!("No match found!");
}
},
Err(err) => {
println!("Could not compile regex: {}", err);
}
}
Ok(Some(ColumnDeclaration::new(
name,
declared_type,
constraints,
)))
}
Happy TG!
While in agreement that spaces should never be used in a column name, at times, when dealing with externally sourced csvs, spaces are indeed present.
When attempting to create a virtual table that includes spaces, a 'vtable constructor failed: iavm_reader' error is returned.
e.g.
CREATE VIRTUAL TABLE temp.iavm_reader using csv_reader(plugin, [plugin name])
Is there a way to work around this?
PS When creating a virtual table directly there are no issues with the header.
e.g.
CREATE VIRTUAL TABLE temp.iavm_csv using csv(filename=iavm.csv)