Open theianrobertson opened 6 years ago
For example 2, I believe it could either error with duplicate column definition (probably the safest, easiest and most consistent with Teradata behavior way of solving it) or it could automatically rename duplicate field names for all to_dict()
results should they overlap. While the latter might be a nice-to-have, I think it may introduce more problems than it solves, where just implementing what Teradata already does in the form of a duplicate definition of field error seems to be what I would favor.
Example 1 is a bit harder to determine the best course of action. Title has always been preferred to imitate how Teradata's various tools will prefer using the explicitly set name of a column rather than the original name. I wasn't aware one could define a table with the same title for multiple columns, so that definitely complicates things. Is this a case you run into a lot (or ever) where the title is necessary to be the same? And if so, could you provide any context on why one might need this? What was the behavior you would assume would take place in this case, and what would be the behavior you think you would want?
I think raising an exception would work in both cases. The silence is what makes this a bug to me. I can't think of a case where I couldn't just add an alias in the SQL to solve for the exception and re-execute, but without an exception there could be unintended downstream impacts.
(to your point, example 1 is not something that I'd come across a lot - but it's where I first came across it when running a select a, b from table
where a and b both had the same title and I didn't have control over the table definition.)
I think for the example 2 that might be pretty easy to implement (pull requests welcome!). With example 1, is the expectation that if there is a pre-existing table that has been defined in this manner that every time it is queried it will raise the exception should the aliases not be set for duplicates?
This is a corner case and I'm open to a "won't fix", but came across this and didn't expect the behaviour today: two examples where selecting columns that render out as having the same name will silently squash one of them in output:
Example 1 - Titles:
This code results in
[{'number': 2.0}]
, becauseColumn.name
usesself.title
overself.original_name
(here). Note, if you BulkExport out the data, the export.to_str() still works fine, but the to_dict() will overwrite (because it's a dict)Example 2 - duplicate column names in multiple tables:
This code results in
{'fld_a': 1}
.If you select the same field name in one query/table (e.g.
SELECT 1 AS field_1, 2 AS field_1
), Teradata will error out with 3863: Duplicate definition of field.Wondering if giraffez should exception out when there are duplicates in the list of column names? Or maybe just when calling
to_dict()
orto_list()
on Cmd or Export, which would produce unexpected results? (especially if you're exporting, the fields will still be inexport.columns.names
but would have fewer records into_dict() items
). Both of these could be fixed by the user providing an alias, but as a user I'd like giraffez to be "noisier" if there was going to be an issue.