Expected Behavior
When using the BigQueryMetadataExtractor to extract table schemas from BigQuery, the values of the ColumnMetadata.sort_order attribute should reflect the ordinal position of each column in BigQuery, i.e. the ordinal_position of the column as reported by the <data_set_name>.INFORMATION_SCHEMA.COLUMNS view. The column with ordinal_position=1 should get sort_order=1, the column with ordinal_position=2 should get sort_order=2, the column with ordinal_position=3 should get sort_order=3, or more generally, the column with ordinal_position=i should get sort_order=i.
Current Behavior
While the order of columns seems to be correct, the values in ColumnMetadata.sort_order seem to be inaccurate and do not match the ordinal_position of the column as specified in the information schema table.
ColumnMetadata.sort_order appears to receive only odd numbers: the column with ordinal_position=1 gets sort_order=1, the column with ordinal_position=2 gets sort_order=3, the column with ordinal_position=3 gets sort_order=5, or generally, a column with ordinal_position=i gets sort_order=(i*2 - 1).
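The odd-number pattern can be reproduced with a simplified model of the traversal. The sketch below is not the databuilder source: iterate_over_col_buggy and extract_sort_orders_buggy are hypothetical stand-ins, and they assume that the helper uses the counter it receives as sort_order and returns that counter plus one, while the caller also passes the counter plus one. Under those assumptions the counter advances twice per column. Nested RECORD fields are left out to keep the example short.

```python
from typing import List, Tuple


def iterate_over_col_buggy(name: str, cols: List[Tuple[str, int]], total_cols: int) -> int:
    # Hypothetical stand-in for the helper: the received counter becomes the
    # sort_order, and the counter is returned advanced by one.
    cols.append((name, total_cols))
    return total_cols + 1


def extract_sort_orders_buggy(column_names: List[str]) -> List[Tuple[str, int]]:
    cols: List[Tuple[str, int]] = []
    total_cols = 0
    for name in column_names:
        # The caller adds one as well, so each column moves the counter by two.
        total_cols = iterate_over_col_buggy(name, cols, total_cols + 1)
    return cols


print(extract_sort_orders_buggy(["a", "b", "c"]))
# [('a', 1), ('b', 3), ('c', 5)]
```

Each pair is (column name, sort_order), so the third column ends up with 5 rather than 3, matching sort_order=(i*2 - 1).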
Possible Solution
When calling the _iterate_over_cols method, the total_cols argument should hold the real number of columns processed so far. For example, in this line, pass total_cols as is instead of total_cols + 1.
Then, inside _iterate_over_cols, when creating the ColumnMetadata instance, sort_order should be set to total_cols + 1, so that it matches the return value of the function.
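A minimal sketch of the proposed change, using the same kind of simplified model rather than the actual databuilder code: iterate_over_col_fixed and extract_sort_orders_fixed are hypothetical names, columns are represented as (name, sort_order) tuples instead of ColumnMetadata instances, and nested RECORD fields are omitted.

```python
from typing import List, Tuple


def iterate_over_col_fixed(name: str, cols: List[Tuple[str, int]], total_cols: int) -> int:
    # total_cols is the number of columns processed before this one, so this
    # column's 1-based position is total_cols + 1.
    sort_order = total_cols + 1
    cols.append((name, sort_order))
    # The return value equals the sort_order that was just assigned.
    return sort_order


def extract_sort_orders_fixed(column_names: List[str]) -> List[Tuple[str, int]]:
    cols: List[Tuple[str, int]] = []
    total_cols = 0
    for name in column_names:
        # Pass the counter as is; only the helper advances it.
        total_cols = iterate_over_col_fixed(name, cols, total_cols)
    return cols


print(extract_sort_orders_fixed(["a", "b", "c"]))
# [('a', 1), ('b', 2), ('c', 3)]
```

With the counter incremented in exactly one place, the column at position i receives sort_order=i, which is the expected behavior for flat tables.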
Your Environment
Amundsen version used: amundsen-databuilder version 7.4.4