amundsen-io / amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
https://www.amundsen.io/amundsen/
Apache License 2.0

Bug Report: BigQuery columns `sort_order` attribute is incorrect #2225

Open dlahyani opened 6 months ago

dlahyani commented 6 months ago

Expected Behavior

When the BigQueryMetadataExtractor extracts table schemas from BigQuery, the value of the ColumnMetadata.sort_order attribute should reflect the column's ordinal position in BigQuery, i.e. the ordinal_position of the column as reported by the <data_set_name>.INFORMATION_SCHEMA.COLUMNS view. The column with ordinal_position=1 should get sort_order=1, the column with ordinal_position=2 should get sort_order=2, the column with ordinal_position=3 should get sort_order=3, and more generally the column with ordinal_position=i should get sort_order=i.
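The expected behavior is a simple invariant that can be checked per table. Below is a minimal sketch of such a check, assuming the ordinal_position values have already been fetched (for example by querying column_name and ordinal_position from the INFORMATION_SCHEMA.COLUMNS view) and the extracted sort_order values have been collected into a dict. The function name sort_order_mismatches and the sample values are hypothetical and only illustrate the reported pattern.

```python
from typing import Dict, List, Optional, Tuple


def sort_order_mismatches(
    ordinal_positions: Dict[str, int],
    sort_orders: Dict[str, int],
) -> List[Tuple[str, int, Optional[int]]]:
    """Hypothetical check: return (column, ordinal_position, sort_order) for
    every column whose extracted sort_order differs from its ordinal_position.
    A column missing from `sort_orders` is reported with sort_order=None."""
    return [
        (name, pos, sort_orders.get(name))
        # Walk the columns in their BigQuery order.
        for name, pos in sorted(ordinal_positions.items(), key=lambda kv: kv[1])
        if sort_orders.get(name) != pos
    ]


# Illustrative values only.
expected = {"id": 1, "name": 2, "created_at": 3}
observed = {"id": 1, "name": 3, "created_at": 5}
print(sort_order_mismatches(expected, observed))
# → [('name', 2, 3), ('created_at', 3, 5)]
```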

Current Behavior

Although the columns appear in the correct order, the ColumnMetadata.sort_order values do not match the ordinal_position of each column as reported by the INFORMATION_SCHEMA.COLUMNS view.

ColumnMetadata.sort_order only receives odd numbers: the column with ordinal_position=1 gets sort_order=1, the column with ordinal_position=2 gets sort_order=3, the column with ordinal_position=3 gets sort_order=5, and in general a column with ordinal_position=i gets sort_order=2i - 1.
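The doubling can be reproduced with a simplified model of the counter handling, assuming (as described under Possible Solution) that the caller passes total_cols + 1 into the helper and that the helper, for a non-RECORD column, uses the value it receives as sort_order and returns that value plus 1. The names assign_leaf and extract_current are hypothetical; this sketch models only flat columns and does not import Amundsen.

```python
from typing import List, Tuple


def assign_leaf(name: str, cols: List[Tuple[str, int]], total_cols: int) -> int:
    """Hypothetical model of the helper for a flat column: record the column
    with sort_order=total_cols and return total_cols + 1."""
    cols.append((name, total_cols))
    return total_cols + 1


def extract_current(names: List[str]) -> List[Tuple[str, int]]:
    """Model of the current caller loop."""
    cols: List[Tuple[str, int]] = []
    total_cols = 0
    for name in names:
        # The caller adds 1 AND the helper adds 1, so the counter
        # advances by 2 per column.
        total_cols = assign_leaf(name, cols, total_cols + 1)
    return cols


print(extract_current(["id", "name", "created_at"]))
# → [('id', 1), ('name', 3), ('created_at', 5)]
```

Because both sides increment the counter, the i-th column ends up with sort_order 2i - 1, which matches the observed values.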

Possible Solution

When calling the _iterate_over_cols method, the total_cols argument should be the actual number of columns processed so far. For example, at this line, pass total_cols as is instead of total_cols + 1.

Then, inside _iterate_over_cols, the ColumnMetadata instance should be created with sort_order=total_cols + 1, and that same value should be what the method returns.
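A minimal sketch of the proposed fix, written against a simplified model rather than Amundsen's actual extractor. It assumes schema fields are plain dicts with name, type and (for RECORD columns) fields keys, loosely following BigQuery's table schema shape. Column is a hypothetical stand-in for ColumnMetadata, and iterate_over_cols and extract_columns are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Column:
    """Hypothetical stand-in for ColumnMetadata with only the fields used here."""
    name: str
    col_type: str
    sort_order: int


def iterate_over_cols(parent: str, column: Dict[str, Any],
                      cols: List[Column], total_cols: int) -> int:
    """Append `column` (and any nested RECORD fields) to `cols`.

    `total_cols` is the number of columns processed so far. The new column
    gets sort_order = total_cols + 1, and the updated count is returned.
    """
    col_name = f"{parent}.{column['name']}" if parent else column["name"]
    total_cols += 1
    cols.append(Column(name=col_name, col_type=column["type"], sort_order=total_cols))
    if column["type"] == "RECORD":
        # Nested fields continue the same running count.
        for field in column.get("fields", []):
            total_cols = iterate_over_cols(col_name, field, cols, total_cols)
    return total_cols


def extract_columns(fields: List[Dict[str, Any]]) -> List[Column]:
    cols: List[Column] = []
    total_cols = 0
    for column in fields:
        # Pass the running count as is, without adding 1.
        total_cols = iterate_over_cols("", column, cols, total_cols)
    return cols


schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "address", "type": "RECORD", "fields": [
        {"name": "city", "type": "STRING"},
        {"name": "zip", "type": "STRING"},
    ]},
    {"name": "created_at", "type": "TIMESTAMP"},
]
for col in extract_columns(schema):
    print(col.name, col.sort_order)
```

This prints id 1, address 2, address.city 3, address.zip 4 and created_at 5. For a flat schema, the i-th column gets sort_order=i. In this sketch, the nested fields of a RECORD column also consume sort_order values, so top-level values line up with ordinal_position only when the table has no RECORD columns.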

Your Environment

boring-cyborg[bot] commented 6 months ago

Thanks for opening your first issue here!