dadosfera / Bugsfera


Catalog process failing because NaNs are not filtered out of the catalogation process #13

Closed rafaelsantanaep closed 1 year ago

rafaelsantanaep commented 1 year ago

Mandatory information:

Are any customers directly impacted by this bug? Which ones?

Bug Category

Describe the bug

When cataloging columns of type FLOAT or DOUBLE PRECISION, if any of those columns contains NaN values, some columns of the dataset cannot be cataloged.

To Reproduce

Expected behavior

NaN values are filtered out of the statistics query.
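A minimal sketch of what filtering NaN in the query could look like, assuming the statistics are computed in Snowflake SQL (the log context is SnowflakeColumnMetadata) and relying on Snowflake's documented behavior of treating 'NaN' as equal to itself and greater than every other FLOAT value. That behavior would also explain why the failing payloads report a max of 'NaN' and a min of '0'. The helpers quote_ident and build_float_stats_query are hypothetical and are not the actual _compute_statistics_for_column implementation.

```python
# Hypothetical sketch: builds a Snowflake statistics query for a FLOAT column
# that excludes NaN values. Names are illustrative, not the real
# dadosfera-catalog-engine API.


def quote_ident(name: str) -> str:
    """Quote a Snowflake identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_float_stats_query(schema: str, table: str, column: str) -> str:
    """Return a query computing min/max/mean/stddev/median without NaN rows.

    Assumes Snowflake semantics where 'NaN' = 'NaN' is TRUE, so a plain
    inequality filter drops NaN rows. NULL rows are dropped as well, which
    does not change these aggregates because they already ignore NULL.
    """
    col = quote_ident(column)
    source = f"{quote_ident(schema)}.{quote_ident(table)}"
    return (
        f"SELECT MIN({col}) AS min_value, MAX({col}) AS max_value, "
        f"AVG({col}) AS mean_value, STDDEV({col}) AS stddev_value, "
        f"MEDIAN({col}) AS median_value "
        f"FROM {source} "
        f"WHERE {col} <> 'NaN'::FLOAT"
    )


if __name__ == "__main__":
    # Table and column names taken from the error log in this issue.
    print(build_float_stats_query(
        "PUBLIC", "TB__F33I7O__OICO__RECEIVABLE", "CLIENTE__LIMITE_CREDITO"
    ))
```

Because the WHERE clause also removes NULL rows, missing_count and missing_rate would still need a separate query or a conditional count over the unfiltered table.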

Does this bug impact any demo or sale?

No

Dadosfera Customer:


 

Other information:

Any logs, error output, etc.?

Uncaught exception: [CatalogationError(context='SnowflakeColumnMetadata', error='400 Client Error: Bad Request for url: https://nimbus-cashu.dadosfera.ai/api/catalog/column-metadata/', body="{'min': '0', 'max': 'NaN', 'mean': nan, 'unique_count': 154, 'unique_rate': 0.014665269974288162, 'missing_count': 0, 'missing_rate': 0.0, 'stddev': nan, 'median': 12384.97, 'table_schema': 'PUBLIC', 'table_name': 'TB__F33I7O__OICO__RECEIVABLE', 'column_name': 'CLIENTE__LIMITE_CREDITO', 'data_type': 'FLOAT', 'analyzer_rules': ['isUnique', 'isComplete', 'hasMin', 'hasMax']}"), CatalogationError(context='SnowflakeColumnMetadata', error='400 Client Error: Bad Request for url: https://nimbus-cashu.dadosfera.ai/api/catalog/column-metadata/', body="{'min': '0', 'max': 'NaN', 'mean': nan, 'unique_count': 933, 'unique_rate': 0.08884868107799257, 'missing_count': 0, 'missing_rate': 0.0, 'stddev': nan, 'median': 0.0, 'table_schema': 'PUBLIC', 'table_name': 'TB__F33I7O__OICO__RECEIVABLE', 'column_name': 'TITULO__VALOR_JUROS_MULTA', 'data_type': 'FLOAT', 'analyzer_rules': ['isUnique', 'isComplete', 'hasMin', 'hasMax']}")]
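One plausible contributor to the 400 responses is visible in the request bodies: 'max' is the string 'NaN', while 'mean' and 'stddev' are float NaN, and standard JSON (RFC 8259) has no literal for NaN or infinity. Python's json.dumps emits a non-standard NaN token by default, and allow_nan=False makes it raise ValueError instead, mimicking a strict parser. The sketch below uses hypothetical helper names (is_non_finite, sanitize_stats, NUMERIC_STAT_KEYS) and assumes the endpoint would accept null for a statistic that cannot be computed; it is a defensive complement to filtering NaN in the query, not the fix adopted in this issue.

```python
import json
import math
from typing import Any, Dict

# Hypothetical: payload keys treated as numeric statistics.
NUMERIC_STAT_KEYS = ("min", "max", "mean", "stddev", "median")


def is_non_finite(value: Any) -> bool:
    """True for float NaN/inf, or for string spellings such as 'NaN'."""
    if isinstance(value, float):
        return not math.isfinite(value)
    if isinstance(value, str):
        try:
            return not math.isfinite(float(value))
        except ValueError:
            return False  # ordinary text, not a number
    return False


def sanitize_stats(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with non-finite numeric statistics replaced by None."""
    cleaned = dict(payload)
    for key in NUMERIC_STAT_KEYS:
        if key in cleaned and is_non_finite(cleaned[key]):
            cleaned[key] = None  # serialized as JSON null
    return cleaned


# Subset of the first failing body from the log above.
payload = {
    "min": "0",
    "max": "NaN",
    "mean": float("nan"),
    "stddev": float("nan"),
    "median": 12384.97,
    "column_name": "CLIENTE__LIMITE_CREDITO",
}

try:
    json.dumps(payload, allow_nan=False)  # strict mode rejects float NaN
except ValueError as exc:
    print("strict JSON rejected payload:", exc)

print(json.dumps(sanitize_stats(payload), allow_nan=False))
```

Limiting the check to known statistic keys avoids nulling legitimate text values (for example, a column literally named "NaN").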

What software environment are you using?

When the bug happened: … It happens every day, whenever the pipeline runs.

Steps to fix this issue

a. Fix the method _compute_statistics_for_column in the repository dadosfera-catalog-engine

b. Update the library dadosfera-catalog-engine in the following repositories:

samirleao commented 1 year ago

PR with the fix: https://github.com/dadosfera/dadosfera-catalog-engine-lib/pull/6

samirleao commented 1 year ago

Version 1.1.0a7 of dadosfera-catalog-engine has been released and is already updated in all connectors in the beta environment.

rafaelsantanaep commented 1 year ago

@samirleao While trying to fix this bug, we introduced another one, hehehe.

rafaelsantanaep commented 1 year ago

Bug fixed by the PRs below:
https://github.com/dadosfera/s3-connector/pull/60
https://github.com/dadosfera/singer-connector/pull/263
https://github.com/dadosfera/dadosfera-catalog-engine-lib/pull/7
https://github.com/dadosfera/jdbc-connector/pull/131

samirleao commented 1 year ago

I confirmed with Satana here: the fix has already been implemented and is in production (prd).