SQL Server pipelines cannot handle the `geography` data type.

Mandatory information:

Are any customers directly impacted by this bug? Which ones?

basealpha. However, I was able to work around the issue by ignoring the source column.

Bug Category

[ ] Connection Manager
[ ] Connection Test
[ ] Creating Pipeline
[x] Executing Pipeline
[ ] Cataloging Data Asset
[ ] Other

Describe the bug

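While collecting data from SQL Server, the pipeline failed because the source table contains a `geography` column, which Spark's JDBC data source cannot map to a Spark SQL type. The Microsoft JDBC driver reports `geography` as SQL type -158, the type named in the error below.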

An error occurred while calling o42.load. : java.sql.SQLException: Unrecognized SQL type -158
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:251)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:321)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:321)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
    at scala.Option.getOrElse(Option.scala:189)
    at ...

How to replicate this issue:

Create a pipeline using the following source:
- database: AdventureWorks
- schema: Person
- table_name: Address
Run the pipeline.

Possible plan of action:

- Ignore columns of type `geography` while creating the pipeline. Since the platform-api does not receive the column data types, I think this should be handled by Pi Factory or by the Connection Test.
- Document this issue in the SQL Server connector documentation.
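The first item might be sketched roughly as follows. This assumes Pi Factory or the Connection Test can read each column's type, for example from SQL Server's `INFORMATION_SCHEMA.COLUMNS` view (`COLUMN_NAME`, `DATA_TYPE`), and then build an explicit column list that leaves the spatial columns out. The helper names (`quote_ident`, `split_supported_columns`, `build_select_query`) are hypothetical and not part of Pi Factory or the platform-api. Treating `geometry` as unsupported is also an assumption: it is a sibling spatial type that the driver likewise reports with a vendor-specific type code.

```python
# Hypothetical helpers: not part of Pi Factory or the platform-api.
from typing import Iterable, List, Tuple

# Assumption: geometry fails the same way as geography, since the Microsoft
# JDBC driver also reports it with a vendor-specific SQL type code.
UNSUPPORTED_TYPES = {"geography", "geometry"}


def quote_ident(name: str) -> str:
    """Quote a SQL Server identifier with brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def split_supported_columns(
    columns: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Split (column_name, data_type) pairs into supported and skipped names."""
    supported: List[str] = []
    skipped: List[str] = []
    for name, data_type in columns:
        if data_type.lower() in UNSUPPORTED_TYPES:
            skipped.append(name)
        else:
            supported.append(name)
    return supported, skipped


def build_select_query(
    schema: str, table: str, columns: Iterable[Tuple[str, str]]
) -> str:
    """Build a SELECT with an explicit column list that omits spatial columns."""
    supported, _ = split_supported_columns(columns)
    if not supported:
        raise ValueError(f"{schema}.{table} has no columns Spark can read")
    column_list = ", ".join(quote_ident(name) for name in supported)
    return f"SELECT {column_list} FROM {quote_ident(schema)}.{quote_ident(table)}"


# Illustrative subset of AdventureWorks Person.Address (SpatialLocation is geography).
address_columns = [
    ("AddressID", "int"),
    ("City", "nvarchar"),
    ("SpatialLocation", "geography"),
]
print(build_select_query("Person", "Address", address_columns))
# SELECT [AddressID], [City] FROM [Person].[Address]
```

The resulting string could then be passed to Spark's JDBC reader through its `query` option instead of `dbtable`. The geography column would then never reach `JdbcUtils.getSchema`, the frame where the stack trace above fails. The skipped names could also be reported back by the Connection Test so the user knows which columns were left out.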

Does this bug impact any demos or sales?

Dadosfera Customer:

basealpha

Workaround

The customer can ignore the `geography` column while creating the pipeline.
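If the location data is still needed, there may be an alternative to dropping the column, although it was not tested as part of this report. SQL Server can convert the column to Well-Known Text with the spatial `STAsText()` method, which returns `nvarchar(max)`, so Spark would receive an ordinary string column. The sketch below assumes the input is (column_name, data_type) pairs, such as rows from `INFORMATION_SCHEMA.COLUMNS`. The helper names `quote_ident` and `build_wkt_select_query` are hypothetical.

```python
# Hypothetical helper: not part of Pi Factory or the platform-api.
from typing import Iterable, List, Tuple

# Spatial types converted to WKT text (assumption: geometry needs the same treatment).
SPATIAL_TYPES = {"geography", "geometry"}


def quote_ident(name: str) -> str:
    """Quote a SQL Server identifier with brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_wkt_select_query(
    schema: str, table: str, columns: Iterable[Tuple[str, str]]
) -> str:
    """Build a SELECT that keeps spatial columns but converts them to WKT strings."""
    parts: List[str] = []
    for name, data_type in columns:
        col = quote_ident(name)
        if data_type.lower() in SPATIAL_TYPES:
            # STAsText() returns nvarchar(max), which Spark reads as a string.
            parts.append(f"{col}.STAsText() AS {col}")
        else:
            parts.append(col)
    if not parts:
        raise ValueError(f"{schema}.{table} has no columns")
    return f"SELECT {', '.join(parts)} FROM {quote_ident(schema)}.{quote_ident(table)}"


print(build_wkt_select_query("Person", "Address", [("AddressID", "int"), ("SpatialLocation", "geography")]))
# SELECT [AddressID], [SpatialLocation].STAsText() AS [SpatialLocation] FROM [Person].[Address]
```

This keeps the column name unchanged, so downstream consumers see `SpatialLocation` as text rather than a missing column. They would need to parse the WKT themselves if they want the coordinates.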

Which software environment are you using?

[x] PRD
[ ] Other

When the bug happened: … 2022-12-16

dadosfera / Bugsfera #41