databricks / spark-redshift

Redshift data source for Apache Spark
Apache License 2.0

Using CAST in query reports int to date problem #406

Open victorngi opened 6 years ago

victorngi commented 6 years ago

I was trying to run the following query:

select * from test.table where cast(dt_ref AS DATE) <= cast('2016-10-01' AS DATE) AND cast(dt_ref AS DATE) >= cast('2016-06-01' AS DATE)

It ran without problems in DataGrip.

If I run it with spark-redshift, reading works without a problem. However, when I try to pivot the resulting table, I get:

Py4JJavaError: An error occurred while calling o6790.pivot.
: java.sql.SQLException: Amazon Invalid operation: cannot cast type integer to date;
  at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
  at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
  at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
  at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
  at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
  at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(Unknown Source)
  at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution(Unknown Source)
  at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute(Unknown Source)
  at com.amazon.redshift.client.PGClient.executePreparedStatement(Unknown Source)
  at com.amazon.redshift.dataengine.PGQueryExecutor.executePreparedStatement(Unknown Source)
  at com.amazon.redshift.dataengine.PGQueryExecutor.execute(Unknown Source)
  at com.amazon.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
  at com.amazon.jdbc.common.SPreparedStatement.execute(Unknown Source)
  at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:108)
  at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:108)
  at com.databricks.spark.redshift.JDBCWrapper$$anonfun$2.apply(RedshiftJDBCWrapper.scala:126)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.amazon.support.exceptions.ErrorException: Amazon Invalid operation: cannot cast type integer to date;
  ... 20 more

Has anyone run into this before? It's not a problem with the dt_ref column in Redshift; all of its values are date strings.

ChengzhiZhao commented 6 years ago

Not sure if this is completely related, but I got the same error. After looking into it more, I noticed that if you use plain single quotes, Redshift treats '-' as a minus sign, so 2016-10-01 is evaluated as 2005, which is an integer. I had to use three quotes instead, '''2016-10-01''', and that seems to have solved the issue. Hope this helps.
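The workaround above can be sketched in Python. triple_quote_literals is a hypothetical helper (not part of spark-redshift) that wraps each single-quoted SQL literal in triple quotes before the query string is handed to the reader; the regex is deliberately naive and assumes literals contain no embedded or escaped quotes:

```python
import re

# Hypothetical helper (not part of spark-redshift): wrap each single-quoted
# SQL literal in triple quotes, per the workaround above. The regex is naive
# and assumes literals contain no embedded or escaped quotes.
def triple_quote_literals(sql: str) -> str:
    return re.sub(r"'([^']*)'", r"'''\1'''", sql)

sql = ("select * from test.table "
       "where cast(dt_ref AS DATE) <= cast('2016-10-01' AS DATE) "
       "AND cast(dt_ref AS DATE) >= cast('2016-06-01' AS DATE)")
escaped = triple_quote_literals(sql)
```

The escaped string would then go to the data source as usual, e.g. via .option("query", escaped) on a spark.read.format("com.databricks.spark.redshift") reader.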

kayelow commented 5 years ago

I had similar issues with any Redshift functions that take string literals (wrapped in single quotes) as arguments, such as CAST, DATE_ADD, DATE_DIFF, etc., when creating Databricks tables.
NB: creating the Spark DataFrame seems to work fine; it's when I try to create a temporary or permanent Databricks table that it hiccups. I believe this is because, under the hood, the query is converted to an UNLOAD statement. I've had to do some weird escaping to get around this, as triple single quotes (''') didn't always work.

As an example, I have some code similar to date_diff('day', created_at, updated_at), which did not work when triple-escaped as date_diff('''day''', created_at, updated_at), nor when slash-escaped as date_diff(\'day\', created_at, updated_at), but DID work as date_diff('''day\', created_at, updated_at).
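For clarity, here are the four variants from the comment above written as raw Python strings, so the exact characters that reach the driver are unambiguous (which one survives presumably depends on how spark-redshift rewraps the query, e.g. into an UNLOAD statement):

```python
# The four escaping attempts from the comment above, as raw Python strings
# so every quote and backslash is literal. Only the last one worked for the
# commenter; the others were rejected somewhere in the quoting layers.
plain   = r"date_diff('day', created_at, updated_at)"
tripled = r"date_diff('''day''', created_at, updated_at)"
slashed = r"date_diff(\'day\', created_at, updated_at)"
working = r"date_diff('''day\', created_at, updated_at)"
```

Note that the working variant opens the literal with three quotes but closes it with a backslash-quote, which is why it is easy to mistype.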

sabarisan commented 5 years ago

Thanks @kayelow that solution works for me too.

jjin-ut commented 4 years ago

I have a similar problem. My query is select * from test.table where name = 'Jane Doe'. spark-redshift can't translate the single quotes appropriately, and '''Jane Doe''', \'Jane Doe\', and '''Jane Doe\' all fail for me. I feel like I'm missing something obvious. Anyone have a clue?

robertisandor commented 4 years ago

@JianUserTesting I had a similar problem. For me, I had to use the triple single quote only at the beginning of the first string literal. My query ended up being something like select col1, col2, "col3" as column_name from test.table where email in ('''first@email.com', 'second@email.com', 'third@email.com').
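A short sketch of building that IN-list programmatically, following the pattern in the comment above (triple-quote only the opening of the first literal; the email values are just the comment's placeholder examples):

```python
emails = ["first@email.com", "second@email.com", "third@email.com"]

# Per the workaround above: triple-quote only the opening of the first
# literal; the remaining literals keep ordinary single quotes.
rest = ", ".join(f"'{e}'" for e in emails[1:])
sql = ('select col1, col2, "col3" as column_name from test.table '
       f"where email in ('''{emails[0]}', {rest})")
```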