Open parekuti opened 6 years ago
Hm interesting. When you say “thrift query” you mean Spark SQL right? Can we dump the filters that Spark is passing down and make sure Spark is actually giving us the right filters?
On Jan 8, 2018, at 1:37 PM, parekuti notifications@github.com wrote:
dse-spark1.4.8 branch
For example, this query returns data when I run it against C*: select * from loadtest.fdb_partition_test_chunks where "partition"=0x016101620163016401650166 and version=0;
But when I translate it into a Thrift query, no data is returned: select * from fdb_loadtest_partitiontest where field1='a' and field2='b' and field3='c' and field4='d' and field5='e' and field6='f'
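To make the relationship between the two queries concrete: the hex literal above looks like each field value preceded by a one-byte length (01 61 = 'a', 01 62 = 'b', and so on). The sketch below decodes and encodes keys under that assumption. It is written in Python for brevity, is not FiloDB's actual Scala code, and the function names are hypothetical.

```python
from typing import List


def decode_partition_key(hex_key: str) -> List[str]:
    """Split a hex key into fields.

    Assumption (inferred from the example key, not from FiloDB source):
    each field is a 1-byte length followed by that many UTF-8 bytes.
    """
    digits = hex_key[2:] if hex_key.startswith("0x") else hex_key
    raw = bytes.fromhex(digits)
    fields, pos = [], 0
    while pos < len(raw):
        length = raw[pos]
        fields.append(raw[pos + 1:pos + 1 + length].decode("utf-8"))
        pos += 1 + length
    return fields


def encode_partition_key(fields: List[str]) -> str:
    """Inverse of decode_partition_key under the same assumption."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out.append(len(data))  # raises ValueError if a field exceeds 255 bytes
        out += data
    return "0x" + out.hex()


if __name__ == "__main__":
    print(decode_partition_key("0x016101620163016401650166"))
    print(encode_partition_key(["a", "b", "c", "d", "e", "f"]))
```

If the layout really is positional like this, the same values supplied in a different order would produce a different key, and the lookup would find nothing.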
Yes, it's Spark SQL. I can see in the logs the filters that are passed down.
I also noticed another problem: querying a table whose partition key has 5 columns, one of which is a timestamp, throws a ClassCastException. This does not happen with a table whose partition key has 4 columns.
select * from reading_5keys where corp_cd='X' and cli_no='X' and spin_asset_id=X and reading_date='2017-05-23 00:00:00' and reading_day_slot=6
ERROR o.a.s.s.h.t.SparkExecuteStatementOperation - Error executing query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 62, 127.0.0.1): java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Timestamp
at filodb.core.SingleKeyTypes$TimestampKeyType$.toBytes(KeyType.scala:208)
I think the ClassCastException is a valid bug. That is, I think we are not translating the date string properly to Timestamp. This should not be that hard to track down and debug.
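As a rough illustration of the missing translation step, here is a minimal sketch that accepts either an already-typed timestamp or a date string in the format used in the failing query. It is in Python for brevity and `coerce_timestamp` is a hypothetical name. On the JVM side, `java.sql.Timestamp.valueOf` parses the same `yyyy-mm-dd hh:mm:ss` format; where exactly such a conversion belongs in FiloDB is not established in this thread.

```python
from datetime import datetime


def coerce_timestamp(value):
    """Return a datetime for a datetime or a 'YYYY-MM-DD HH:MM:SS' string.

    Hypothetical helper: it mirrors the idea of converting a filter value
    to the key column's type before the key bytes are built.
    """
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    raise TypeError("cannot convert %r to a timestamp" % (value,))


if __name__ == "__main__":
    print(coerce_timestamp("2017-05-23 00:00:00"))
```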
The issue is the order of the filters we pass along to scan a partition. For example, the partition key is defined in this order: corp_cd, cli_no, vendor_code, spin_asset_id, reading_day_slot.
FA,6210,NWF,3453352,07 -> incoming filter order
6210,FA,NWF,3453352,07 -> output of the parseFilters function
This reordered sequence is preserved and sent to the scan method, which then generates the wrong hex key to fetch data. We need to fix this in order to read that partition properly.
See the logs below for more details.
[2018-01-10 10:11:10,417] INFO filodb.spark.FiloRelation$ - Incoming filters = List(EqualTo(corp_cd,FA), EqualTo(cli_no,6210), EqualTo(vendor_code,NWF), EqualTo(reading_day_slot,6), EqualTo(spin_asset_id,5555006))
[2018-01-10 10:11:10,420] INFO filodb.spark.FiloRelation$ - Incoming filters collect = List((corp_cd,EqualTo(corp_cd,FA)), (cli_no,EqualTo(cli_no,6210)), (vendor_code,EqualTo(vendor_code,NWF)), (reading_day_slot,EqualTo(reading_day_slot,6)), (spin_asset_id,EqualTo(spin_asset_id,5555006)))
[2018-01-10 10:11:10,429] INFO filodb.spark.FiloRelation - Incoming filters order after parsing: Map(cli_no -> List(EqualTo(cli_no,6210)), corp_cd -> List(EqualTo(corp_cd,FA)), vendor_code -> List(EqualTo(vendor_code,NWF)), reading_day_slot -> List(EqualTo(reading_day_slot,6)), spin_asset_id -> List(EqualTo(spin_asset_id,5555006)))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column cli_no, filters List(EqualTo(cli_no,6210))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column corp_cd, filters List(EqualTo(corp_cd,FA))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column vendor_code, filters List(EqualTo(vendor_code,NWF))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column reading_day_slot, filters List(EqualTo(reading_day_slot,6))
[2018-01-10 10:11:10,440] INFO filodb.spark.FiloRelation$ - Pushing down partition column spin_asset_id, filters List(EqualTo(spin_asset_id,5555006))
[2018-01-10 10:11:10,450] INFO filodb.spark.FiloRelation$ - Push down partition predicates: List(Set(6210), Set(FA), Set(NWF), Set(6), Set(5555006))
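One way to picture a fix is to stop relying on the iteration order of the parsed map and instead walk the partition key columns in their declared order. The sketch below does that with the values from the logs. It is written in Python for brevity, `order_predicates` is a hypothetical name, and it is not the actual parseFilters code. A possible explanation for the reordering, not confirmed in this thread, is that Scala's immutable Map does not guarantee iteration order once it grows past a few entries.

```python
def order_predicates(partition_columns, filters):
    """Return one set of values per partition column, in declared column order.

    partition_columns: column names in the order the partition key declares them.
    filters: (column, value) equality filters in whatever order they arrive.
    """
    by_column = {}
    for column, value in filters:
        by_column.setdefault(column, set()).add(value)
    missing = [c for c in partition_columns if c not in by_column]
    if missing:
        raise KeyError("no filter for partition columns: %s" % missing)
    # Index by declared column order, never by the map's own iteration order.
    return [by_column[c] for c in partition_columns]


if __name__ == "__main__":
    columns = ["corp_cd", "cli_no", "vendor_code", "spin_asset_id", "reading_day_slot"]
    parsed = [
        ("cli_no", "6210"),
        ("corp_cd", "FA"),
        ("vendor_code", "NWF"),
        ("reading_day_slot", "6"),
        ("spin_asset_id", "5555006"),
    ]
    print(order_predicates(columns, parsed))
```

With the predicates in declared order, each value lines up with its own column type, so a string should no longer land in a slot that expects a different type.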