Closed: aucahuasi closed this issue 4 years ago.
Original query:
If I have a query like
`SELECT part a, part b FROM scans_c AS a JOIN store_visit_c AS b ON a.store_nbr = b.store_nbr AND a.visit_nbr = b.visit_nbr`
it crashes, but a query like
`SELECT part a FROM scans_c AS a JOIN store_visit_c AS b ON a.store_nbr = b.store_nbr AND a.visit_nbr = b.visit_nbr`
or
`SELECT part b FROM scans_c AS a JOIN store_visit_c AS b ON a.store_nbr = b.store_nbr AND a.visit_nbr = b.visit_nbr`
works.
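To make the comparison concrete, here is a minimal sketch using Python's built-in `sqlite3` on toy data. It is not BlazingSQL and says nothing about GPU memory; the non-key columns `upc_nbr` and `tot_amt` are hypothetical stand-ins for "part a" and "part b". It shows that all three variants match exactly the same rows on the composite key `(store_nbr, visit_nbr)` and differ only in how many columns each matched row carries.

```python
import sqlite3

# Toy stand-ins for the real tables. Only store_nbr/visit_nbr come from the report;
# upc_nbr and tot_amt are hypothetical columns used for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scans_c (store_nbr INTEGER, visit_nbr INTEGER, upc_nbr INTEGER);
CREATE TABLE store_visit_c (store_nbr INTEGER, visit_nbr INTEGER, tot_amt REAL);
INSERT INTO scans_c VALUES (1, 10, 111), (1, 10, 112), (1, 11, 113), (2, 10, 114);
INSERT INTO store_visit_c VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 20, 3.0);
""")

# The same inner join on both key columns, shared by every variant.
JOIN = ("FROM scans_c AS a JOIN store_visit_c AS b "
        "ON a.store_nbr = b.store_nbr AND a.visit_nbr = b.visit_nbr")

variants = {
    "both sides": f"SELECT a.upc_nbr, b.tot_amt {JOIN}",
    "a only": f"SELECT a.upc_nbr {JOIN}",
    "b only": f"SELECT b.tot_amt {JOIN}",
}

results = {}
for name, sql in variants.items():
    cur = conn.execute(sql)
    rows = cur.fetchall()
    ncols = len(cur.description)  # number of projected columns
    results[name] = (len(rows), ncols)
    print(f"{name}: {len(rows)} rows x {ncols} columns")
```

With this toy data every variant returns 3 rows (the `(2, 10)` scan has no matching visit); only the column count changes, 2 for "both sides" versus 1 for each single-side query.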
More clues: if I reduce the amount of data, it runs fine. I think it runs out of memory while computing the CTE (`visit_scans_c`), even though the resulting DataFrame doesn't use much GPU memory. My guess is that the Blazing engine is trying to evaluate all the columns in parallel and running out of memory.
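One rough way to reason about this hypothesis: for the same set of matched rows, materializing the join output costs about rows × (sum of the selected column widths), so projecting columns from both sides adds the two single-side costs together, and any intermediate buffers come on top of that. The back-of-the-envelope sketch below assumes fixed-width columns; the row count and widths are hypothetical, not measured from this report, and real GPU usage will be higher.

```python
def join_output_bytes(matched_rows: int, column_widths: list[int]) -> int:
    """Bytes to materialize a join result: rows x sum of per-column widths.

    Ignores null masks, variable-length (string) payloads, and intermediate
    buffers such as hash tables or gather maps, so it is a lower bound.
    """
    return matched_rows * sum(column_widths)


# Hypothetical numbers for illustration only.
rows = 500_000_000
a_cols = [8, 8, 4]  # e.g. three fixed-width columns from scans_c (bytes each)
b_cols = [8, 4]     # e.g. two fixed-width columns from store_visit_c (bytes each)

gib = 1024 ** 3
a_only = join_output_bytes(rows, a_cols) / gib
b_only = join_output_bytes(rows, b_cols) / gib
both = join_output_bytes(rows, a_cols + b_cols) / gib
print(f"a only: {a_only:.1f} GiB, b only: {b_only:.1f} GiB, both: {both:.1f} GiB")
```

With these made-up inputs it prints `a only: 9.3 GiB, b only: 5.6 GiB, both: 14.9 GiB`: each single-side query might fit on a given GPU while the combined projection does not.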
Hi @mlahir1, I moved this issue here from the deprecated RAL repo. From now on, we will put all the information related to the issue you reported to @williamBlazing here ;)
A couple of questions:
cc @rommelDB @Christian8491 @williamBlazing @felipeblazing
@aucahuasi, please find the answers inline.
@jeanp413 +1
Hi @mlahir1, could you please check out the `checking-inner-join` branch and compile it? Then run your initial script with your "big" query and share the output. A message like `---------BEFORE -- INNER JOIN` should be printed, possibly along with other messages. Thanks!
Describe the bug
Sometimes, when running queries with JOIN, we see crashes like:
Steps/Code to reproduce bug
Expected behavior
Execution without a crash.
Environment overview (please complete the following information)
Environment details
Please run the `print_env.sh` script and paste its output here to gather any other relevant environment details.
Additional context