GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
395 stars 112 forks source link

Random() function giving an error with LUW DB2 #1239

Open manojredward opened 2 weeks ago

manojredward commented 2 weeks ago

Hi Team, We are comparing the data between LUW DB2 and BQ and our validation are in YAML files. While executing the ROW validation, we gave use_random_rows = True and random_row_batch_size = '1000' and it fails to execute the random rows selection.

Points for your reference

  1. We are using Cloud Run Job to execute the DVT
  2. Our input validation files are in YAML format and stored in GCS bucket
  3. Our connection json is also in GCS and we are overwriting PSO_DV_CONN_HOME environment variable and use it to execute cloud run job

Error: 08/22/2024 05:13:45 PM-ERROR: Error (ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: Statement Execute Failed: [IBM][CLI Driver][DB2] SQL0440N No authorized routine named "RANDOM" of type "" having compatible arguments was found. SQLSTATE=42884 SQLCODE=-440

Input YAML file: result_handler: project_id: table_id: type: BigQuery source: db2 target: bq validations:

helensilva14 commented 2 weeks ago

Hi @manojredward! What is the version of your DB2 instance? The error is indicating it could not found a RANDOM function to use without parameters being given (https://www.ibm.com/docs/en/db2/9.7?topic=messages-sql0250-sql0499#sql0440n).

I checked in our DVT test instance (LINUXX8664 11.5.8.0) and it works this way by using a default seed number, according to the documentation (https://www.ibm.com/docs/en/db2/11.5?topic=module-random-function).

Screenshot 2024-08-23 at 17 15 32

But as we searched from version 11.1 and before, the function requires a seed number as parameter:

If that's your case we indeed need to make a code change, thanks for reaching out!

manojredward commented 2 weeks ago

Thanks @helensilva14 for your response. I will check the DB2 version and get back to you. Meanwhile, Could you please guide me how to pass the seed value in YAML file. Right now, we are passing something like mentioned below,

random_row_batch_size: '1000' type: Row use_random_rows: true

helensilva14 commented 2 weeks ago

As of now, it is not possible to pass a seed number either using DVT CLI or YAML config file.

We'll work on an initial version of a code change to pass a seed direct in our code and ask your help to test it soon!

manojredward commented 2 weeks ago

Hello @helensilva14 We are using the below DB2 version, could you please check this? AIX v11.5.9.0 Mainframe V12R1M505

helensilva14 commented 2 weeks ago

Hi @manojredward! Can you please try to execute the following SQL queries in your instance and let us know if they run without errors? Below you can see the output in our DB2 instance (LINUXX8664 11.5.8.0):

image

manojredward commented 2 weeks ago

Thanks @helensilva14, I will check and let you know asap. We are using AIX v11.5.9.0 DB2, hope this is fine. If we need to install any packages, please let us know.

manojredward commented 2 weeks ago

Hi @helensilva14, We tried to run the below queries in our AIX DB2 and got an error as SQL Error [42884]: NO AUTHORIZED RANDOM NAMED HAVING COMPATIBLE ARGUMENTS WAS FOUND. SQLCODE=-440, SQLSTATE=42884, DRIVER=4.33.31

Attached image for your reference

random_sysdummy random_2_sysdummy
helensilva14 commented 2 weeks ago

Hi @manojredward, thanks for sharing the error message and the images!

We did more research and might be the case that your instance/flavor only recognizes the function as RAND(), such as in this example: https://db2-sql-cookbook.org/#_getting_n_random_rows

Could you please try to execute the following query versions and let us know if they work?

manojredward commented 1 week ago

Hi @helensilva14 We have tried with rand() command and it is working as expected in our DB2. Please find the screenshot below.

image (10) image (11)
helensilva14 commented 1 week ago

Great to hear it! We updated our code to always use RAND() since this method is always available in all DB2 versions, which is not the case for RANDOM().

manojredward commented 1 week ago

Great, Thanks, Please let us know once the code is merged, we will recreate image and do our testing.

helensilva14 commented 1 week ago

We merged the fix a few minutes ago, so it is available at the develop branch now. Can you try packaging the latest develop branch and testing? Please follow these steps:

  1. Clone our DVT GitHub repo: git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
  2. Enter in the folder and activate a Python virtual environment
  3. Package the develop the branch with pip install . to package the current directory with the fix.
  4. Test out the validation commands.
  5. Install missing libraries, if needed
manojredward commented 1 week ago

@helensilva14

By following the thread, we were actually cloned the code base from branch: 1239-random-function-error-luw-db2 and created an image and executed our validation and we are getting the below error.

Error (ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: Binding Error: [IBM][CLI Driver][DB2] SQL0440N No authorized routine named "SETSEED" of type "" having compatible arguments was found. SQLSTATE=42884 SQLCODE=-440

We have added the below changes to our yaml file, random_row_batch_size: '2' use_random_rows: true

Could you please take a look on seed function and let us know the updates.

helensilva14 commented 1 week ago

Hi @manojredward, thank you very much for your testing! I didn't face this error in our instance.

Anyways, I added a new line to our code to always set a default seed value before using the RAND() function, you can check the new pull request here -> #1249

Could you please repeat the steps and test again the branch 1239-random-function-error-luw-db2?

manojredward commented 1 week ago

Thanks @helensilva14 for making those changes, I will recreate the image with this branch and test.

manojredward commented 1 week ago

Hi @helensilva14, We have tested the code from issue1239-random-function-error-luw-db2 branch with two yaml's and both are give me the same error.

YAML #1: random_row_batch_size: '2' type: Row use_random_rows: true

YAML #2: random_row_batch_size: null type: Row use_random_rows: true

Error: 09/03/2024 12:48:33 PM-ERROR: Error (ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: Statement Execute Failed: [IBM][CLI Driver][DB2] SQL0583N The use of routine or expression "RAND()" is invalid because it is not deterministic or has an external action. SQLSTATE=42845 SQLCODE=-583

helensilva14 commented 1 week ago

Hello @manojredward, thanks for reporting this new error!

I tried a new approach of setting the seed value for the RAND() function itself and not externally.

Could you please repeat the steps and test again the branch issue1239-random-function-error-luw-db2?

manojredward commented 1 week ago

Hi @helensilva14, Please find the below error when we test with the updated code from issue1239-random-function-error-luw-db2.

Error: 09/05/2024 06:49:34 AM-ERROR: Error (ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: Binding Error: [IBM][CLI Driver][DB2] SQL0418N The statement was not processed because the statement contains an invalid use of one of the following: an untyped parameter marker, the DEFAULT keyword, or a null value. SQLSTATE=42610 SQLCODE=-418

YAML: random_row_batch_size: '2' type: Row use_random_rows: true

manojredward commented 2 days ago

Hello @helensilva14 and @luispavaogoogle Just want to check, do you have any updates for me?