IBM / node-odbc

ODBC bindings for node
MIT License

[QUESTION] Return JSON only using DB2 for i `JSON_OBJECT` #321

Closed. cole-maxwell1 closed this issue 1 year ago.

cole-maxwell1 commented 1 year ago


Description: Hi, I would like to extract fully formed JSON directly out of DB2 for i using the JSON_OBJECT scalar function. Is it possible to prevent the column name from being attached to each object returned from the database, and to ensure the results are parsed as JSON?

Expected behavior: The expected JSON object:

{
  "employeeNumber": "000010",
  "firstName": "CHRISTINE",
  "middleInitial": "I",
  "lastName": "HAAS",
  "workDepartment": "A00",
  "phone": "3978",
  "hireData": "1965-01-01",
  "job": "PRES    ",
  "yearsOfEducation": 18,
  "sex": "F",
  "birthdate": "1933-08-24",
  "payinfo": {
    "salary": 52750,
    "bonus": 1000,
    "commission": 4220
  }
}

When running the query:

SELECT JSON_OBJECT(
        'employeeNumber' VALUE EMPNO,
        'firstName' VALUE FIRSTNME,
        'middleInitial' VALUE MIDINIT,
        'lastName' VALUE LASTNAME,
        'workDepartment' VALUE WORKDEPT,
        'phone' VALUE PHONENO,
        'hireData' VALUE HIREDATE,
        'job' VALUE JOB,
        'yearsOfEducation' VALUE EDLEVEL,
        'sex' VALUE SEX, 
        'birthdate' VALUE BIRTHDATE, 
        'payInfo' VALUE JSON_OBJECT(
            'salary' VALUE SALARY, 
            'bonus' VALUE BONUS, 
            'commission' VALUE COMM
        )
    )
    FROM sample.employee
    LIMIT 1;

Current result

[
  {
    "00001": "{\"employeeNumber\":\"000010\",\"firstName\":\"CHRISTINE\",\"middleInitial\":\"I\",\"lastName\":\"HAAS\",\"workDepartment\":\"A00\",\"phone\":\"3978\",\"hireData\":\"1965-01-01\",\"job\":\"PRES    \",\"yearsOfEducation\":18,\"sex\":\"F\",\"birthdate\":\"1933-08-24\",\"payInfo\":{\"salary\":52750.00,\"bonus\":1000.00,\"commission\":4220.00}}"
  }
]

The "problems" are twofold here: a default column name 00001 is added by DB2 for i, and all " characters are escaped.

To Reproduce

  1. Connect via a connection pool with the following connection string:
     db2ConnectionPool = await odbc.pool(connectionParameters)
     SYSTEM=<host>;UID=<user>;Password=<password>;Naming=0;DBQ=<libs>
  2. Create the sample database tables (IBM i built-in procedure):
     CALL QSYS.CREATE_SQL_SAMPLE('SAMPLE')
  3. Run the SQL listed above via node-odbc (a sketch of steps 1 and 3 follows this list).
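
For reference, a minimal runnable sketch of steps 1 and 3. The <host>, <user>, <password>, and <libs> placeholders are left to fill in, and the query is shortened to a single key for brevity:

const odbc = require('odbc');

(async () => {
  // Placeholder connection string -- substitute real values for <host>, <user>,
  // <password>, and <libs> before running.
  const db2ConnectionPool = await odbc.pool(
    'SYSTEM=<host>;UID=<user>;Password=<password>;Naming=0;DBQ=<libs>'
  );

  // Run the JSON_OBJECT query from above (shortened to one key here).
  const result = await db2ConnectionPool.query(
    `SELECT JSON_OBJECT('employeeNumber' VALUE EMPNO)
       FROM sample.employee
       LIMIT 1`
  );

  // The JSON arrives as a string under a generated column name like "00001".
  console.log(result);
})();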

Current workaround: My current workaround is to SELECT the JSON object with the column name COL, then parse each row and collect the resulting objects into another array.

public static async getTestJson() {
  const result = await db2ConnectionPool.query(
    `SELECT JSON_OBJECT(
        KEY 'employeeNumber' VALUE TRIM(EMPNO),
        'firstName' VALUE FIRSTNME,
        'middleInitial' VALUE MIDINIT,
        'lastName' VALUE LASTNAME,
        'workDepartment' VALUE WORKDEPT,
        'phone' VALUE PHONENO,
        'hireData' VALUE HIREDATE,
        'job' VALUE JOB,
        'yearsOfEducation' VALUE EDLEVEL,
        'sex' VALUE SEX,
        'birthdate' VALUE BIRTHDATE,
        'payInfo' VALUE JSON_OBJECT(
            'salary' VALUE SALARY,
            'bonus' VALUE BONUS,
            'commission' VALUE COMM
        )
    ) AS COL
    FROM sample.employee
    LIMIT 1`
  );

  // Each row carries the serialized JSON in COL; parse it into a real object.
  const employees: Array<Object> = result.map((i: any) => JSON.parse(i.COL));

  return employees;
}

Thanks for having a look at this! It feels like there should be a better or more efficient way to do this without having to generate another array, but maybe this is more of a question for the IBM i Access ODBC Driver.

kadler commented 1 year ago

Is it possible to prevent appending the column name to each object received back from the database

No. If you had multiple columns, how would you know which is which?

and ensure they are parsed as JSON?

There's no JSON type in ODBC. The column type of a JSON_OBJECT result is likely a CLOB, which gets converted to a string; that is why all the quotes are escaped. Using JSON.parse will convert those strings to objects for you, as you have already done. There's currently no mechanism for the user to specify the type they want a column to be returned as; node-odbc merely converts from the database type to the most appropriate Node type.

Of course, if this were a common enough problem, maybe it would make sense to support it.

I suppose one way to do it would be to provide a list (or map) of callbacks for each column to transform it before being returned, e.g.:

const result = await db2ConnectionPool.query(sql, { 'COL': JSON.parse });

(Though maybe it would be better to take an options object instead of positional arguments to improve extensibility.)

But even if something like this were implemented, it likely wouldn't be any more efficient than what you already have.
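
For illustration, the same idea can be approximated in user land today. A minimal sketch (the queryWithTransforms helper below is hypothetical, not part of node-odbc's API):

// Hypothetical helper: apply per-column transform callbacks to each row
// after the query returns.
async function queryWithTransforms(pool, sql, transforms = {}) {
  const rows = await pool.query(sql);
  return rows.map((row) => {
    const out = { ...row };
    for (const [column, fn] of Object.entries(transforms)) {
      if (column in out) out[column] = fn(out[column]);
    }
    return out;
  });
}

// Usage (inside an async function): parse the COL column of every row as JSON.
const employees = await queryWithTransforms(db2ConnectionPool, sql, { COL: JSON.parse });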

cole-maxwell1 commented 1 year ago

@kadler Thanks for the response and for indulging my curiosity! I wanted to see if there was a possible performance gain I was missing out on by creating the JSON object at the database level and having it ready to go on the Node side.

kadler commented 1 year ago

I don't think there would be any performance benefit here. The database will always have to serialize the JSON object somehow; in this case it's to a string. The code in node-odbc that fetches the data cannot deserialize it, since it does not have access to the Node.js runtime (holding access to the runtime would block the Node event queue while we're fetching, which would massively slow things down); instead it fetches into temporary C-style memory buffers. Once the fetch completes, the data is handed off to a Node task which has access to the Node runtime and converts all the raw data fields into Node.js objects: String, Number, etc.

It's at this second stage that you could do the JSON.parse call and potentially reduce memory, but this is all C++ code, so it would be quite complicated to make it work. You'd have to somehow get access to the JSON.parse function from C++ code and call it, plus you'd need to tell the C++ code which columns to call it on, etc. Unless you're working with gigabytes of data, I seriously doubt it would be worth the effort over just using the code you already have.

markdirish commented 1 year ago

The only thing that seems weird to me is your output of the JSON string here:

[
  {
    "00001": "{\"employeeNumber\":\"000010\",\"firstName\":\"CHRISTINE\",\"middleInitial\":\"I\",\"lastName\":\"HAAS\",\"workDepartment\":\"A00\",\"phone\":\"3978\",\"hireData\":\"1965-01-01\",\"job\":\"PRES    \",\"yearsOfEducation\":18,\"sex\":\"F\",\"birthdate\":\"1933-08-24\",\"payInfo\":{\"salary\":52750.00,\"bonus\":1000.00,\"commission\":4220.00}}"
  }
]

I suspect that the backslashes aren't really there, and are instead an artifact of printing out the result. An easy way to check would be to return a single field, get the length of the string, and see whether the backslashes are counted in the length (a sketch of this check follows the snippet below). Also, do you have some setting that defaults to double quotes around strings? Or maybe I have one defaulting to single quotes? Because all of my quotes seem to be single quotes:

let array = [];
array.push({ "00001": 'This "string" contains "quotes."' });
console.log(array);
// => [ { '00001': 'This "string" contains "quotes."' } ]
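
For concreteness, a rough sketch of that length check, reusing the pool from earlier (SYSIBM.SYSDUMMY1 is just a convenient one-row table; run inside an async function):

// Fetch a single JSON column and inspect the raw string.
const rows = await db2ConnectionPool.query(
  `SELECT JSON_OBJECT('key' VALUE 'a "quoted" value') AS COL
     FROM sysibm.sysdummy1`
);
const raw = rows[0].COL;
console.log(raw.length);         // counts actual characters only
console.log(raw.includes('\\')); // false if no literal backslashes are present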

Other than that, I mirror everything @kadler has said. There is no built-in way in Node-API to parse JSON. I think there is a way to import functions, so I could do it after all of the data has been returned, as it is being converted from C/C++ data into JavaScript data, but I doubt there would be any performance gain (it would still be calling the same JSON.parse function). The only performance gain would come from rolling my own JSON parser, and that is way outside the scope of this package.

cole-maxwell1 commented 1 year ago

Also, do you have some setting that defaults to double quotes around strings?

I believe it is something with the logging library.

Thank you both for your insight! Especially the bits about accessing the JSON.parse function from C++.