Node.js process crash when encounter special character

benli2023 commented 4 years ago

we have existing old file and use RPG code can display correctly ,but when using this connector will cause the entire Node.js crash, after investigation, we found that if the filed contain some special character ,in our case is '\u000f' , the whole application will crash and the Node.js will not give any response to the browser .is it the memcp function cause the overflow and the Node.js crash ? because the length detection no longer work for this case

Lemonleave commented 4 years ago

Following is the details of zhangwenzhuo's error case.

We have create a PF with 2 fields of length 3A and 24O(to store some Chinese narrative), the 3A field is the key field of this file.
A display screen is created to add records into this file. User added one record by iseries display screen. But I am not sure how they input it in the screen. I can see the 24O narrative field's Hex value of this record is '0FC4C4F0F1C4C5C3F2F0F1F8405040E2D940C1D5D5D6E4D5'
Another enquiry screen is built and can display the value of this narrative field. Also I can use STRSQL to select this record.
But when we are using nodejs-idb-connector to select the narrative field of this record, the whole connector crashes. Node.js V10 and nodejs-idb-connector V1.2.0 are used.

Hope above details can help to simulate the case.

Lemonleave commented 4 years ago

We have upgraded the nodejs-idb-connector to latest V1.2.4. It won't crash now, but it will only display a character '-' for the whole narrative field. I believe it is due to the fix on issue #94. But can we have the output value just like what we use with STRSQL?

Lemonleave commented 4 years ago

The field's value I see via STRSQL is ' DD01DEC2018 & SR ANNOUN'

dmabupt commented 4 years ago

Hello @Lemonleave, The Hex value C4C4F0F1C4C5C3F2F0F1F8405040E2D940C1D5D5D6E4D5 is the EBCDIC encoded string of DD01DEC2018 & SR ANNOUN. The issue may be related to the first byte 0F. In both EBCDIC and ASCII, it is shift in or SI. STRSQL displays it as a space. I will investigate how idb-connector handle this character.

Lemonleave commented 4 years ago

@dmabupt,

Exactly, the issue is on the 0F byte. Thanks for the follow up.

dmabupt commented 4 years ago

The error details are --

SQLSTATE=01547
SQLCODE=191
Mixed data or UTF-8 data not properly formed.

idb-connector may handle this specifically.

dmabupt commented 4 years ago

Hello @Lemonleave , Would you try to set the environment variable DB2CCSID to 0? It makes idb-connector run under the IBM i job default CCSID rather than UTF-8(1208).

$ DB2CCSID=0 node basic.js 
[{"COL935":"\u000fDD01DEC2018 & SR ANNOUN      ","COL836":"\u000fDD01DEC2018 & SR ANNOUN      "}]

The JS string keeps the 0F byte and display it as \u000f.

ThePrez commented 4 years ago

A string containing the shift in character is only valid if a shift out character (0x0E) has previously occurred in the same string. If the data starts with 0F, the data is invalid and cannot be properly converted. I'm not sure what, if anything, should be done in the database connector. It is behaving correctly when run in 1208. One could fix the data or handle special cases in SQL if this is a common occurrence in the database.

Lemonleave commented 4 years ago

@dmabupt, late reply due to the holiday season. This may be working if the DB connection job is also the biz logic server job. But in our system, we are using DB connection pool which is separated from the server job. The DB connection jobs are automatically submitted by system under sub-system QWORK. I will see if there is any way to add env parameter to our DB connection jobs.

@ThePrez, our user input the value by copying it from WORD docuement for some case. That is why it contains the shift in character. User not realizes that it contains a shift in/out or not. But I'm wondering how the RPG/SQLRPG/STRSQL handle this case, because they are all displaying an empty space for the single shift-in character. Is it due to the CCSID of my IBM-i login session?

Lemonleave commented 4 years ago

@dmabupt I am able to set the ENV variable, it can return the sample value you provided. But setting CCSID to 0 means the connector won't do the UTF8 decode when the field contains non English characters. In my application, there are many files contain second language fields. So setting the zero CCSID is not working for us.

ThePrez commented 4 years ago

If you are taking non-validated input from the user, there is clearly potential for bad data in the database. It's a classic database problem, but completely understandable. This appears to "work" for green-screen interfaces simply because the data is not converted to a different CCSID.

I suspect the best answer here would involve the use of REGEXP_REPLACE to strip out any data before an SI without an SO present https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/db2/rbafzscaregexp_replace.htm

Your search string would be something like: ^[^\x0E]*\x0F (meaning any sequence of characters not containing an SO followed by an SI).

Curious if @forstie could lend extra insight.

forstie commented 4 years ago

The progression of the dialog helps, nice going. When DDS created physicals and Native DB access methods are used, there is the potential of inputting invalid data into the database. When SQL is used, the data must be valid for the data type of the target column, so this situation cannot occur.

Options include: 1) The invalid data could be made valid (as Jesse has noted) 2) The application(s) could change to prevent future/new instances of invalid data 3) An SQL view could be created over the troublesome physical file. The view could eliminate the column or transform the column. Then, the node app would always use the view.

I usually advise fixing problems at their source, but that requires investment, skill, and resolve. If that's not possible in this case, consider option 3, with the awareness that you're trading off performance to gain function.

hth...

benli2023 commented 4 years ago

@forstie @dmabupt @ThePrez we have tried below solution: select using hex(column) to get the hex value for the invalid data after get the hex value,and convert to byte array then invoke Java method to convert the specified ccsid 935, the final string look almost the same except the\u0f became "?" symbol com.ibm.mq.headers.Charsets.convert(byte[] bytes,int ccsid)

can we do the same in the idba driver ? ie borrow the conversion logic from java implementation ?

https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.javadoc.doc/WMQJavaClasses/com/ibm/mq/headers/Charsets.html

github-actions[bot] commented 4 years ago

:wave: Hi! This issue has been marked stale due to inactivity. If no further activity occurs, it will automatically be closed.

IBM / nodejs-idb-connector

Node.js process crash when encounter special character #97