harendradh opened this issue 3 months ago
Could you add `.option("debug", "true")` and send the HEX value of the field that is incorrectly decoded, and I'll take a look.
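For reference, a minimal sketch of what enabling the debug columns looks like in PySpark; the paths here are placeholders, not the asker's actual job:

```python
# Re-read the file with debug columns enabled. For every field F in
# the copybook, Cobrix adds an extra column F_debug that contains
# the raw bytes of the field as a HEX string.
df = spark.read.format("za.co.absa.cobrix.spark.cobol.source") \
    .option("copybook", "/path/to/copybook.txt") \
    .option("debug", "true") \
    .load("/path/to/datafile")
```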
Sorry for so many questions, but we have been trying for a long time.
Yes, Cobrix supports packed decimal data. `debug` is not supposed to change anything; it just creates debug columns. I'm asking you to send an example of HEX values that Cobrix didn't convert properly.
For instance, `0x12345C` is `12345` as a packed decimal. You are saying that Cobrix didn't convert field values properly, but I need an example: which exact numbers were not properly converted?
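To illustrate the convention, a rough Python sketch of the general rule (each nibble is a digit and the last nibble is the sign; sign conventions vary by system, see the spec link later in the thread):

```python
# Decode a signed packed decimal (COMP-3): every half-byte (nibble)
# is one decimal digit, except the last nibble, which is the sign:
# 0xC = positive, 0xD = negative, 0xF = unsigned.
def decode_comp3(raw: bytes) -> int:
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    digits, sign = nibbles[:-1], nibbles[-1]
    if any(d > 9 for d in digits) or sign not in (0xC, 0xD, 0xF):
        raise ValueError("not a valid packed decimal")
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0xD else value

print(decode_comp3(bytes.fromhex("12345C")))  # 12345
print(decode_comp3(bytes.fromhex("12345D")))  # -12345
```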
Transaction code is coming as 601, while in the mainframe I see the value as 791. The field type is PIC S9(3)V COMP-3.
Another is a date field, PIC S9(7)V COMP-3; it is coming as null in the dataframe, while in the mainframe it should actually be 240802.
The Cobrix version installed is spark_cobol_2_12_2_7_3_bundle.jar. We are on Scala 2.12 in Databricks.
When `debug=true` you should see columns with the `_debug` suffix. How do they look for the above fields?
E.g. field1 = 601, field1_debug = ?; field2 = 240802, field2_debug = ?
Got it, thanks. Transaction code = 601, debug = 601C. Date = 2020020220, debug = 2020020220.
Makes sense. Yes, `601C` = `601` is supported. `2020020220` = `2020020220` is not supported.
What are the definitions for these fields in the copybook?
I think we can add support for `2020020220`, just need to understand more about the type, size, and layout.
These are the specs for COMP-3 that describe how these numbers are currently parsed in Cobrix: http://www.3480-3590-data-conversion.com/article-packed-fields.html
Transaction code is defined as PIC S9(3)V COMP-3. The value I see in the mainframe is 791, while in the dataframe it comes as 601. This one is also coming through incorrectly :-(.
Thanks for the quick reply. The date is defined as below; 'XXX'-TRAN-DT is the field we are printing:

    02 'XXX'-TRAN-DATE-TIME.
       03 'XXX'-TRAN-DT PIC S9(9)V COMP-3.
       03 'XXX'-TRAN-TM PIC S9(7)V COMP-3.
Thanks for the field definition. We can add support for COMP-3 numbers without a sign nibble. Just keep in mind that this definition:

    03 'XXX'-TRAN-DT PIC S9(9)V COMP-3.

implies 9 digits, while 2020020220 has 10 digits, i.e. it is in breach of the field definition.
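To make the digit-count point concrete, a quick sketch of how those five bytes split into nibbles:

```python
# The hex 2020020220 is five bytes, i.e. ten digit nibbles and no
# room left for a sign nibble:
raw = bytes.fromhex("2020020220")
nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
print(nibbles)  # [2, 0, 2, 0, 0, 2, 0, 2, 2, 0]

# Read as signed COMP-3, the last nibble (0) is not a valid sign
# (0xC/0xD/0xF), so the signed parser produces null. Read as
# unsigned packed (no sign nibble), all ten nibbles are digits:
print(int("".join(map(str, nibbles))))  # 2020020220
```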
> Transaction code is defined as PIC S9(3)V COMP-3. The value I see in the mainframe is 791, while in the dataframe it comes as 601.

:( Please send the value of the `_debug` field that you are claiming to be 791.
Checked: parsing of 0x2020020220 is already supported. Use `COMP-3U` for fields that might not contain the mandatory sign nibble, like 'XXX'-TRAN-DT.
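A sketch of what that change could look like. The `copybook_contents` option keeps the example self-contained, and the PIC size (10 digits) is an assumption based on the values seen in this thread, not the asker's actual layout:

```python
# Hypothetical copybook: TRAN-DT is declared COMP-3U (unsigned
# packed decimal, no sign nibble), so ten digits fit in 5 bytes,
# while TRAN-TM stays a regular signed COMP-3.
copybook = """
      01 RECORD.
         02 TRAN-DATE-TIME.
            03 TRAN-DT  PIC 9(10) COMP-3U.
            03 TRAN-TM  PIC S9(7)V COMP-3.
"""

df = spark.read.format("za.co.absa.cobrix.spark.cobol.source") \
    .option("copybook_contents", copybook) \
    .load("/path/to/datafile")
```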
When I tried this while keeping the field in the copybook unchanged, i.e. PIC S9(3)V COMP-3, in debug it was coming as 601C.
After changing the field from COMP-3 to COMP-3U, the Transaction code field now comes as null, and in debug the value is coming as 601C.

- Date 1: 20000, debug: 204E, PIC S9(4)V COMP-9
- Date 2 (transaction date): 2020020220, same data in the debug field, PIC S9(9)V COMP-3U
- Transaction time: null, debug: EFBFBD2A, PIC S9(7)V COMP-3U
- The column that contains `601C` in the `_debug` field is a correct COMP-3, so you need to use `COMP-3` there. You didn't have to switch to `COMP-3U`.
- The date that has `2020020220` in the `_debug` column is `COMP-3U`, since it does not contain the sign nibble.
- The transaction time that has `EFBFBD2A` in the `_debug` column is not correct BCD format, so neither `COMP-3` nor `COMP-3U` will work.

Maybe you can do something like

    df.select("failure_field1", "failure_field1_debug").show(false)

and send the table here, for each field that is failing for you.
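A note for PySpark users following along: `show(false)` is the Scala signature; in Python the parameter is `truncate`. A small sketch with illustrative column names:

```python
# Show raw values next to their _debug (HEX) counterparts without
# truncation. The column names here are illustrative -- substitute
# the actual failing fields from the copybook.
df.select("TRANS_CODE", "TRANS_CODE_debug",
          "TRAN_DT", "TRAN_DT_debug").show(truncate=False)
```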
I will double-check the 601 with the user, in case he is sending me wrong snapshots. I can't upload the table due to data privacy; here are the values. I printed the first 10, and all rows come out as below (each field followed by its _debug column):

    |Trans_code|debug|Acct_open_dt|debug|Tran_date |debug     |Trans_time|debug   |
    |601       |601C |20000       |204E |2020020220|2020020220|NULL      |EFBFBD2A|
    |601       |601C |20000       |204E |2020020220|2020020220|NULL      |EFBFBD2A|
    |601       |601C |20000       |204E |2020020220|2020020220|NULL      |EFBFBD2A|

Tran date and time are together under a group field called trans-date-time, if it makes any difference.
Looks good. The only issue left then is `Trans_time`, right?
> Looks good. The only issue left then is `Trans_time`, right?

The Trans_date value also looks incorrect, i.e. 2020020220. Similarly, the Acct_open_dt value of 20000 doesn't look right. The other values I will double-check in the mainframe.
Other than the wrong values above, this field (02 'XXX'-TIME-HHMMSS PIC X(06)) is taking bytes from the one-byte PIC X fields before and after it.

Value: 00829, debug: 203030383239.

If I increase the size of the field to PIC X(8), the value comes out perfectly as 0082918. I thought PIC X fields would be straightforward with Cobrix.

Value: 0082918, debug: 2030303832393138.
> Value: 0082918, debug: 2030303832393138
I realized from this example that your data is ASCII, not EBCDIC. The EBCDIC encoding for 0082918 would be F0F0F8F2F9F1F8. I have never seen COMP-3 used with ASCII; usually it is converted automatically to DISPLAY.
> I thought PIC X fields would be straightforward with Cobrix.

It is straightforward: one character is 1 byte.
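A quick check of that encoding claim (cp037 is assumed here as the EBCDIC code page; the leading 0x20 in the debug value above is simply an ASCII space):

```python
# The same text "0082918" in both encodings. The debug HEX in this
# thread (30303832393138 for the digits) matches ASCII; EBCDIC
# digits would fall in the F0-F9 range instead.
print("0082918".encode("ascii").hex().upper())  # 30303832393138
print("0082918".encode("cp037").hex().upper())  # F0F0F8F2F9F1F8
```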
We are using Cobrix to convert a mainframe EBCDIC file. Below is the problematic data field:

    XXX-TRANSACTION-AMOUNT PIC S9(15) V99 COMP-3

We are not able to convert this field correctly; I suspect we are running into issues due to the sign field, and it comes out as NULL. All the other fields are coming through correctly.
cobolDataframe = spark.read.format("za.co.absa.cobrix.spark.cobol.source") \
    .option("copybook", "dbfs:/FileStore/Optis Test/copybook.txt") \
    .option("record_format", "D") \
    .option("is_rdw_big_endian", "true") \
    .option("rdw_adjustment", -4) \
    .load("dbfs:/FileStore/Optis Test/inputfile.txt")
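For reference, a hedged sketch of how a valid S9(15)V99 COMP-3 value is laid out, which may help when comparing against the field's _debug HEX: 17 digit nibbles plus one sign nibble make 9 bytes, and the V is an implied decimal point, so nothing is stored for it.

```python
from decimal import Decimal

# PIC S9(15)V99 COMP-3: 15 + 2 = 17 digit nibbles plus one sign
# nibble, i.e. 9 bytes. The implied decimal point (V) just means
# the decoded integer is scaled by 10^-2.
def decode_s9_15_v99(raw: bytes) -> Decimal:
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    digits, sign = nibbles[:-1], nibbles[-1]
    value = Decimal(int("".join(map(str, digits)))) / 100
    return -value if sign == 0xD else value

# 0x00000000000012345C -> 123.45 (sign nibble C = positive)
print(decode_s9_15_v99(bytes.fromhex("00000000000012345C")))
```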
Thanks for the help.