Closed ddeclerck closed 4 months ago
Interestingly, the indexed BDB file has been produced by an external tool. Analyzing with db_dump
reveals some internal pages containing short data (length 3/4), possibly interpreted as keys by GnuCOBOL (not sure what those are - maybe partial/intermediate keys ? - I don't know about BDB's internals). Feeding this BDB file through GCSORT using a COPY operation generates a new BDB file that does not contain these short datas, but preserving the actual file contents.
Anyways, should GnuCOBOL accept these files ? If so, the fix is rather easy: we should order keys even if they do not have the same length. I tried it and the given program behaves as expected.
(probably should have notified you about this one, @GitMensch )
I've subscribed to all PRs and issue here, so get the starting post by mail :-)
I've subscribed to all PRs and issue here, so get the starting post by mail :-)
Ah, alright, so eyes everywhere 👀 :D
I see, we speak about https://github.com/OCamlPro/gnucobol/blob/39344cf66085620e1dfdb16d54c7ba081efd1da8/libcob/fileio.c#L914-L936
Side note : the error does not trigger termination ; should we add a call to
cob_hard_failure
aftercob_runtime_error
?
Definitely not within the function (a runtime error may also consist of multiple outputs) but it is commonly used the way that after the first we do call the second function.
Anyways, should GnuCOBOL accept these files ?
Depends. in general the ORGANIZATION INDEXED
format has an internal structure, so if it is effort or possibly costly, then we may should either not do that or not by default.
As-is we already have the size check there and would within this case do what cob_cmp_alnum()
does (which we could extract from there into two functions and the "inner one" (that gets 2 times data pointer 2 sizes and one optional col) be made available via coblocal and use in fileio.c.
Depends. in general the
ORGANIZATION INDEXED
format has an internal structure, so if it is effort or possibly costly, then we may should either not do that or not by default. As-is we already have the size check there and would within this case do whatcob_cmp_alnum()
does (which we could extract from there into two functions and the "inner one" (that gets 2 times data pointer 2 sizes and one optional col) be made available via coblocal and use in fileio.c.
I see that cob_cmp_alnum
compares any extra character with spaces, but I don't think bdb_bt_compare
should to that: it is used only when a collating sequence is specified, otherwise BDB defaults to an internal comparison function, which, according to the documentation, behaves as follows:
If no comparison function is specified, the keys are compared lexically, with shorter keys collating before longer keys.
We only use BDB to mimic ISAM and that means that keys must be sorted as-if like in COBOL and there alphanumeric compares behave as-if they would be of same size and space padded. If the file was completely created by libcob then the keys are of identical size so far, so that's no issue, if we add the handling for different size we may also do that if no collation is specified - but that may produce a recognizable overhead in which case we should only do this depending of an additional preprocessor variable...
Then we have two choices:
I guess both are fine.
1 and 2 can actually go together, as long as there's an option to switch. And as this is minimal and quite special, it is enough to have that directly with a preparser flag:
int bdb_bt_compare ()
{
#ifdef USE_BDB_KEYDIFF /* flag may passed with CPPFLAGS */
/* code taking also "no col" and effectively doing the cob_cmp_alnum [either moved out or copy-pasted with a comment)] */
#else
/* current code and additional cob_hard_failure */
#endif
}
Of course for USE_BDB_KEYDIFF
(feel free to use a better name) the compare function needs to be always registered, also with an empty col,
Alright, so no need to add a configure option - just need to document that flag somewhere.
I guess @engboris is up to the task ;)
Hello @GitMensch, I will be in charge of this fix. I have a question to make things clearer for me:
Could you explain your take on cob_hard_failure
? I see that you include it in your last example ("and additional cob_hard_failure
[...]"). In particular, I don't understand the second part of this message:
Definitely not within the function (a runtime error may also consist of multiple outputs) but it is commonly used the way that after the first we do call the second function.
The hard failure should be used next to (not within) the runtime error function.
I guess you'll not only add it (the fix part) but also include the variant that operates on not 100% confirming BDB files (always using a sort function), right?
Please target your PR to the 3.x branch.
Yeah, that's in some code I wrote 😅
This error should not happen, yet it does happen in one of our customer use case. I was provided the program that reads the "offending" indexed file, but I'm still expecting the program that creates it so I can analyze further.
Side note : the error does not trigger termination ; should we add a call to
cob_hard_failure
aftercob_runtime_error
?