Move cconv to libcob - Githubissues

ddeclerck commented 1 year ago

This allows to make anything using libcobol aware of the various EBCDIC encodings. In particular, we need this with the contributed tool GCSORT.

GitMensch commented 1 year ago

That would be possible in general, but why does GCSORT need that? Couldn't it just set the appropriate flags?

In general those tables should not be exported but a pointer to it be resolved by a function, please adjust the PR for that.

ddeclerck commented 1 year ago

That would be possible in general, but why does GCSORT need that? Couldn't it just set the appropriate flags?

GCSORT is a C program that directly use libcob, it does not call cobc hence can not set "appropriate flags".

In general those tables should not be exported but a pointer to it be resolved by a function, please adjust the PR for that.

Indeed, I took a shortcut. The API will be cleaner with a function.

GitMensch commented 1 year ago

GCSORT is a C program that directly use libcob, it does not call cobc hence can not set "appropriate flags".

I've thought that all options were passed as flags, but the collation tables are indeed passed as pointers, so those need to be resolved up-front.

This is also matching the FCD structure with its _colPtr / FCD-COL-SEQ-ADDRESS fields.

So yes: a way to resolve those is needed by a new API function - the biggest question: should this really use a string (like the decipher function) or not, using only the enum then (which would be placed in common.h, otherwise in coblocal.h).

Note: there was one thing left open for the function in general: a way to provide the name of an input file (looked up in COB_CONFIG_DIR / by full name) to freely define the collation (for example as hex numbers. optional space separated with a newline then the ebcdic version); if we want that it seems reasonable to define a function maybe something like int cob_get_collation (char *name /* either pre-defined one or a filename */, unsigned char **ebcdic_col, **ascii_col /* the two later may be NULL in which case there is only a heck for validity */ ). The int may be only 0 for "success" and an errno value as return, or have -1 as error (with ERRNO set) and an internal index of the resolved collation (which could then passed later for int cob_get_indexed_collation (const int col_index /* must be resolved before, may be different in another run-time unit */, unsigned char **ebcdic_col, **ascii_col).

Thoughts?

Side note: Additional there is an only "on/off" flag in confFlags2 / FCD-CONFIG-FLAGS-> fcd--file-is-ebcdic, but I haven't found a documented note about how this would be used.

ddeclerck commented 1 year ago

So yes: a way to resolve those is needed by a new API function - the biggest question: should this really use a string (like the decipher function) or not, using only the enum then (which would be placed in common.h, otherwise in coblocal.h).

Note: there was one thing left open for the function in general: a way to provide the name of an input file (looked up in COB_CONFIG_DIR / by full name) to freely define the collation (for example as hex numbers. optional space separated with a newline then the ebcdic version); if we want that it seems reasonable to define a function maybe something like int cob_get_collation (char *name /* either pre-defined one or a filename */, unsigned char **ebcdic_col, **ascii_col /* the two later may be NULL in which case there is only a heck for validity */ ). The int may be only 0 for "success" and an errno value as return, or have -1 as error (with ERRNO set) and an internal index of the resolved collation (which could then passed later for int cob_get_indexed_collation (const int col_index /* must be resolved before, may be different in another run-time unit */, unsigned char **ebcdic_col, **ascii_col).

Thoughts?

Actually, we could even move the current hard-coded collating sequences to external files. However we have to determine how this should interact with the current mechanisms. The new option introduced by Nicolas (-febcdic-table) is meant to configure the EBCDIC collating sequence only, but one might also want to specify an ASCII collating sequence (for our friends not using latin-1 or latin-9).

GitMensch commented 1 year ago

Actually, we could even move the current hard-coded collating sequences to external files

That would mean to parse those files on each call of cobc - on first use.... maybe only keep the "standard" definition in and tall others as file?

The most important part here is that this would have to be done in the next 24 hours or wait for "quite some time later".... In any case I suggest to add some API call that should be extendable later.

The new option introduced by Nicolas (-febcdic-table) is meant to configure the EBCDIC collating sequence only

Yes and for what it is used for (conversion in both directions) that's fine, non-latin would be something definitely for later.

ddeclerck commented 1 year ago

Actually, we could even move the current hard-coded collating sequences to external files

That would mean to parse those files on each call of cobc - on first use.... maybe only keep the "standard" definition in and tall others as file?

Sounds reasonable.

The most important part here is that this would have to be done in the next 24 hours or wait for "quite some time later"....

Oh, well, I guess this will have to wait then.

In any case I suggest to add some API call that should be extendable later.

That's what I'm aiming for.

GitMensch commented 1 year ago

Just drop a note / request a review when you consider this clean enough to go and I'll check this out, responding quickly.

ddeclerck commented 1 year ago

@GitMensch Not done yet, but feedback welcome. In particular, I'm not sure where to put the collating sequence initialization code (function init_collating_sequences) in the generated program.

ddeclerck commented 1 year ago

@GitMensch I'm trying to better understand the original code and the recent changes made to EBCDIC support ; maybe you can enlighten me on something. In the version of output_param before the recent EBCDIC patches :

why is the flag cb_flag_alt_ebcdic only considered in the case CB_TAG_ALPHABET_NAME and not in the case CB_TAG_REFERENCE / CB_ALPHABET_NAME_P ?
why is the flag cb_flag_alt_ebcdic considered in the case CB_ALPHABET_EBCDIC but not in the case CB_ALPHABET_ASCII ? Shouldn't this be symmetric (i.e using two different pairs of EBCDIC tables depending on whether the flag is set or not) ?

nberth commented 1 year ago

@GitMensch I'm trying to better understand the original code and the recent changes made to EBCDIC support ; maybe you can enlighten me on something. In the version of output_param before the recent EBCDIC patches :

why is the flag cb_flag_alt_ebcdic only considered in the case CB_TAG_ALPHABET_NAME and not in the case CB_TAG_REFERENCE / CB_ALPHABET_NAME_P ?

why is the flag cb_flag_alt_ebcdic considered in the case CB_ALPHABET_EBCDIC but not in the case CB_ALPHABET_ASCII ? Shouldn't this be symmetric (i.e using two different pairs of EBCDIC tables depending on whether the flag is set or not) ?

As far as I recall, the restricted table does not directly correspond to any native EBCDIC machine encoding, so the translation is never needed in one of the directions (to ASCII — that would actually be needed to collate ASCII items on a hypothetical restricted EBCDIC machine).

GitMensch commented 1 year ago

Note: the state on this PR - even with the requested changes - is quite good; and while it is a weekend... @ddeclerck @nberth the chances to get it into 3.2 after this weekend are... not good; so to make that available soon you possibly want to have a look at this.

ddeclerck commented 1 year ago

Note: the state on this PR - even with the requested changes - is quite good; and while it is a weekend... @ddeclerck @nberth the chances to get it into 3.2 after this weekend are... not good; so to make that available soon you possibly want to have a look at this.

I think this should be mostly alright. I did what I could considering my current working conditions (I'm in a train at the moment, it's pretty noisy).

Still unsure about the get_collation_name function : the codegen needs it to write the actual table name corresponding to the table id that is stored in the flag (or I could just have stored the flag as a string).

GitMensch commented 1 year ago

LGTM

ddeclerck commented 1 year ago

Can you please add at least some function doc to the newly externalized cob_get_collation_by_name and to cob_get_collation_name? Ideally we get some doc to gnucobol.texi - but this could be done next week, too-

I added comments in common.h.

You may commit the change to 3.x now. Do you need help with the svn part?

Well, I'm not very used to SVN, but I have an up-to-date copy of the GnuCobol SVN repo. Should I just copy my files here and commit ?

GitMensch commented 1 year ago

Well, I'm not very used to SVN, but I have an up-to-date copy of the GnuCobol SVN repo. Should I just copy my files here and commit ?

cd branches/gnucobol-3.x  # if you checked out everything, otherwise just cd into the right folder
svn update   # never harms, should get you a commit from today
# overwrite the files and `svn delete` the old ones
svn status   # check what you've got in the 3.x branch folder
svn commit

GitMensch commented 1 year ago

Warning! The base version is not the one with the 777 fix - so either svn update -r 4876 (which should get you the matching version), then place the files in, then svn update, or creating a patch for this change and apply it to the current svn head.

ddeclerck commented 1 year ago

Warning! The base version is not the one with the 777 fix - so either svn update -r 4876 (which should get you the matching version), then place the files in, then svn update, or creating a patch for this change and apply it to the current svn head.

Yeah, I was concerned about that. I think I'll just extract a patch from my git tree (seems safer).

GitMensch commented 1 year ago

please try again, the commit only included the testsuite.

ddeclerck commented 1 year ago

That's weird, only two files got transmitted. I'll check what has gone wrong (the poor internet connexion in the train does not help)

ddeclerck commented 1 year ago

Ah, I was just not in the right directory...

Anyways, trying to commit, I now have a message complaining that the type of a file "unexpectedly changed". That file : ~ build_aux/ar-lib svn revert does not help here.

GitMensch commented 1 year ago

~ build_aux/ar-lib svn revert does not help here.

Does it help to rm -rf delete build_aux && svn update?

ddeclerck commented 1 year ago

~ build_aux/ar-lib svn revert does not help here.

Does it help to rm -rf delete build_aux && svn update?

Doesn't seem to.

GitMensch commented 1 year ago

Then go with svn update && svn status Then svn commit cobc/* libcob/* (or the list of files seen in svn status).

ddeclerck commented 1 year ago

I think that should be fine now.

GitMensch commented 1 year ago

now upstream

OCamlPro / gnucobol

Move cconv to libcob #71