IAU-ADES / ADES-Master

ADES implementation based on a master XML file
26 stars 7 forks source link

Prog codes / Optical residuals #49

Closed stevechesley closed 4 months ago

stevechesley commented 5 months ago
stevechesley commented 5 months ago

OK, I made a bit of a mess by inadvertently starting this branch from PR #48 before it had been merged, but I believe it now all sorted out and ready to review and merge.

stevenstetzler commented 5 months ago

@stevechesley it looks like the ordering of programCodesArray matters, but validProgramCodes does not.

validProgramCodes is only used here: https://github.com/IAU-ADES/ADES-Master/blob/c329bdd6ab4401d00ec87d16f969e72994fd89b2/Python/bin/mpc80coltoxml.py#L548

which doesn't rely on any ordering.

programCodesArray is used here: https://github.com/IAU-ADES/ADES-Master/blob/c329bdd6ab4401d00ec87d16f969e72994fd89b2/Python/bin/packUtil.py#L234-L248

which does rely on the position of a character in the codes array for creating packed (base-52) program code IDs. Note that the index of the character position matches the number to character labels at the top of https://www.minorplanetcenter.net/iau/lists/ProgramCodes.txt

And yes, it seems fine to do

validProgramCodes = ' ' + programCodesArray

if they really are only expected to differ by the white space. Given the usage of validProgramCodes in the code, my guess for the reason they differ by one white space is that the note column in an obs80 record can probably be blank.

I noticed that there is a bug as is in the index translation since the programCodesArray is declared as a "raw string" with the r prefix and also escapes a double quote using a backslash (\"). Here is what I mean: The position of $ should translate to 13, but instead it translates to 14 since \ is counted in the raw string:

>>> r"0123456789!\"#$%&'()*+,-./[\]^_`{|}~:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz£".index("$")
14

If a regular string is used and the double quote escaped or the string is created using triple quotes so the escape character is unnecessary, then the position is translated correctly:

>>> "0123456789!\"#$%&'()*+,-./[\]^_`{|}~:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz£".index("$")
13
>>> r"""0123456789!"#$%&'()*+,-./[\]^_`{|}~:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz£""".index("$")
13
>>> """0123456789!"#$%&'()*+,-./[\]^_`{|}~:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz£""".i
ndex("$")
13

I think this was added in with one of my previous commits so it's my mistake. I'll put a suggested fix in my review. And defer to @federicaspoto to interpret the validProgramCodes and programCodesArray difference/meaning.

Bill-Gray commented 5 months ago

I wonder about the pound-sterling (£) at the end of the list of allowable program codes. It would be, I think, the first excursion beyond ASCII for the 80-byte format. It appears on the top line of ProgramCodes.txt, then nowhere else; the table of characters to numbers that follows only goes up to z=93 (there is no £=94).

If it ever appears in a punched-card record, it'll have to be in some encoding other than UTF8 (in which £ consumes three bytes). Some conversion would then be required when going to/from ADES, which is (practically speaking) encoded in UTF8.

Bill-Gray commented 5 months ago

Still wondering about the actual use of £ as a program code. Is there any possibility of it being used? If so, is the actual byte 0xA3 (for compatibility with CP-437 and CP-1252), or something else? (It can't be UTF8; that would require expansion to two bytes.) Can we safely remove all references to it, or is there the possibility of other "high ASCII" bytes being used? Anyone from MPC care to comment?