Closed mgesteiro closed 1 year ago
Hi! Thanks for taking the time to look into this and posting the issue. Much appreciated. I'll take a look and push an update once I think it's working. The example file you provided will be useful.
You're wellcome 👌
This is the fixed token array: it only had 126 elements because the "<>" was missing (and an unnecessary last empty byte)
public static String [] lowTokenArray = {
"AFTER", "AUTO", "BORDER", "CALL", "CAT", "CHAIN", "CLEAR", "CLG", "CLOSEIN", "CLOSEOUT",
"CLS", "CONT", "DATA", "DEF", "DEFINT", "DEFREAL", "DEFSTR", "DEG", "DELETE", "DIM", "DRAW",
"DRAWR", "EDIT", "ELSE", "END", "ENT", "ENV", "ERASE", "ERROR", "EVERY", "FOR", "GOSUB", "GOTO",
"IF", "INK", "INPUT", "KEY", "LET", "LINE", "LIST", "LOAD", "LOCATE", "MEMORY", "MERGE", "MID$",
"MODE", "MOVE", "MOVER", "NEXT", "NEW", "ON", "ON BREAK", "ON ERROR GOTO", "SQ", "OPENIN", "OPENOUT",
"ORIGIN", "OUT", "PAPER", "PEN", "PLOT", "PLOTR", "POKE", "PRINT", "'", "RAD", "RANDOMIZE", "READ",
"RELEASE", "REM", "RENUM", "RESOTRE", "RESUME", "RETURN", "RUN", "SAVE", "SOUND", "SPEED", "STOP",
"SYMBOL", "TAG", "TAGOFF", "TROFF", "TRON", "WAIT", "WEND", "WHILE", "WIDTH", "WINDOW", "WRITE",
"ZONE", "DI", "EI", "FILL", "GRAPHICS", "MASK", "FRAME", "CURSOR", "", "ERL", "FN", "SPC",
"STEP", "SWAP", "", "", "TAB", "THEN", "TO", "USING", ">", "=", ">=", "<", "<>", "<=",
"+", "-", "*", "/", "^", "\\", "AND", "MOD", "OR", "XOR", "NOT", ""
};
I'll leave to you the other required fixes 😅
If it helps, I've been using this for comparison/checking: https://sourceforge.net/p/javacpc/svn/HEAD/tree/JavaCPC_Desktop/src/jemu/system/cpc/Baslist.java
Best regards
I've incorporated the updated token array, and also added suppression of the colon : before an ELSE.
The 0-byte characters appear to have been an incorrect decoding of variable names and string literals (particularly detecting the end of a string literal).
Here's a dump of the decoded file, hopefully that's a bit better:
00000000 31 30 20 52 45 4D 20 43 49 46 52 41 44 4F 20 43 45 53 41 52 0A 32 30 20 43 4C 53 0A 33 30 20 52 10 REM CIFRADO CESAR.20 CLS.30 R
00000020 24 3D 22 22 0A 34 30 20 49 4E 50 55 54 20 22 54 45 58 54 4F 3A 20 22 2C 54 24 0A 35 30 20 54 24 $="".40 INPUT "TEXTO: ",T$.50 T$
00000040 3D 55 50 50 45 52 24 28 54 24 29 0A 36 30 20 46 4F 52 20 69 3D 31 20 54 4F 20 4C 45 4E 28 54 24 =UPPER$(T$).60 FOR i=1 TO LEN(T$
00000060 29 0A 37 30 20 43 3D 41 53 43 28 4D 49 44 24 28 54 24 2C 69 2C 31 29 29 0A 38 30 20 49 46 20 43 ).70 C=ASC(MID$(T$,i,1)).80 IF C
00000080 3E 36 34 20 41 4E 44 20 43 3C 39 31 20 54 48 45 4E 20 4E 3D 28 28 28 43 2D 36 35 29 2B 31 33 29 >64 AND C<91 THEN N=(((C-65)+13)
000000A0 4D 4F 44 20 32 36 29 2B 36 35 20 45 4C 53 45 20 4E 3D 43 0A 39 30 20 52 24 3D 52 24 2B 43 48 52 MOD 26)+65 ELSE N=C.90 R$=R$+CHR
000000C0 24 28 4E 29 0A 31 30 30 20 4E 45 58 54 20 69 0A 31 31 30 20 50 52 49 4E 54 20 52 24 0A 31 32 30 $(N).100 NEXT i.110 PRINT R$.120
000000E0 20 47 4F 54 4F 20 33 30 0A GOTO 30.
Making the changes has exposed that my Amstrad BASIC decoding is quite messy, split between Java classes in a way that isn't particularly helpful. So, I'll need to do a bit of refactoring before it's ready to push back again, and test with some more BASIC programs (I have a tape!). Will aim to do this at the weekend :-)
An updated version is currently pushed into a branch. It's still not 100% complete, but seems to be producing much better results. Changes are:
Next:
Thanks for the effort: I'll be monitoring the progression 👌
This is the only guide I could find about Amstrad CPC detokenisation: https://www.cpcwiki.eu/index.php/Technical_information_about_Locomotive_BASIC#Structure_of_a_BASIC_program
Which one(s) are you using?
Best regards
Hi @eightbitjim,
I tried today the new branch (47_erroneous_amstrad_cpc_decoding_candidate) and I get a lot of garbage on the output and a completely invalid result, much worse than before: am I missing something here?
10 REM CIFRADO CESAR
20 CLS
30 $
8943 "
40 INPUT "TEXTO: ",$
4608
3
7423 ($
41 10
8350
4079 TO LEN($
41 &4600
49920 =ASC(MID$($
3372
3884 ))
80 IF
6638 @ AND
6641 [ THEN
10479 ((
6645 A)+13)MOD 26)+65 ELSE
3567
5888
3
3
1023 (
41
8368
2816
8383 $
2560
8352 30
My test code, just in case:
byte[] data = HexFormat.of().parseHex("14000A00C5204349465241444F20434553415200060014008A000C001E00030000D2EF22220015002800A32022544558544F3A20222C030000D40012003200030000D4EFFF1C28030000D4290018003C009E200D0000E9EF0F20EC20FF0E28030000D429001C0046000D0000C3EFFF0128AC28030000D42C0D0000E92C0F29290044005000A1200D0000C3EE194020FA200D0000C3F1195B20EB200D0000CEEF2828280D0000C3F5194129F4190D29FB20191A29F41941200197200D0000CEEF0D0000C30017005A00030000D2EF030000D2F4FF03280D0000CE29000B006400B0200D0000E9000B006E00BF20030000D2000A007800A0201E1E00");
AmstradBasic ambasic = new AmstradBasic(data, "default");
System.out.println(ambasic);
Best regards
Hi @mgesteiro
Oh no, that's much worse than before! I reckon I had your caesar cipher program displaying apparently correctly based on the earlier data, but I was feeding the bytes into the code in a different way to your example (thanks for sharing the test code). Leave it with me, I'll take another look.
I can't remember exactly what I used as my BASIC reference source originally (it was several years ago), although according to my notes I was using these two books:
The Amstrad CPC-464 Advanced User Guide, by Mark Harrison Amstrad CPCs Advanced Users Guide Book 1, by Daniel Martin
The wiki link is useful, I'll look at that too. I never had an Amstrad, so am only recently starting to learn about them! :-)
Just. quick question if that's ok, about the sequence of bytes you've provided as input. Did you get this from this latest version of the program or the earlier one, or from another source?
One of the changes that I made was that I was (erroneously) including in the output data stream the checksum bytes that periodically occur. So this was causing corruption when they hit bas BASIC detokeniser as it isn't expecting to see these details. I've removed the checksum bytes now at the earlier stage where it parses the file block format (although I still need to implement verification against the checksums). When I tested it, I started again from WAV files and got decent results.
I'll step through tomorrow and see exactly what it's doing with this data. If you have the original audio files then it might be worth feeding them through again to re-generate the binary data without the checksums, but no need as I'll manually remove them from the data and see if that improves the decoding.
Hope that makes sense, and thanks for your help on this!
hi,
about the sequence of bytes you've provided as input. Did you get this from this latest version of the program or the earlier one, or from another source?
The hex data I provided was directly from an audio tape that I recorded (in real hardware), processed and extracted myself: it's the pure and clean data payload of the 2nd part of the block (1st part = header, 2nd part = data), without any previous block type identifier or posterior checksum and trailer. It's just the basic program, clean.
I can provide the wav file if you want, but your tools fail to process it correctly: I planned to ask you about this also... later 😅 (and because I have that part already covered myself...). I just needed a detokeniser and yours seemed a good option (although being in JAVA)
Regards!
Ah, I understand, thanks for info. I'll look at exactly where it's going wrong with the detokenisation of your file, although it might not be until the weekend I can start to make further updates.
Great to hear that you've written the earlier decoding bits. Regards :-)
Hi,
I'm currently not able to reproduce the behaviour you're seeing. I tried cloning a new branch from the repo (branch origin/47_erroneous_amstrad_cpc_decoding_candidate
) and then added a test file alongside the Basic decoding classes in Platforms/Amstrad/Format
. I pasted in your example code.
I needed to move to JDK 17, as the code is quite old and compiled for JDK8, which doesn't seem to have HexFormat. But once on JDK 17, this is what I get when I run the test class:
% java -classpath ./target/classes com.eightbitjim.cassettenibbler.Platforms.Amstrad.Formats.TestFormat
10 REM CIFRADO CESAR
20 CLS
30 R$=""
40 INPUT "TEXTO: ",T$
50 T$=UPPER$(T$)
60 FOR i=1 TO LEN(T$)
70 C=ASC(MID$(T$,i,1))
80 IF C>64 AND C<91 THEN N=(((C-65)+13)MOD 26)+65 ELSE N=C
90 R$=R$+CHR$(N)
100 NEXT i
110 PRINT R$
120 GOTO 30
Here's the test class:
package com.eightbitjim.cassettenibbler.Platforms.Amstrad.Formats;
import java.util.HexFormat;
public class TestFormat {
public static final void main(String args[]) {
byte [] data = HexFormat.of().parseHex("14000A00C5204349465241444F20434553415200060014008A000C001E00030000D2EF22220015002800A32022544558544F3A20222C030000D40012003200030000D4EFFF1C28030000D4290018003C009E200D0000E9EF0F20EC20FF0E28030000D429001C0046000D0000C3EFFF0128AC28030000D42C0D0000E92C0F29290044005000A1200D0000C3EE194020FA200D0000C3F1195B20EB200D0000CEEF2828280D0000C3F5194129F4190D29FB20191A29F41941200197200D0000CEEF0D0000C30017005A00030000D2EF030000D2F4FF03280D0000CE29000B006400B0200D0000E9000B006E00BF20030000D2000A007800A0201E1E00");
AmstradBasic ambasic = new AmstradBasic(data, "default");
System.out.println(ambasic);
}
}
So I'm not sure why we're seeing something different. I'll think further about it. I'm using OpenJDK 17.
Regards!
Hi @eightbitjim,
Sorry, I messed up: my IDE (Eclipse), was giving me some warnings about The value of the local variable xxx is not used and I corrected them (without paying enough attention) before even running the code once... I tried again without touching anything and it worked.
After a more calm revision I discovered that those variables were populated by calling some methods that read from the stream: when I commented out the lines I changed by accident the logic because although the return value (and the variables) were NOT used, the methods needed to be called for that consumption to take place... 🤦♂️
Here is the diff for my revised version of the class in question in case you want to check the modifications:
--- /cassette-nibbler-47_erroneous_amstrad_cpc_decoding_candidate/src/main/java/com/eightbitjim/cassettenibbler/Platforms/Amstrad/Formats/Line.java
+++ /NibblerByteToBasic/src/Line.java
@@ -25,7 +25,7 @@
private InputStream inputStream;
private StringBuilder builder;
private boolean endOfLine = false;
- private boolean insideRemStatement = false;
+ //private boolean insideRemStatement = false;
private boolean insideQuote = false;
private static final int END_OF_LINE_MARKER = 0;
@@ -95,7 +95,7 @@
// Don't output a colon until we see the next token, as we don't want
// to display colons in front of an ELSE
// But this will terminate a REM statement
- insideRemStatement = false;
+ //insideRemStatement = false;
return;
case 2:
case 3:
@@ -157,7 +157,7 @@
processApostrophe();
return;
case 197: // REM statement
- insideRemStatement = true;
+ //insideRemStatement = true;
break;
case 245:
processMinus();
@@ -234,13 +234,15 @@
}
private void processRSXCommand() throws IOException {
- int valueOffsetToIgnore = getByteValue();
+ //int valueOffsetToIgnore =
+ getByteValue();
String commandName = getStringFromInput();
builder.append("|").append(commandName);
}
private void processVariableWithTokenNoSuffix() throws IOException {
- int offsetToIgnore = getTwoByteValue();
+ //int offsetToIgnore =
+ getTwoByteValue();
String variableName = getStringFromInput();
builder.append(variableName);
}
Thanks for your patience!
Hi,
No problem, I'm really glad it's working now! I'll add some comments to explain what's going on (and probably just get rid of the unused variables), and also have a look at the insideRemStatement variable, as I must have put it there for a reason but somehow didn't use it.
Once I've gone through a bit more test data, I'll merge it with the main branch.
Thanks for all the effort!
Hello,
First of all, thanks for the project and all the effort: it's been very valuable.
The BASIC programs conversion for the Amstrad CPC platform is not working properly: a program created in the CPC is incorrectly detokenised.
At the beginning it just seemed like a small issue in the token array table and I was about to send a PR, but then I discovered also -at least- a couple more issues:
Here is the BASIC program in case you want to check it:
"14000A00C5204349465241444F20434553415200060014008A000C001E00030000D2EF22220015002800A32022544558544F3A20222C030000D40012003200030000D4EFFF1C28030000D4290018003C009E200D0000E9EF0F20EC20FF0E28030000D429001C0046000D0000C3EFFF0128AC28030000D42C0D0000E92C0F29290044005000A1200D0000C3EE194020FA200D0000C3F1195B20EB200D0000CEEF2828280D0000C3F5194129F4190D29FB20191A29F41941200197200D0000CEEF0D0000C30017005A00030000D2EF030000D2F4FF03280D0000CE29000B006400B0200D0000E9000B006E00BF20030000D2000A007800A0201E1E00"
BTW, I checked this against the output produced by the JavaCPC similar feature, and it works fine there.
Let me know if I can be of any help.
Best regards!