SCoA Printers - Githubissues

ra1nst0rm3d commented 4 years ago

When are you going to implement support for SCoA printers?

agalakhov commented 4 years ago

I need a volunteer who has a SCoA printer and can program for that.

ra1nst0rm3d commented 4 years ago

I have an LBP1120 and can code in C/C++

mounaiban commented 3 years ago

Just leaving this link here in case it's useful to anyone else who wants to have a go at implementing SCoA: https://github.com/caxapyk/capt_lbp810-1120

It's a fork of the original Nicolas Boichat driver for the LBP-810, which contains an SCoA implementation.

ra1nst0rm3d commented 3 years ago

No, as @agalakhov says, it doesn't implement SCoA compression

mounaiban commented 3 years ago

> No, as @agalakhov says, it doesn't implement SCoA compression

Oh, I didn't know that Nicolas' driver wasn't complete :disappointed:

For those coming from Google or the future: see also #20

agalakhov commented 3 years ago

You're wrong. It DOES implement SCoA compression. But it does not handle control commands correctly. It just sends "magic sequences of bytes" without knowing anything about their meaning, which is not always correct. To get an idea:

    {
        unsigned char buf[] = {
            0x00, 0x00, 0xa4, 0x01, 0x02, 0x01, 0x00, 0x00, 0x1f, 0x1f, 
            0x1f, 0x1f, 0x00, 0x11, 0x03, 0x01, 0x01, 0x01, 0x02, 0x00, 
            0x00, 0x00, 0x70, 0x00, 0x78, 0x00, 0x50, 0x02, 0x7a, 0x1a, 
            0x60, 0x13, 0x67, 0x1b};
        write_command_packet_buf(0xa0, 0xd0, 0, 0, (unsigned char*)&buf, 34);
    }

Now we know that this configures paper dimensions, toner saving mode and paper thickness. Nicolas did not know that and just hardcoded the values he had on his configuration. They're wrong for some users.

The actual documentation of SCoA compression is in SPECS file and seems to be complete. The driver contains a working SCoA compression example (but please don't just copy and paste this code, it requires huge refactoring).

ra1nst0rm3d commented 3 years ago

Very useful info, thx

mounaiban commented 2 years ago

:point_right: EDIT: the SCoA specs now have a page in my wiki, please check it out for more info.

@agalakhov I have been trying to understand SCoA for the past week, and I think I am beginning to get it. From what I figure:

SCoA is a Run-Length Encoding (RLE) scheme
It uses bit-sized opcodes and parameters with a format like `opcode param_a param_b operand_a [operand_b]`
Its 'instruction set' is as follows, according to Nicolas Boichat's SPECS, and the conversation in #20:

| Command | Opcode (bits) | Param A | Param B | Operand A | Operand B | Illustration |
|---|---|---|---|---|---|---|
| Copy | `0b00` | bytes to copy (uint3_t) | `0` | uncompressed bytes | -- | `00aaa000` :a: |
| Repeat | `0b01` | reps (uint3_t) | `0` | single repeated byte | -- | `01aaa000` :a: |
| Repeat and Copy | `0b11` | reps (uint3_t) | bytes to copy (uint3_t) | single repeated byte | uncompressed bytes to copy | `11aaabbb` :a: :b: |
| Enter long command | `0b101` | length/reps (higher 5 bits) | -- | Long command | -- | `101aaaaa` :a: |
| Long Copy | `0b11`[^opdesign] (only as operand of 'Enter long command') | bytes to copy (lower 3 bits) | `0` | uncompressed bytes to copy | -- | `11aaa000` :a: |
| Long Repeat and Short Copy | `0b00` (only as operand of 'Enter long command') | reps (lower 3 bits) | bytes to copy (uint3_t) | single repeated byte | uncompressed bytes to copy | `00aaabbb`[^lrsc] :a: :b: |
| ~Repeat last line or sub-buffer~ End of Line, dump buffer to output | `0x41` | -- | -- | -- | -- | `01000001` |

[^opdesign]: This bothers me a little, I think Long copy should have been 0b00 to be in line with short copy

[^lrsc]: SPECS only mentions Long repeat with short copy where bbb is 001. Other values have not been tested.

Don't judge me, uint3_t and uint5_t is the best C-esque name I can think of! :sweat_smile:

Is my understanding correct, or am I way off?
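To make the bit layout concrete, here is a minimal Python sketch of how one short command byte would split into the three fields from the table. The name `split_short_opcode` is made up for this example, and the layout is only the provisional reading above, not something confirmed against a printer.

```python
def split_short_opcode(byte):
    """Split one command byte into (opcode, param_a, param_b).

    Assumes the provisional layout from the table: a 2-bit opcode in the
    top bits, followed by two 3-bit fields (the "uint3_t" values).
    """
    opcode = (byte >> 6) & 0b11
    param_a = (byte >> 3) & 0b111
    param_b = byte & 0b111
    return opcode, param_a, param_b


# 0b11_010_001 would be "Repeat and Copy": 2 repeats, then 1 uncompressed byte
print(split_short_opcode(0b11010001))
```

Note that under this reading `0x41` splits into opcode `0b01` with A = 0 and B = 1, so the end-of-line byte sits inside the Repeat opcode space and has to be checked for first.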

ra1nst0rm3d commented 2 years ago

@mounaiban Judging by Nicolas' code, it looks like you're on the right track. This will be very useful. Thanks for your research!

agalakhov commented 2 years ago

It looks more or less correct. SCoA commands are not very consistent.

There is a "simple" possibility to check how it works. There is a binary in the stock closed-source Linux driver called "captfilter". Calling it with correct parameters generates a print job with some (not all) A0 commands and with correct compressed pages. You can then try to decompress the output based on your guesses and check if it is decompressed correctly. Look at https://github.com/agalakhov/anticapt/blob/master/filter.sh

ra1nst0rm3d commented 2 years ago

@mounaiban I have a question for you.

What do you mean by length in the Copy command?
Would the illustration for a length of 7 look like `00111000 00000111`?

mounaiban commented 2 years ago

@ra1nst0rm3d In the Copy command, length means the number of bytes in the input, after the opcode, to copy to the output. Sorry for the unclear wording; I'm trying to figure out clear names for these things. For your second question: are you trying to copy the 7 bytes after the opcode to the output?
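As a rough illustration of that reading: the length lives inside the opcode byte itself, so a 7-byte Copy would be `00111000` followed directly by the seven bytes, with no separate length byte. A minimal Python sketch, where `encode_copy` is a hypothetical helper and the layout is still the provisional one from the table:

```python
def encode_copy(data):
    """Encode 1-7 literal bytes as a short Copy command: 00aaa000 + data."""
    if not 1 <= len(data) <= 7:
        raise ValueError("a short Copy carries 1 to 7 bytes")
    return bytes([len(data) << 3]) + bytes(data)


encoded = encode_copy(b"ABCDEFG")
print(" ".join(f"{b:02x}" for b in encoded))  # 38 41 42 43 44 45 46 47
```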

@agalakhov Thanks for the tip on the captfilter command. Now if only Canon had left behind some man pages on how to use that thing... :man_shrugging: Well, that doesn't stop me from going `strings /usr/bin/captfilter` though :smiling_imp:

ra1nst0rm3d commented 2 years ago

No, I understood you, thx.

ra1nst0rm3d commented 2 years ago

@mounaiban I found some interesting commands in the disassembled code of captfilter.


fcn.0804ec34("NOP_Command\n");
fcn.0804ec3c(var_1ch, 0x40);

fcn.0804ec34("EOP_Command\n");
fcn.0804ec3c(var_1ch, 0x42);

fcn.0804ec34("RepeatThenRaw_Command\n");
fcn.0804ec3c(var_1ch, (uVar8 & 7) << 3 | 0xc0 | uVar10 & 7);
fcn.0804ec3c(var_1ch, (uint32_t)(uint8_t)var_440h);
fcn.0804ec64(var_1ch, (int32_t)&var_440h + 1, uVar10);

fcn.0804ec34("RepeatX_Command\n");
fcn.0804ec3c(var_1ch, (uVar8 & 7) * 8 | 0xc0);

fcn.0804ec34("CopyLong_Command\n");
fcn.0804ec3c(var_1ch, iVar4 >> 3 & 0x1fU | 0x80);

fcn.0804ec34("CopyShort_Command\n");
fcn.0804ec3c(var_1ch, uVar8 & 7 | 0xc0);

fcn.0804ec34("EOL_Command\n");
fcn.0804ec3c(var_1ch, 0x41);

fcn.0804ec34("CopyThenRawLong_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (arg_10h & 7) << 3 | 0xc0 | in_EDX & 7);
fcn.0804ec64(*(int32_t *)(in_EAX + 0x42c), in_EAX + 9, arg_10h);

fcn.0804ec34("RepeatThenRawLong_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (in_EDX & 7) << 3 | 0x40 | arg_8h & 7U);
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uint32_t)*(uint8_t *)(in_EAX + 8));
fcn.0804ec64(*(int32_t *)(in_EAX + 0x42c), in_EAX + 9, var_10h);

fcn.0804ec34("CopyThenRepeatLong_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uVar1 & 7) << 3 | 0x80 | in_EDX & 7);
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uint32_t)*(uint8_t *)(in_EAX + 8));

fcn.0804ec34("RepeatXLong_Command\n");
fcn.0804ec3c(*(int32_t *)(param_1 + 0x42c), (uVar1 & 7) << 3);

fcn.0804ec34("Extend_Comman\n");
fcn.0804ec3c(*(int32_t *)(param_1 + 0x42c), param_2 & 0x1f | 0xa0);

fcn.0804ec34("CopyThenRaw_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (arg_8h & 7U) << 3 | in_EDX & 7);
fcn.0804ec64(*(int32_t *)(in_EAX + 0x42c), in_EAX + 9, arg_8h);

fcn.0804ec34("CopyThenRepeat_Command\n");
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (arg_8h & 7U) << 3 | 0x40 | in_EDX & 7);
fcn.0804ec3c(*(int32_t *)(in_EAX + 0x42c), (uint32_t)*(uint8_t *)(in_EAX + 8));

Some of these could be Hi-SCoA commands.
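The byte expressions in the listing can be evaluated directly to see which bit patterns they produce. The Python sketch below copies three of them. It assumes each debug string belongs to the calls listed right after it, and the parameter names (`field_a`, `field_b`, `high_bits`) are placeholders, since the decompiler's variable names don't say which value is which.

```python
def repeat_then_raw_head(field_a, field_b):
    # (uVar8 & 7) << 3 | 0xc0 | uVar10 & 7
    return (field_a & 7) << 3 | 0xC0 | field_b & 7


def copy_then_repeat_head(field_a, field_b):
    # (arg_8h & 7U) << 3 | 0x40 | in_EDX & 7
    return (field_a & 7) << 3 | 0x40 | field_b & 7


def extend_head(high_bits):
    # param_2 & 0x1f | 0xa0
    return high_bits & 0x1F | 0xA0


for name, value in [
    ("RepeatThenRaw", repeat_then_raw_head(2, 1)),
    ("CopyThenRepeat", copy_then_repeat_head(1, 0)),
    ("Extend", extend_head(31)),
]:
    print(f"{name}: {value:08b}")
```

The resulting shapes are `11aaabbb`, `01aaabbb` and `101aaaaa`, which line up with the two-bit and three-bit opcode prefixes guessed earlier in the thread.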

mounaiban commented 2 years ago

I'll just document this here before writing up a wiki page:

Generating SCoA/CAPT print data using `captfilter`

To generate a ready-to-send SCoA-compressed raster from a Portable Grey Map (PGM) file run:

captfilter --CNTblHalftone=0 --CNTblModel=0 input.pgm > out.scoa.capt

The --CNTblModel=0 is the part that selects SCoA compression.

Format: captfilter $OPTIONS $INPUT_FILE > $OUTPUT_FILE

All captfilter output is written to standard output, so you need to use the > to redirect the output to a file, or your terminal will be flooded with binary gunk. :ocean:

Input formats accepted by `captfilter`

When you type captfilter --help you get a message that says something like

Usage: captfilter [switches] [pgm file name]

This implies that all Portable Gray Map images are accepted. I have only tested PGM P5 images, YMMV with P2 images. PBM P4 images seem to be accepted too, but I have not been able to confirm that P4s are handled correctly.

Lack of error messages

captfilter doesn't seem to be able to inform you when any switch/option is used incorrectly. Instead, empty files are written as output, or captfilter freezes until you press Enter.

Invalid options, including valid ones spelled with a different case (e.g. CNTblModel misspelled as CnTblModel), are ignored.
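Since captfilter stays silent about mistakes, a small wrapper that checks for an empty output file can save some head-scratching. The Python sketch below is only an illustration: `captfilter_argv` and `run_captfilter` are hypothetical helper names, and closing stdin plus setting a timeout is an untested guess at avoiding the wait-for-Enter freeze.

```python
import os
import subprocess


def captfilter_argv(input_path, options):
    """Build the argument list: captfilter $OPTIONS $INPUT_FILE."""
    return ["captfilter", *options, input_path]


def run_captfilter(input_path, output_path, options, timeout=60):
    """Run captfilter and fail loudly if it produced nothing."""
    argv = captfilter_argv(input_path, options)
    with open(output_path, "wb") as out:
        # Assumption: with stdin closed, captfilter cannot sit waiting for
        # Enter; the timeout is a second line of defence.
        subprocess.run(argv, stdout=out, stdin=subprocess.DEVNULL,
                       timeout=timeout)
    if os.path.getsize(output_path) == 0:
        raise RuntimeError("captfilter wrote an empty file; check the "
                           "spelling and case of every option")


print(captfilter_argv("input.pgm", ["--CNTblHalftone=0", "--CNTblModel=0"]))
```

This will not catch a misspelled option that captfilter silently ignores, because the output is non-empty in that case.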

mounaiban commented 2 years ago

@agalakhov Just a quick question: do you flip bytes in the SCoA bitstream? Or is it just the command and the packet size that's flipped to little endian?

For example, in the output of captfilter if you see: a0 c0 06 01 20 53 0f 00 ff

Do you read it as Command: 0xC0A0 Packet Size: 0x0106 Bitstream: 20 53 0f 00 ff => 00100000 01010011 00001111 00000000 11111111 (aka copy 53 0f 00 ff to output)?

agalakhov commented 2 years ago

The stream is little-endian. My code tries to be machine-endianness-agnostic. The stream is always read byte by byte. Then, if we need an integer value of more than one byte, it is assembled in little-endian byte order.

That is, if we have

0x01 0x02 0x03 0x04 0x05 0x06

and have to interpret it as

byte, int16, byte, byte, byte

we get

0x01 0x0302 0x04 0x05 0x06

on both little-endian and big-endian machines. Hope this answered your question.
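The same rule in a few lines of Python, as an illustrative sketch rather than anything taken from captdriver. `read_fields` is a made-up helper; `int.from_bytes(..., "little")` does the byte-by-byte assembly regardless of the host's endianness.

```python
def read_fields(data, widths):
    """Read consecutive little-endian fields; widths are given in bytes."""
    values, pos = [], 0
    for width in widths:
        values.append(int.from_bytes(data[pos:pos + width], "little"))
        pos += width
    return values


# byte, int16, byte, byte, byte
stream = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06])
print([hex(v) for v in read_fields(stream, [1, 2, 1, 1, 1])])
# ['0x1', '0x302', '0x4', '0x5', '0x6']

# The packet header from the question: two 16-bit fields
command, size = read_fields(bytes.fromhex("a0c00601"), [2, 2])
print(hex(command), hex(size))  # 0xc0a0 0x106
```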

mounaiban commented 2 years ago

We may have discovered a few more things about the SCoA format, particularly a few new opcodes and the usage of the line buffer.

Long Repeat only Command

101aaaaa 10aaa000 :a:

This command appears to repeat operand A into the line buffer, without any uncompressed suffix. It might have the same effect as Long Repeat and Short Copy 101aaaaa 00aaabbb :a: :b: where bbb == 0b000.
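A sketch of how such a command could be assembled, in Python. The count is split into its upper five bits (first byte) and lower three bits (second byte). `encode_long_repeat` is a hypothetical name, and the 8 to 255 range is an assumption based on the idea that shorter runs fit the short Repeat form.

```python
def encode_long_repeat(byte, count):
    """Build 101aaaaa 10aaa000 X, repeating X `count` times."""
    if not 8 <= count <= 255:  # assumed range for the long form
        raise ValueError("long repeat count must be 8..255")
    first = 0xA0 | (count >> 3)          # 101 + upper 5 bits of count
    second = 0x80 | ((count & 7) << 3)   # 10 + lower 3 bits of count + 000
    return bytes([first, second, byte])


print(encode_long_repeat(0x00, 255).hex())  # bfb800
print(encode_long_repeat(0x00, 86).hex())   # aab000
```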

Line Buffer Usage

Boichat's SPECS document doesn't contain a detailed explanation of how the line buffer works, but it mentions 0x41 as End of Line (don't change the buffer). This is also confirmed in @ra1nst0rm3d's disassembly (see previous comments).

Here's how I think the SCoA line buffer works:

There is a single buffer of N bytes, where N is declared in bytes 26 and 27 of the very last 0xD0A0 command.
Opcodes only ever manipulate the buffer; nothing is written straight to the output.
The buffer is not set to an initial value at the start of a page, just allocated. An empty line must be manually zeroed with a series of Long Repeat 0x00's. Just three opcodes can cover a line on A4 :smiley:
A 0x41 command dumps the line buffer into the output, thus issuing multiple 0x41's in a row repeats a line.
You don't have to describe the whole line, just end it early with a 0x41 if nothing has changed past the current pixel column in any previous lines :smiley:
You can seek to a position in the buffer with special opcodes.
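The model described in the list above can be written down as a toy Python class. This is only a sketch of the guessed behaviour: `LineBuffer` and its method names are invented for illustration, and resetting the pointer on `0x41` is an assumption.

```python
class LineBuffer:
    """Toy model of the SCoA line buffer as described in the list above."""

    def __init__(self, width):
        # bytearray() zero-fills; the real buffer may start with arbitrary
        # contents, which is why a blank line has to be written explicitly.
        self.buf = bytearray(width)
        self.pos = 0          # buffer pointer
        self.output = []      # lines dumped so far

    def seek(self, count):
        """Skip right, leaving the bytes of the previous line untouched."""
        self.pos += count

    def repeat(self, byte, count):
        """Write `count` copies of `byte` at the pointer."""
        if self.pos + count > len(self.buf):
            raise ValueError("write past the end of the line buffer")
        self.buf[self.pos:self.pos + count] = bytes([byte]) * count
        self.pos += count

    def eol(self):
        """0x41: dump the buffer to the output; the buffer keeps its contents."""
        self.output.append(bytes(self.buf))
        self.pos = 0  # assumption: the pointer returns to the start


page = LineBuffer(8)
page.repeat(0x00, 8)   # blank "key" line
page.eol()
page.eol()             # a second 0x41 repeats the line
page.seek(3)
page.repeat(0xFF, 2)   # patch two bytes, then end the line early
page.eol()
print([line.hex() for line in page.output])
```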

Here are some things I don't know:

a. What is printed when you do a 0x41 before zeroing out the buffer?

b. What happens if one fails to 0x41 before inserting too many bytes into the line buffer?

c. Does the buffer pointer reset to zero after a 0x41?

Line Buffer Pointer Opcodes and Delta Encoding

There are a few extra opcodes that cause the pointer to skip towards the right, but I don't yet know the exact format of the opcodes.

As far as I can tell, there are two "skip right and place raw byte" opcodes (one short, one long) and one "skip right only" opcode.

These opcodes are used to implement some kind of delta encoding when a line is similar to the previous one; instead of encoding the entire line, you can seek to where the line changed and patch.
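On the compressor side, the seek-and-patch idea amounts to diffing each line against the previous one. A minimal Python sketch, assuming both lines have the same length; `diff_segments` is a hypothetical helper and says nothing about how captfilter actually chooses its opcodes.

```python
def diff_segments(prev, cur):
    """Return (skip, patch) pairs: skip unchanged bytes, then write patch."""
    if len(prev) != len(cur):
        raise ValueError("lines must have the same length")
    segments = []
    pos, n = 0, len(cur)
    while pos < n:
        start = pos
        while pos < n and prev[pos] == cur[pos]:
            pos += 1                       # bytes that can be skipped
        skip = pos - start
        patch_start = pos
        while pos < n and prev[pos] != cur[pos]:
            pos += 1                       # bytes that must be rewritten
        if pos > patch_start:
            segments.append((skip, bytes(cur[patch_start:pos])))
    return segments


prev = bytes([0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00])
cur = bytes([0x00, 0x00, 0xFF, 0xFE, 0x00, 0x01])
print(diff_segments(prev, cur))
```

An unchanged tail produces no segment at all, which matches the idea of ending a line early with `0x41`.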

mounaiban commented 2 years ago

@agalakhov Thanks, I understand it now that there's nothing to flip in the bitstream/payload.

ra1nst0rm3d commented 2 years ago

@mounaiban About the three opcodes for an A4 line buffer: does it look like this?

bf b8 00
bf b8 00
aa b0 00

mounaiban commented 2 years ago

@ra1nst0rm3d Well done, you have just drawn a complete blank line across an A4 sheet! :100:

Also: bf b8 ff bf b8 ff aa b0 ff 41 41 41 41 41 41 41 (and so on for 7016 times)
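For anyone checking the arithmetic, the three long repeats can be decoded with a few lines of Python. `long_repeat_count` is a made-up helper that reassembles the count from the upper five bits in the first byte and the lower three bits in the second.

```python
def long_repeat_count(first, second):
    """Count encoded by 101aaaaa 10aaa000: upper 5 bits, then lower 3 bits."""
    return ((first & 0x1F) << 3) | ((second >> 3) & 7)


line = bytes.fromhex("bfb800" "bfb800" "aab000")
counts = [long_repeat_count(line[i], line[i + 1])
          for i in range(0, len(line), 3)]
print(counts, sum(counts), sum(counts) * 8)  # [255, 255, 86] 596 4768
```

So the three opcodes cover 596 bytes, or 4768 pixels per line.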

mounaiban commented 2 years ago

I have confirmed the existence of the buffer pointer seek/skip opcode. There are three forms:

Short seek: skip right up to 7 bytes, place a single byte A

0b00001aaa :a:

Long seek: skip up to 255 bytes, place a single byte A

0b100aaaaa 0b00001aaa :a:

Extra long seek: skip 256 or more bytes, place a single byte A

Add 0x9F one or more times before 0b100aaaaa 0b00001aaa :a: or 0b00001aaa :a:

Examples: 0x9F 0x9F 0b10010101 0b00001111 :a: and 0x9F 0b00001000 :a:

Each 0x9F adds ~255~ 248 bytes to the skip. (see later comments)
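One way to read all three forms with a single rule is to treat every leading `0b100aaaaa` byte as adding `aaaaa * 8` to the skip. `0x9F` is then simply the largest such prefix (31 * 8 = 248). That unification is an assumption of this Python sketch, not something confirmed, and `decode_seek` is a hypothetical name.

```python
def decode_seek(data):
    """Return (skip, placed_byte, bytes_consumed) for a seek at data[0]."""
    pos = 0
    skip = 0
    # Each 0b100aaaaa prefix contributes aaaaa * 8; 0x9f contributes 248.
    while (data[pos] & 0xE0) == 0x80:
        skip += (data[pos] & 0x1F) << 3
        pos += 1
    head = data[pos]
    if (head & 0xF8) != 0x08:  # expect 0b00001aaa: place exactly one byte
        raise ValueError("not a 'skip and place one byte' opcode")
    skip += head & 7
    return skip, data[pos + 1], pos + 2


print(decode_seek(bytes([0x0F, 0x55])))        # short seek
print(decode_seek(bytes([0x9F, 0x09, 0xAA])))  # one 0x9f, then skip 1 more
print(decode_seek(bytes([0x9F, 0x9F, 0b10010101, 0b00001111, 0x01])))
```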

Master Samples

The use of the seek opcodes can be clearly observed in test pages that look like these:

[image: half-diagonal-preview] Half diagonal filled

[image: quarter-diagonal-preview] Half horizontal with quarter diagonal filled

My master samples were A4 x 600dpi, which works out to around 4958 x 7016 px. Please be aware that captfilter crops things as it sees fit.

Compressed Raster Preparation

I compressed the rasters to CAPT format with the following command on an Ubuntu 14.04 system with the Canon driver installed:

captfilter --CNTblModel=0 --Resolution=600 --PageSize=A4 --MediaType=PlainPaper $INPUT_FILE > $OUTPUT_FILE

PROTIP: switches and arguments are case-sensitive

Substitute $INPUT_FILE with the path of the raw PBM P4 image and $OUTPUT_FILE with the name of the output file (taking care not to overwrite anything precious!)

Other Stuff

The exact behaviour of the seek opcode is not yet fully figured out. The following questions remain:

Does the seek put the pointer on or after the last skipped byte?
Which opcodes are allowed after a seek? Must it be a single byte, or can a repeat or copy opcode begin right afterwards?

I am starting to like how the 0x41 opcode coincides with the ASCII letter 'A', because it makes it very easy to see on the text view in hex dumps.

ra1nst0rm3d commented 2 years ago

@mounaiban Can you rebuild your opcode table with the new information? It will help me write the compressor.

P.S. Hmm... Can we just build the first line, push it to the printer, then get the next line and check the differences between them?

P.P.S. Can we just initialise the buffer for every band and push data into the initialised buffer?

P.P.P.S. This looks strange...

static void write_simple_byte(struct state *state)
{
    unsigned i = 1;

    while(state->input_buf[state->input_pos] != state->input_buf[state->input_pos + i] && i < 7) {
        i++;
    }

    push_byte(state, (i << 3), "simple_byte");

    for(; state->input_pos < state->input_pos + i; state->input_pos++) {
        push_byte(state, state->input_buf[state->input_pos], "simple_byte");
    }
}
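Two things stand out in that function. The `for` condition compares `input_pos` with `input_pos + i`, which is true for any `i > 0` (barring overflow), so the loop has no working end condition. The `while` scan also never checks the end of the input. Below is a Python sketch of what the function was presumably meant to do: emit a `00nnn000` header followed by up to 7 literal bytes. The stopping rule here (stop before two equal neighbouring bytes) is an assumption and differs from the C code, which compares against the first byte only.

```python
def literal_run(data, pos, limit=7):
    """Length of the literal run at pos, stopping at a repeated neighbour."""
    n = 1
    while (n < limit and pos + n < len(data)
           and data[pos + n] != data[pos + n - 1]):
        n += 1
    return n


def write_literals(data, pos, out):
    """Append one short literal command to `out`; return the new position."""
    n = literal_run(data, pos)
    end = pos + n            # fix the bound before the position moves
    out.append(n << 3)       # 00nnn000 header
    out.extend(data[pos:end])
    return end


out = bytearray()
new_pos = write_literals(b"ABCCC", 0, out)
print(new_pos, out.hex())  # 3 18414243
```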

mounaiban commented 2 years ago

@ra1nst0rm3d I have compiled all known SCoA opcodes into a single document in my wiki, have fun!

I think you are meant to work on it like they do with video compression: start with a "key" line that encodes the whole line, then make "delta" lines that take the previous line and change it.

Also try compressing a sample that looks like this, using captfilter:

[image: circle-preview]

The sample makes use of all the opcodes we know so far...

P.S. I will look at the code later, I'm only getting started with this thing :sweat_smile:

mounaiban commented 2 years ago

This is an urgent update about the 0x9f opcode, especially for @ra1nst0rm3d:

0x9f adds 248 bytes to the skip, not the other number as previously thought.

This incidentally is an answer to the contradiction that arises under the initial understanding of 0x9f. Recall the long seek opcode 0b100aaaaa 0b00001aaa skipping 0baaaaaaaa bytes.

If 0x9f meant adding 256 bytes, 0x9f 0x09 (0b10011111 0b00001001) would mean both "skip 256+1 bytes" and "skip 248+1 bytes". :exploding_head:

Sorry if this misunderstanding is why you got stuck. I will update the wiki and my previous significant comments soon.

Notes about the function names found in the captfilter disassembly

Also, I think I am beginning to understand the names found in the disassembly for captfilter earlier in this thread.

Canon uses the words 'Raw', 'Repeat' and 'Copy'. While 'Repeat' is obvious, 'Raw' could mean 'uncompressed bytes' and 'Copy' could mean 'copy from the line buffer', contrary to the terminology I have been using so far.

To avoid potentially contradicting the original terms, I'll just use repeat for compressed bytes, and new for uncompressed bytes not yet written out, old for bytes in the buffer.

From my understanding of Hi-SCoA, I think there's just one big adaptive uncompress command. LZ77 (which Hi-SCoA is a specialisation of) attempts to compress everything; repeated segments don't have to be contiguous in LZ77, so there is no need for separate uncompressed and compressed segments and their separate commands.

ra1nst0rm3d commented 2 years ago

@mounaiban captdriver works on FreeBSD with CUPS: the printer responds to the commands sent to it. Nicolas' driver doesn't work because of a bug in the kernel.

ra1nst0rm3d commented 2 years ago

@mounaiban Can you generate a PPD file for the LBP1120? I think I have some errors in mine.

mounaiban commented 2 years ago

@ra1nst0rm3d Thanks for testing the driver on FreeBSD. It's really great to know we can run on BSDs :+1: Which FreeBSD version, CUPS/libusb (or equivalent) version and architecture did you run it on? I'm going to update my wiki to document this.

As for the PPD, what errors are you getting? The PPD from your fork compiles on my test Ubuntu (14.04.6) system just fine. I used the command ppdc -vd . src/canon-lbp.drv

mounaiban commented 2 years ago

:warning: The table below may be out of date by the time you read this, please check out the SCoA Specifications on the captdriver wiki for the latest version

In the meantime I think I might have discovered more opcodes (or rather, new ways to use the ones we already know). Here's what I have observed so far:

This is just a provisional list, we will find out for sure only once the SCoA decoder is done.

| Opcode | My Name | (Alleged) Canon [^pun] Name | Comment |
|---|---|---|---|
| `0b01BBBAAA` `X` | old + repeat | `CopyThenRepeat` | A (1-7) bytes from previous line then B (1-7) repeated bytes. :new: |
| `0b00BBBAAA` `X0..Xn` | old + new | `CopyThenRaw` | A (1-7) bytes from previous line then B (1-7) uncompressed bytes. :new: |
| `0x9f` | extend_old_Long | `Extend` | Add 248 to the old byte count for old_Long+new and old_Long+repeat commands |
| `0b100AAAAA` `0b00BBBAAA` `X0..Xn` | old_Long + new | `CopyThenRawLong` | A (8-255) bytes from previous line then B (1-7) uncompressed bytes. Re-interpretation of recently known long seek command :new: |
| `0b100AAAAA` ~`0b10BBBAAA`~ `0b01BBBAAA` `X` | old_Long + repeat | `CopyThenRepeatLong` | A (8-255) bytes from previous line then B (1-7) repeated bytes :new: |
| `0b01AAA000` `X` | repeat | `RepeatX` | A (1-7) repeated bytes. Could be "zero old + repeat" bytes. |
| `0b11AAABBB` `X` `Y0..Yn` | repeat + new | `RepeatThenRaw` | A (1-7) repeated bytes, then B (1-7) uncompressed bytes [^rnnote] |
| `0b101AAAAA` `0b10AAA000` `X` | repeat_Long | `RepeatXLong` | A (8-255) repeated bytes only |
| `0b101AAAAA` `0b00AAABBB` `X` `Y0..Yn` | repeat_Long + new | `RepeatThenRawLong` | A (8-255) repeated bytes, then B (1-7) uncompressed bytes |
| `0b101AAAAA` `0b11AAA000` `X0..Xn` | new_Long | `CopyLong` [^nlnote] :question: | A (8-255) uncompressed bytes only |
| `0x40` | NOP | `NOP` | Dummy non-op. Not seen during tests but found in disassembly of captfilter |
| `0x41` | EOL | `EOL` | End of line. Dump line buffer to output, return line buffer pointer to zero. |
| `0x42` | EOP | `EOP` | End of page/picture. Don't decompress anything past this point. |

Watch out for the A's and B's... in some opcodes B comes first!

Now that we have an interpretation for 13/14 opcodes, it's time to test the living daylights out of this thing... we still don't have a place for CopyShort though.
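To get a feel for how the table hangs together, here is a compact decoder sketch in Python. It covers only the opcodes listed in the provisional table, treats "old" bytes as a plain pointer skip over the retained line buffer, and handles `0x9f` as an ordinary `0b100AAAAA` prefix (31 * 8 = 248). All of that is assumption. `decode_scoa` is a name invented for this example and is not the studycapt decoder.

```python
def decode_scoa(data, width):
    """Decode SCoA bytes into a list of `width`-byte lines (table opcodes only)."""
    line = bytearray(width)  # survives EOL, so "old" bytes are simply skipped
    pos = 0                  # line buffer pointer
    old_hi = 0               # skip collected from 0b100AAAAA / 0x9f prefixes
    lines = []
    i = 0

    def put(chunk):
        nonlocal pos
        if pos + len(chunk) > width:
            raise ValueError("opcode writes past the end of the line")
        line[pos:pos + len(chunk)] = chunk
        pos += len(chunk)

    while i < len(data):
        op = data[i]
        i += 1
        if op == 0x40:                       # NOP
            continue
        if op == 0x41:                       # EOL: dump line, rewind pointer
            lines.append(bytes(line))
            pos = 0
            continue
        if op == 0x42:                       # EOP
            break
        if (op & 0xE0) == 0x80:              # old_Long prefix, 0x9f adds 248
            old_hi += (op & 0x1F) << 3
            continue
        if (op & 0xE0) == 0xA0:              # repeat_Long / new_Long prefix
            nxt = data[i]
            i += 1
            count = ((op & 0x1F) << 3) | ((nxt >> 3) & 7)
            kind = nxt >> 6
            if kind == 0b11:                 # new_Long
                put(data[i:i + count])
                i += count
            elif kind in (0b10, 0b00):       # repeat_Long, optionally + new
                put(bytes([data[i]]) * count)
                i += 1
                if kind == 0b00:
                    put(data[i:i + (nxt & 7)])
                    i += nxt & 7
            else:
                raise ValueError("unknown long opcode")
            continue
        hi, lo = (op >> 3) & 7, op & 7
        if op >> 6 == 0b11:                  # repeat + new
            put(bytes([data[i]]) * hi)
            i += 1
            put(data[i:i + lo])
            i += lo
            continue
        pos += old_hi + lo                   # old bytes: just move the pointer
        old_hi = 0
        if op >> 6 == 0b00:                  # old + new
            put(data[i:i + hi])
            i += hi
        else:                                # old + repeat
            put(bytes([data[i]]) * hi)
            i += 1
    return lines


# Two identical 16-byte lines of 0xff: one repeat_Long, then two EOLs, then EOP
print([l.hex() for l in decode_scoa(bytes.fromhex("a280ff 41 41 42"), 16)])
```

The fixed opcodes `0x40` to `0x42` are tested before anything else because they share their top bits with old + repeat.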

[^pun]: pun intended :nerd_face:

[^rnnote]: Nicolas' original SPECS file mentions a value of 0b001 for B only.

[^nlnote]: this would contradict Canon's definition of "Copy" so far, here it means "Copy from uncompressed part of input".

ra1nst0rm3d commented 2 years ago

@mounaiban I think I had some errors in the dimensions. I need to compare with the blob's PPD.

About FreeBSD: it ran successfully on FreeBSD 13.1-RC5, CUPS 2.4.1, libusb from the base system, x86_64. I can provide instructions for installing the driver on FreeBSD.

[screenshot: 2022-05-09-083654_1920x1080_scrot]

ra1nst0rm3d commented 2 years ago

@mounaiban We need CopyShort ASAP)

mounaiban commented 2 years ago

@ra1nst0rm3d I still can't find the CopyShort opcode in my tests. For now, maybe try 0b01000AAA (old+repeat, but with zero old bytes)?

The test sample I am using to discover more Copy opcodes can be generated with sample_blots.py from my studycapt repo. If you cloned studycapt and ran git pull in the last two weeks, you should have a reasonably recent copy of it.

Use the mirrored-incr-runs pattern, which you can generate from this command:

./sample_blots.py --format=p4 --resolution=600 --size=a4 --mode=mirrored-incr-runs --comment='SCoA opcode discovery sample incr-runs' --out_file=$YOUR_FILE_PATH

If you copy-paste the command, you will get a 600dpi test page in PBM P4 format.

Please be patient if the sample generation takes a long time due to doing it pixel-by-pixel, on single-threaded CPython :snake::snail:. Run captfilter on the test page as usual to get the encoded version.

Explanation of how the Mirrored Incrementing Runs sample works

On every line there is a run of set pixels followed by an almost equal run of unset pixels. The width of the runs decreases by a little bit for each line until the middle of the page, where there are a few solid lines, before increasing again towards the bottom of the page.

The gradual increase in the width of the runs is to generate opcodes with gradually-increasing values, thus revealing the bit layout of the values, and the operating range of the opcodes. The mirroring is to avoid cropping at the top.

[image: mirrored-incr-runs-thumbnail] Totally not ripped off Space Harrier

:nerd_face:

mounaiban commented 2 years ago

@ra1nst0rm3d Is that a Hackintosh? If Apple actually sold Mac Pro's with Ryzen processors, I would have bought one.

Are the commands exactly the same on FreeBSD, or are they slightly different?

ra1nst0rm3d commented 2 years ago

@mounaiban Hmm, looks interesting. I'll try it at home.

About the Hackintosh: it's bspwm with polybar, tuned to look and feel like macOS. You can see the config in my "dotfiles" repo.

About sample_plot: try this

ra1nst0rm3d commented 2 years ago

@mounaiban Your new table confirms my write_simple_byte() and try_write_byterepeat() implementations.

mounaiban commented 2 years ago

@ra1nst0rm3d I saw your commits over the past week, and it looks like it has a working SCoA and CAPT 1 implementation. Did you manage to print?

And what the hell is Numba? It's possessed! :snake: :dash:
Tried it and it made my code 3x faster, and that's just using @jit. I wonder how fast I can go with the parallelisation and an Nvidia GPU/processor card. Thanks for the tip! :+1:

ra1nst0rm3d commented 2 years ago

@mounaiban I tried, but it doesn't work at all. This is the CUPS test page.

mounaiban commented 2 years ago

@ra1nst0rm3d is the upper left corner of the picture the upper right of the page?

Maybe the printer is not getting the correct line width or image size? That's my wild guess seeing how the pixels are staying on one side of the page and going off the bottom while staying the same width throughout...

My other guess is that the printer is not getting the correct code for long runs, but I think this is less likely :thinking:

I am also questioning my understanding of old_Long + new and old_Long + repeat; I suspect that

old_Long+new could be `0b100PPPPP 0b00PPPQQQ X0..Xn`: 0bPPPPPPPP bytes from the last line, then QQQ uncompressed bytes
old_Long+repeat could be `0b100PPPPP 0b10PPPQQQ X`: 0bPPPPPPPP bytes from the last line, then QQQ repeated bytes

contrary to the table above. Try the above interpretations if the current table doesn't seem right...

mounaiban commented 2 years ago

I have just added a SCoA decoder to the studycapt repo, but it's not quite working correctly yet. I'm trying to figure out if there was an error in the data extraction from the captfilter job files or if our understanding of the algorithm is still not 100% correct.

This is an attempt to decompress the circle sample page:

[screenshot: Screenshot from 2022-05-19 23-23-02]

Note that the circle has been squashed, and the glitched lines are remarkably consistent. The black region at the bottom indicates missing pixels. Were long runs cut short? I don't know yet, but I may have made my first NFT :money_mouth_face:

ra1nst0rm3d commented 2 years ago

@mounaiban Thanks for your SCoA decompressor! This will be a huge boost to development.

mounaiban commented 2 years ago

@ra1nst0rm3d You're welcome :smile: Remember to pull the latest fix, there's at least one typo that is messing up the decompression.

[image: scoa.SCoADecoder.decode() git:ef9c0d output]

The latest fix as of mounaiban/studycapt@ef9c0d17e9a754eb43d95e13243df8f94dd1858e improves the accuracy of the decompression, but we're still not 100% (which we need, because lossless)

There is one opcode that's eluding me: in the A4-sized circle sample from sample_blots.py, the SCoA-compressed version of the sixth line of the circle reads: 9f 83 a4 9e ff 08 fe 41. The two bytes that I suspect to contain pixel data are ff and fe. On the circle sample, bytes containing both set and unset pixels can only occur up to twice per line, and set and cleared pixels are contiguous.

If that is the case, everything I thought I knew about old_Long or 0b100PPPPP 0b10QQQPPP could be wrong.

ra1nst0rm3d commented 2 years ago

Isn't the circle fully black? If it is, the pixel data would be 'ff'.

mounaiban commented 2 years ago

This is how we unpack 9f 83 a4 9e ff 08 fe 41:

Update: what I said earlier, I take that back. Check this post history for the original version.

9f => 0b10011111 (old_Long, add 248 to count)
83 => 0b10000011 (old_Long, add 0b00011 << 3 (24))
a4 => 0b10100100 (old_Long, add 0b100 (4), and repeat 0b100 (4) bytes)
Dump a total of 276 bytes from the previous line, repeat 4 bytes
9e => 0b10011110 (mystery opcode :male_detective:, some kind of modifier?)
ff => (byte to be repeated)

It looks like 0x9f has to be interpreted as a separate opcode. I would call it a form of old_Long. All of the drama above presumably fills the inside of the circle.

08 => 0b00001000 (put just one byte)
fe => (byte to be placed)

This would patch the right edge of the circle.

41 => end of line

Our mystery of the week (hopefully we can solve it in a week) is: what does 0x9e do, and when does Canon use it?

mounaiban commented 2 years ago

@ra1nst0rm3d The circle is fully filled black. The test page is generated by sample_blots.py:

./sample_blots.py --mode circle --size a4 --resolution 600 --format p4 --out_file circle-a4.pbm :black_circle:

The sample is then compressed with: captfilter --CNTblModel=0 --Resolution=600 --PageSize=A4 circle-a4.pbm > circle-a4.capt

Currently at a loss trying to figure out these mystery opcodes that captfilter shoves, when it feels like it, between the opcode and the operands, like in the example above.

Because my decompressor doesn't handle them, they cause the wrong bytes to be repeated or passed, and the remaining data gets misinterpreted as opcodes.

The mystery opcodes aren't always 0x9f, so we have to figure out another way to detect them. Wish me luck! :angel:

I hope these mystery codes aren't needed to make the printer usable...

ra1nst0rm3d commented 2 years ago

Sorry, but I'm suspending development of captdriver until around mid-July, because I'm taking my final exams at school and I'm starting university this autumn :) I may push some of my work to my fork, but it will be a bit unstable.

P.S. Good luck, @mounaiban. I think you'll discover all the opcodes and I'll write the final version of the SCoA compressor.

mounaiban commented 2 years ago

@ra1nst0rm3d All the best for your exams! :vulcan_salute:

mounaiban commented 2 years ago

I believe we have found another opcode, and it's the longest one so far :giraffe: : 0b100UUUUU 0b101YYYYY 0b10ZZZWWW P and it means 0bUUUUUWWW old bytes from the previous line followed by 0bYYYYYZZZ repeats of byte P.

I'm inclined to call it old_Long + repeat_Long in the meantime. Its Canon name might be CopyThenRepeatLong, presumably with an 8-bit argument value.

This also solves this week's Mystery Opcode, 9f 83 a4 9e ff:
9f => 248 old bytes from the previous line
83 a4 9e => 0b10000011 0b10100100 0b10011110: 0b00011110 (30) more old bytes + 0b00100011 (35) repeated bytes
ff => byte to be repeated

This has been verified with the original uncompressed A4 600dpi circle sample. The sixth line of the circle is 287px long, and from our mystery opcode, 35 * 8 == 280. There are seven more pixels from the following opcode which places a single fe byte, bringing up the count to a matching 287.
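The same check as a few lines of Python, for anyone who wants to replay it. `unpack_old_long_repeat_long` is a name made up for this sketch, and the field layout is the one proposed in this comment.

```python
def unpack_old_long_repeat_long(b0, b1, b2):
    """0b100UUUUU 0b101YYYYY 0b10ZZZWWW -> (old bytes, repeats)."""
    old = ((b0 & 0x1F) << 3) | (b2 & 7)                # UUUUU WWW
    repeats = ((b1 & 0x1F) << 3) | ((b2 >> 3) & 7)     # YYYYY ZZZ
    return old, repeats


old, repeats = unpack_old_long_repeat_long(0x83, 0xA4, 0x9E)
old += 248                                   # the leading 0x9f
set_pixels = repeats * 8 + bin(0xFE).count("1")  # 35 x ff, then one fe
print(old, repeats, set_pixels)  # 278 35 287
```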

SCoA Decoder WIP

UPDATE: We have successfully decompressed test samples. The SCoA decoder now works correctly on test samples at time of writing. Further work on the decompressor is expected to be mostly validation! validation! validation!:clap: and will continue for as long as this driver is relevant.

[image: scoa-decoder-circle-success] This is a decompressed image, I swear! The black bar at the bottom is a stand-in for missing pixels due to an incorrect height declaration in the output P4 bitmap.

ra1nst0rm3d commented 2 years ago

I passed my first Unified State Exam, in the Russian language. It was very easy! Anyway, good job @mounaiban. I'm taking the next two exams in June and then I'm getting back to work!

mounaiban commented 2 years ago

@ra1nst0rm3d Well done, sounds like you were well prepared for your exams! :v:

There's yet another opcode, and I'd call it old_Long + new_Long. It makes perfect sense to have an uncompressed counterpart to old_Long + repeat_Long, I guess :dancers:

The instruction bit layout is like: 0b100UUUUU 0b101YYYYY 0b11ZZZWWW P0..Pn and it means 0bUUUUUWWW old bytes from the previous line, followed by 0bYYYYYZZZ uncompressed bytes P0..Pn. The only difference from old_Long + repeat_Long is that the first two bits of the third byte are 11 instead of 10.
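Both three-byte forms can be parsed by one routine that looks at the second-highest bit of the third byte. The Python sketch below follows the layouts given in this thread. `parse_old_long_then_long` is a hypothetical helper and it ignores any `0x9f` prefixes that may come before the opcode.

```python
def parse_old_long_then_long(data):
    """Parse 0b100UUUUU 0b101YYYYY 0b1?ZZZWWW ... starting at data[0].

    Returns (old, payload, consumed): skip `old` bytes of the previous
    line, then write `payload`.
    """
    b0, b1, b2 = data[0], data[1], data[2]
    if (b0 & 0xE0) != 0x80 or (b1 & 0xE0) != 0xA0 or not (b2 & 0x80):
        raise ValueError("not an old_Long + (repeat|new)_Long opcode")
    old = ((b0 & 0x1F) << 3) | (b2 & 7)
    length = ((b1 & 0x1F) << 3) | ((b2 >> 3) & 7)
    if b2 & 0x40:                        # 0b11ZZZWWW: uncompressed bytes follow
        return old, bytes(data[3:3 + length]), 3 + length
    return old, bytes([data[3]]) * length, 4   # 0b10ZZZWWW: one repeated byte


old, payload, used = parse_old_long_then_long(bytes.fromhex("83a49eff"))
print(old, len(payload), used)  # 30 35 4
```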

I will post the updates when I get the chance...

ra1nst0rm3d commented 2 years ago

Can we assume that the second bit of the third byte switches between new and repeat?

agalakhov / captdriver

SCoA Printers #33
