Optimize breaking up data for encoding method in Code128 dynamic mode

Badkoubehei commented 3 years ago

In order to optimize the encoded data length, only numeric sequences of length 3 or more should be encoded in mode C. Shorter sequences should be encoded in mode A or B. In addition, if the sequence length is odd, it is better to encode the first digit in mode A or B (not the last digit, because encoding the last digit that way will need another mode switch).

barnhill commented 3 years ago

What do you think of this @binki ?

Encoding the first vs. the last will require the same number of mode switches, unless I'm missing something.

Badkoubehei commented 3 years ago

@barnhill, no, encoding the last number will need one more mode switch (between CODE_A and CODE_C). As an example, please check this input: GAA01BB0880001

Without this PR, it will encode as shown in the following picture:

[image]

But after this PR, it will encode as the following:

[image]

As you see, it will use two fewer symbols. The first saving comes from not encoding numeric sequences of length less than 3 in mode C, and the other from encoding the first number of an odd-length numeric sequence in mode A or B (not the last).

binki commented 3 years ago

@barnhill

> What do you think of this @binki ?
>
> Encoding the first vs the last will require the same amount of switching modes unless I'm missing something.

I have not yet looked at the implementation, so I am not sure if my concerns have been addressed. However, here are my thoughts! I will try to check back again (if I remember x.x).

It sounds reasonable to try to optimize the number of switches for size. This could actually help a lot with getting the barcode to fit on screen or in a document. The biggest concern I have is that there might be a compatibility concern for an application which accidentally relies on the existing behavior. For example, maybe a scanner which is configured to ignore barcodes which use mode C (not sure if that is a thing). Hopefully all people consuming barcodes work with them as strings and do not know how they are actually encoded ;-).

The last versus first seems to be an optimization for when the 3-digit sequence is at the very end of the barcode. If the last three characters of a barcode are a run of digits following non-digits encoded using mode A or B, you could encode them either as “mode-C 01 mode-A 2” (4 symbols) or as “0 mode-C 12” (3 symbols).

However, if the 3-digit run is in the middle of a barcode, then I think switching codes would cost you more. You could encode “A123B” as “start-A A 1 2 3 B” (6) or “start-A A 1 mode-C 23 mode-A B” (7). Thus, if you switch for a 3-digit sequence which is followed by a switch back to mode A or B, then switching to mode C increases the total width of the barcode unnecessarily. However, if you have a run of 4 digits followed by a mode switch, then you do not increase the width of the barcode by switching modes—but you do not gain anything until you have a run of at least 6 digits:

Surrounded 3 digits “A123B”
- Switching: “start-A A 1 mode-C 23 mode-A B” (7)
- No switching: “start-A A 1 2 3 B” (6)
Surrounded 4 digits “A1234B”
- Switching: “start-A A mode-C 12 34 mode-A B” (7)
- No switching: “start-A A 1 2 3 4 B” (7)
Surrounded 5 digits “A12345B”
- Switching: “start-A A 1 mode-C 23 45 mode-A B” (8)
- No switching: “start-A A 1 2 3 4 5 B” (8)
Surrounded 6 digits “A123456B”
- Switching: “start-A A mode-C 12 34 56 mode-A B” (8)
- No switching: “start-A A 1 2 3 4 5 6 B” (9)
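The counts in the list above follow a simple pattern. Here is a minimal Python sketch of that arithmetic (not the library's C# code). The function name `surrounded_run_cost` is hypothetical, and the sketch assumes the counting convention used in this comment: the start symbol and every mode switch count as one symbol each, while the check and stop symbols are left out.

```python
def surrounded_run_cost(digits: int) -> tuple[int, int]:
    """Return (switching, no_switching) symbol counts for 'A' + digits + 'B'.

    Assumed convention (taken from the examples in this thread): the start
    symbol is counted, the check symbol and the stop symbol are not.
    """
    # start-A, 'A', one symbol per digit, 'B'
    no_switching = 1 + 1 + digits + 1

    # An odd run leaves one digit in mode A; the rest is packed in pairs.
    leftover = digits % 2
    pairs = digits // 2
    # start-A, 'A', optional leftover digit, mode-C, pairs, mode-A, 'B'
    switching = 1 + 1 + leftover + 1 + pairs + 1 + 1
    return switching, no_switching


for n in range(3, 9):
    switching, no_switching = surrounded_run_cost(n)
    print(f"{n} digits: switching={switching}, no switching={no_switching}")
```

For 3 to 6 digits this reproduces the pairs (7, 6), (7, 7), (8, 8) and (8, 9) listed above, so switching only starts to pay off at 6 digits.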

@Badkoubehei Please let me know if I am missing something!

binki commented 3 years ago

@Badkoubehei

> As you see, it will use two fewer symbols; the first is to not encode numeric sequences having length less than 3, and the other one is to encode the first number of an odd-length numeric sequence in mode A or B (not the last).

After thinking about it, I think these rules would make sense and result in optimal mixed-mode barcodes:

* A digit sequence at the start or end must be at least 4 digits long to be worth encoding in mode C (e.g., “start-C 12 34 mode-A A” (5) is cheaper than “start-A 1 2 3 4 A” (6)) and aligned to either the start or end.
* Inner digit sequences must be at least 6 digits long.
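A rough Python sketch of how these two rules could drive the split into mode C and mode A/B segments. This is an illustration only, not the PR's C# implementation: the function name `split_for_code_c` and the `"AB"`/`"C"` labels are hypothetical, modes A and B are treated as one set, and inputs the rules do not mention (for example an all-digit string) simply fall under the same thresholds and may not come out optimal.

```python
import re


def split_for_code_c(data: str) -> list[tuple[str, str]]:
    """Split data into ("C", digits) and ("AB", text) segments."""
    segments: list[tuple[str, str]] = []

    def emit(mode: str, text: str) -> None:
        # Merge with the previous segment when the mode does not change.
        if not text:
            return
        if segments and segments[-1][0] == mode:
            segments[-1] = (mode, segments[-1][1] + text)
        else:
            segments.append((mode, text))

    for match in re.finditer(r"[0-9]+|[^0-9]+", data):
        run = match.group()
        if run[0] not in "0123456789":
            emit("AB", run)
            continue
        at_start = match.start() == 0
        at_end = match.end() == len(data)
        # Rule 1: runs touching the start or end need 4+ digits.
        # Rule 2: inner runs need 6+ digits.
        minimum = 4 if (at_start or at_end) else 6
        if len(run) < minimum:
            emit("AB", run)
        elif len(run) % 2 == 0:
            emit("C", run)
        elif at_start:
            # Odd run at the start: pairs first, leftover digit last.
            emit("C", run[:-1])
            emit("AB", run[-1])
        else:
            # Odd run at the end or inside: leftover digit first.
            emit("AB", run[0])
            emit("C", run[1:])
    return segments


print(split_for_code_c("GAA01BB0880001"))
print(split_for_code_c("A123456B"))
```

Under these assumptions the input from earlier in the thread, GAA01BB0880001, keeps "GAA01BB0" in mode A/B and puts "880001" in mode C.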

Unfortunately, an earlier commit I made disturbed Code128.cs quite a bit, so this PR has merge conflicts and will need some work x.x. If you could merge in master (or maybe it is easier to reimplement from scratch and force push?), I will review the actual code. Sorry for the hassle and thanks for the contribution!

Badkoubehei commented 3 years ago

@binki

> * Start or end digit sequence must be at least 4 digits long (e.g., “start-C 12 34 mode-A A” (5) is cheaper than “start-A 1 2 3 4 A” (6)) and aligned to either the start or end.

Yes, you are right about the length of the sequence. But if I have understood what you mean by "aligned to either the start or end" correctly, I think aligning to the start or end matters for odd-length sequences. See these examples:

12345A
Align to start => start-C 12 34 mode-A 5 A (6)
Align to end => start-A 1 mode-C 23 45 mode-A A (7)

A12345
Align to start => start-A A mode-C 12 34 mode-A 5 (7)
Align to end => start-A A 1 mode-C 23 45 (6)

It shows that if the sequence is at the start of the string, aligning to the start is better, and if the sequence is at the end, aligning to the end is better.

> * Inner digit sequences must be at least 6 digits long.

Yes, you are right. And in this case, aligning to start or end does not matter:

A1234567A
Align to start => start-A A mode-C 12 34 56 mode-A 7 A (9)
Align to end => start-A A 1 mode-C 23 45 67 mode-A A (9)
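The alignment examples in this comment can be reproduced with a short Python sketch (an illustration, not the PR's C# code). The helper `encode_aligned` is hypothetical. It assumes the input holds a single digit run that is always sent to mode C, writes symbols in the same notation as the examples, and leaves out the check and stop symbols.

```python
import re


def encode_aligned(data: str, align: str) -> list[str]:
    """Symbols for data holding one digit run; align is "start" or "end"."""
    match = re.search(r"[0-9]+", data)
    if match is None or len(match.group()) < 2:
        return ["start-A", *data]  # nothing that could form a mode C pair
    run = match.group()
    prefix, suffix = data[:match.start()], data[match.end():]
    odd = len(run) % 2
    if align == "start":
        # Pairs begin at the first digit; an odd leftover digit trails the run.
        before = prefix
        in_c = run[:len(run) - odd]
        after = run[len(run) - odd:] + suffix
    else:
        # Pairs end at the last digit; an odd leftover digit leads the run.
        before = prefix + run[:odd]
        in_c = run[odd:]
        after = suffix
    pairs = [in_c[i:i + 2] for i in range(0, len(in_c), 2)]
    symbols = ["start-A", *before, "mode-C"] if before else ["start-C"]
    symbols += pairs
    if after:
        symbols += ["mode-A", *after]
    return symbols


for text in ("12345A", "A12345", "A1234567A"):
    for align in ("start", "end"):
        symbols = encode_aligned(text, align)
        print(text, align, " ".join(symbols), f"({len(symbols)})")
```

The printed symbol counts match the six figures given in the examples: 6 and 7, 7 and 6, 9 and 9.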

> Unfortunately, an earlier commit I made disturbed Code128.cs quite a bit, so this PR has merge conflicts and will need some work x.x. If you could merge in master (or maybe it is easier to reimplement from scratch and force push?), I will review the actual code. Sorry for the hassle and thanks for the contribution!

I checked your commit. I think re-implementing my commit is easier than resolving the conflicts :). I will try to do it.

binki commented 3 years ago

@Badkoubehei

> @binki
>
> > Start or end digit sequence must be at least 4 digits long (e.g., “start-C 12 34 mode-A A” (5) is cheaper than “start-A 1 2 3 4 A” (6)) and aligned to either the start or end.
>
> Yes, you are right about the length of sequence. But if I have got your meaning about "aligned to either the start or end" right, I think aligning to start or end matters for odd-length sequences. See these examples:
>
> 12345A
> Align to start => start-C 12 34 mode-A 5 A (6)
> Align to end => start-A 1 mode-C 23 45 mode-A A (7)
>
> A12345
> Align to start => start-A A mode-C 12 34 mode-A 5 (7)
> Align to end => start-A A 1 mode-C 23 45 (6)
>
> It shows that if the sequence is at the start of string, aligning to start is more optimal and if sequence is at the end, aligning to end is better.

Correct! Aligning to the start or end only matters for odd-length sequences. I just wanted to clearly state that odd-length sequences of 5 or more digits at the beginning of the barcode need to have the barcode start in Mode C rather than switching to it after the first character. It is similar to the optimization at the end of the barcode. I wanted to make sure that this optimization was considered :-).

> > Inner digit sequences must be at least 6 digits long.
>
> Yes, you are right. And in this case, aligning to start or end does not matter:

Correct.

> I checked your commit. I think re-implementing my commit is easier than resolving the conflicts :). I will try to do it.

Thanks!

barnhill commented 3 years ago

This is awesome 😎. I'm pretty stoked about the collaboration here!!

rob313663 commented 3 years ago

Hi,

There are also Shift A and Shift B code words that allow switching between Code A and Code B. I have never used them myself, but they are said to shift only the next code word.

/rob


Badkoubehei commented 3 years ago

@rob313663

> Hi, there are Shift A and Shift B that allows switching between Code A and Code B too. Never used them myself but they are said to only shift for the next code word.

Great! I did not know about that. It may change the algorithm!

1234A23
Start-C 12 34 Code-A A 2 3 (7)
Start-C 12 34 Shift-A A 23 (6)
Here the ending sequence of numbers has length 2, but using Shift-A makes it optimal to encode it using Code-C.

1234A567
Start-C 12 34 Shift-A A 56 Shift-A 7 (8)
Start-C 12 34 Code-A A 5 6 7 (8)

1234A5678
Start-C 12 34 Shift-A A 56 78 (7)
Start-C 12 34 Code-A A Code-C 56 78 (8)

12A5678
Start-C 12 Shift-A A 56 78 (6)
Start-A 1 2 A Code-C 56 78 (7)

I will try to consider it in my implementation. @barnhill what do you think?

rob313663 commented 3 years ago

Hi Ahmad,

The Shift code word exists only in the Code A and Code B code sets. I think the only time it saves a code word is when data suitable for Code B surrounds a single control character, such as CR (ASCII 13). In that case it saves a single code word. Example:

[StartB]Testing[SHIFT][CR]some more.[CRC][STOP]

instead of

[StartB]Testing[CODEA][CR][CODEB]some more.[CRC][STOP]

So it would be only a very small optimization. I think Shift is mostly useless. Control characters in Code 128 are not very common.
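To make the one-code-word saving concrete, here is a small Python sketch that builds both code word sequences from the example above and compares their lengths. The function name `code_words` is hypothetical, and each bracketed item in the example is modeled as a single list entry.

```python
def code_words(before: str, control: str, after: str, use_shift: bool) -> list[str]:
    """Code words for Code B text surrounding one control character."""
    if use_shift:
        # Shift affects only the next code word, so no switch back is needed.
        middle = ["SHIFT", control]
    else:
        # Switch to Code A for the control character, then back to Code B.
        middle = ["CODEA", control, "CODEB"]
    return ["StartB", *before, *middle, *after, "CRC", "STOP"]


shifted = code_words("Testing", "CR", "some more.", use_shift=True)
switched = code_words("Testing", "CR", "some more.", use_shift=False)
print(len(shifted), len(switched))
```

With the example text the Shift variant has 22 code words and the Code A/Code B variant has 23.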

24 years ago I wrote an encoder and designed a font for Code 128. I tested the encoder today, and it did not support Shift, not even in the raw encoding mode. I guess I did not know about it then.

Another topic:

One very common use of Code 128 is marking pallets and cartons according to the GS1-128 content specification. Maybe it would be a good idea to include support for that in barcodelib.

I could contribute with my knowledge about GS1-128 if anyone is interested in building some code for it. I also have code in C++ and C# for a GS1-128 parser.

/rob


Badkoubehei commented 3 years ago

Hi @rob313663, thanks for sharing this information. So we can leave the Shift code out of the optimization.

barnhill commented 3 years ago

I'm interested in supporting GS1-128. As far as I know, it's a subset of C128: formatted data encoded with C128.

fiatCurrency commented 6 months ago

I have what I believe is a much more robust method of correctly decomposing which substrings should be in subset C.

The situation is far more complex than 'must be at least 3'.

The complexity depends on something which appears not to be addressed at all: the FNC1 code. That code can be encoded in sets A, B, or C.

So, a sequence "0000f66" in the middle of a string (where f means FNC1) would be better encoded in set C. Then there are odd numbers of digits which terminate at or start at an FNC1.

It isn't obvious how a sequence such as X55555f777f99999f44 should be encoded.
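One way to explore such cases is a small dynamic program that tries every mode choice and keeps the cheapest. The Python sketch below is only an illustration and is not the method referred to in this comment, which is not shown in the thread. Its assumptions: the name `min_symbols` is hypothetical, a lowercase `f` stands for FNC1 as in the examples above (so a literal "f" cannot appear in the data), sets A and B are treated as a single set, and the count includes the start symbol and mode switches but not the check and stop symbols.

```python
from functools import lru_cache

DIGITS = "0123456789"
FNC1 = "f"  # placeholder for FNC1, following the notation in this comment


def min_symbols(data: str) -> int:
    """Fewest symbols (start + data + mode switches) needed for data."""
    n = len(data)

    @lru_cache(maxsize=None)
    def best(i: int, in_c: bool) -> int:
        # Fewest symbols for data[i:] when the encoder is currently in
        # set C (in_c=True) or in set A/B (in_c=False).
        if i == n:
            return 0
        ch = data[i]
        if ch == FNC1:
            # FNC1 exists in every set, so it never forces a switch.
            return 1 + best(i + 1, in_c)
        options = []
        if ch in DIGITS and i + 1 < n and data[i + 1] in DIGITS:
            # Encode a digit pair in set C, switching first if needed.
            options.append((0 if in_c else 1) + 1 + best(i + 2, True))
        # Encode one character in set A/B, switching first if needed.
        options.append((1 if in_c else 0) + 1 + best(i + 1, False))
        return min(options)

    # One start symbol, choosing the cheaper starting set.
    return 1 + min(best(0, False), best(0, True))


print(min_symbols("A0000f66B"))
print(min_symbols("X55555f777f99999f44"))
```

Under these assumptions "A0000f66B" needs 9 symbols, one fewer than the 10 needed when everything stays in set A/B, which agrees with the "0000f66" observation above. Without the FNC1 joining them, neither the 4-digit nor the 2-digit inner run would be worth a switch on its own.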

barnhill / barcodelib

Optimize breaking up data for encoding method in Code128 dynamic mode #127