doranchak / azdecrypt


AZDecrypt

AZdecrypt is a fast and powerful hillclimbing classical cipher solver written in FreeBASIC.

Latest binaries can be found here:

Additional README

Table of Contents

  1. Solvers
  2. Settings
  3. Build Steps
  4. Stats
  5. Notes, Tips, and Tricks
  6. Release History

Solvers

AZDecrypt's main window has a list of solvers. Each one is a type of hill climber that specializes in a different type of cipher:

Substitution

Simple substitution ciphers (including homophonic substitution)

Substitution + columnar rearrangement

A combination of simple substitution and moving columns around.
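A minimal sketch of the column-moving step on its own. It assumes the text is written row by row into a grid of a given width, and that a hypothetical 1-based `order` list names the original column that ends up in each new position. The function name and this convention are illustrative and are not taken from AZdecrypt's source.

```python
def rearrange_columns(text: str, width: int, order: list[int]) -> str:
    """Reorder the columns of a row-major grid.

    New column j receives original column order[j] (1-based). A short last
    row simply contributes the columns it actually has.
    """
    if sorted(order) != list(range(1, width + 1)):
        raise ValueError("order must be a permutation of 1..width")
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    return "".join(
        "".join(row[c - 1] for c in order if c - 1 < len(row)) for row in rows
    )


print(rearrange_columns("ABCDEF", 3, [3, 1, 2]))  # CABFDE
```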

Substitution + columnar transposition

A combination of simple substitution and columnar transposition.
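For reference, a sketch of the columnar transposition step alone, using the common textbook convention: write the text in rows under a keyword, then read the columns off in alphabetical keyword order (ties left to right). Whether AZdecrypt uses exactly this convention is an assumption; the function names are hypothetical, and the ZEBRAS example is the standard textbook illustration.

```python
def _column_order(key: str) -> list[int]:
    # Columns are read in alphabetical key order; equal letters go left to right.
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def columnar_transpose(plaintext: str, key: str) -> str:
    """Write the text in rows under the key, then read it off column by column."""
    width = len(key)
    return "".join(plaintext[col::width] for col in _column_order(key))


def columnar_untranspose(ciphertext: str, key: str) -> str:
    """Undo columnar_transpose, allowing an incomplete last row."""
    width, n = len(key), len(ciphertext)
    out = [""] * n
    start = 0
    for col in _column_order(key):
        length = len(range(col, n, width))  # number of cells in this column
        out[col::width] = list(ciphertext[start:start + length])
        start += length
    return "".join(out)


ct = columnar_transpose("WEAREDISCOVEREDFLEEATONCE", "ZEBRAS")
print(ct)                                  # EVLNACDTESEAROFODEECWIREE
print(columnar_untranspose(ct, "ZEBRAS"))  # WEAREDISCOVEREDFLEEATONCE
```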

Substitution + crib grid

This solver displays a grid that is the same shape as the input cipher. You can enter your guesses for portions of the plaintext. Then, the solver will search for the rest of the plaintext but maintain the plaintext entries you've made.
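One way to picture how cribs constrain a substitution solve, assuming every cipher symbol decrypts to exactly one letter (true for simple and homophonic substitution, not for polyphones): a crib at one position pins that symbol's letter everywhere the symbol occurs. This is a hypothetical sketch, not AZdecrypt's implementation; the function names are illustrative.

```python
def fixed_key_from_cribs(cipher: list[str], cribs: dict[int, str]) -> dict[str, str]:
    """Turn position cribs into fixed symbol -> letter assignments.

    Each cipher symbol decrypts to one letter, so a crib at one position
    pins that symbol at every position where it occurs.
    """
    fixed: dict[str, str] = {}
    for position, letter in cribs.items():
        symbol = cipher[position]
        if symbol in fixed and fixed[symbol] != letter:
            raise ValueError(
                f"symbol {symbol!r} cribbed as both {fixed[symbol]!r} and {letter!r}"
            )
        fixed[symbol] = letter
    return fixed


def decrypt(cipher: list[str], key: dict[str, str]) -> str:
    """Apply a symbol -> letter key; unassigned symbols are shown as '?'."""
    return "".join(key.get(symbol, "?") for symbol in cipher)


cipher = list("ABCAB")
fixed = fixed_key_from_cribs(cipher, {0: "T", 1: "H"})
print(decrypt(cipher, fixed))  # TH?TH
```

A hill climber working under this assumption would then only change the assignments of symbols that are not in `fixed`.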

Substitution + crib list

Substitution + monoalphabetic groups

Substitution + nulls and skips

Tries to figure out if certain symbols in a substitution cipher don't actually contribute towards the plaintext. A "null" is a position in the cipher text that does not translate to plaintext (and is ignored during decryption). The solver will consider different positions to be nulls, and exclude them prior to the next steps of decryption. A "skip" is a missing symbol that is inserted at some position in the cipher. The solver will put them in different positions of the ciphertext before proceeding to the next steps of decryption. More info
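The two operations described above can be sketched on a plain string, with characters standing in for cipher symbols. The 0-based positions, the `?` placeholder and the function names are assumptions made for illustration; how the solver then treats an inserted skip is not described here.

```python
def remove_nulls(cipher: str, null_positions: set[int]) -> str:
    """Drop the symbols at the given 0-based positions."""
    return "".join(s for i, s in enumerate(cipher) if i not in null_positions)


def insert_skips(cipher: str, skip_positions: list[int], placeholder: str = "?") -> str:
    """Insert a placeholder for a missing symbol before each 0-based position.

    Positions refer to the original string; inserting from right to left keeps
    the earlier positions valid.
    """
    symbols = list(cipher)
    for position in sorted(skip_positions, reverse=True):
        symbols.insert(position, placeholder)
    return "".join(symbols)


print(remove_nulls("HEXLLO", {2}))   # HELLO
print(insert_skips("HLLO", [1]))     # H?LLO
print(insert_skips("ABC", [0, 3]))   # ?ABC?
```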

Substitution + polyphones

Substitution + row bound

Substitution + row bound fragments

Substitution + sequential homophones

Substitution + simple transposition

Substitution + sparse polyalphabetism

Substitution + units

Jarl's note on using this solver on nomenclator type ciphers:

Demonstration of partial solve on transposed Z340 plaintext

The Periodic transposition solver can solve a wide variety of transposition ciphers, keyed or unkeyed, as long as the transposition can be summarized by a limited set of periodic rules. Select "Periodic transposition [auto]" or "Periodic transposition inverted [auto]" if you want the solver to try to determine the transposition automatically. Note that the transposition solver writes additional output to the Output folder (for example, the transposition matrices it finds).

One of the factors that guides this solver is periodic redundancy, a measurement that looks at periodicity in candidate transposition matrices. Matrices that have a lot of periodic behavior will have a lot of repeating periods, which increases periodic redundancy. If there is more randomness to the matrix, periodic redundancy will be low. This measurement is balanced against the n-grams score to try to arrive at the correct transposition matrix. More info here in Jarl's post.

Normal vs inverted transposition matrices

Say we have the following transposition matrix:

1   3   5   7   9   11  13  15  17  19  21  23  25  27  29  31  33
35  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67
69  71  73  75  77  79  81  83  85  87  89  91  93  95  97  99  101
103 105 107 109 111 113 115 117 119 118 116 114 112 110 108 106 104
102 100 98  96  94  92  90  88  86  84  82  80  78  76  74  72  70
68  66  64  62  60  58  56  54  52  50  48  46  44  42  40  38  36
34  32  30  28  26  24  22  20  18  16  14  12  10  8   6   4   2

A) Say we put the first letter of the plaintext at position 1, the second letter at position 2, then position 3, etc. (transposition).

If we use the methodology explained here (taking the difference between the position of each number and the position of the next number), we end up with the following period map:

118  -117 116  -115 114  -113 112  -111 110  -109 108  -107 106  -105 104  -103 102
-101 100  -99  98   -97  96   -95  94   -93  92   -91  90   -89  88   -87  86   -85
84   -83  82   -81  80   -79  78   -77  76   -75  74   -73  72   -71  70   -69  68
-67  66   -65  64   -63  62   -61  60   -59  58   -57  56   -55  54   -53  52   -51
50   -49  48   -47  46   -45  44   -43  42   -41  40   -39  38   -37  36   -35  34
-33  32   -31  30   -29  28   -27  26   -25  24   -23  22   -21  20   -19  18   -17
16   -15  14   -13  12   -11  10   -9   8    -7   6    -5   4    -3   2    -1

These are all unique periods, so the solver could never find it. Study the matrices to see why.
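The period map above can be reproduced with a short sketch. It assumes the map is the position difference between each number and the next one, read in row order; that reading matches every value shown, but it is a reconstruction from the example rather than AZdecrypt's code, and `period_map` is a hypothetical name.

```python
def period_map(matrix: list[int]) -> list[int]:
    """Position difference between each number and the next one.

    `matrix` holds the numbers 1..n in reading order (row by row). Entry k of
    the result is pos(k + 2) - pos(k + 1), so there are n - 1 entries.
    """
    pos = {value: index for index, value in enumerate(matrix)}
    return [pos[v + 1] - pos[v] for v in range(1, len(matrix))]


# The 7x17 example: odd numbers ascending, then even numbers descending.
matrix = list(range(1, 120, 2)) + list(range(118, 0, -2))
periods = period_map(matrix)
print(periods[:4], periods[-1])           # [118, -117, 116, -115] -1
print(len(set(periods)) == len(periods))  # True: every period is unique
```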

Now go back to step A, but instead of putting the letters, we get the letter from position 1, then the letter from position 2, etc. (untransposition).

This is equivalent to the following matrix (for putting, i.e. transposition):

1   119 2   118 3   117 4   116 5   115 6   114 7   113 8   112 9
111 10  110 11  109 12  108 13  107 14  106 15  105 16  104 17  103
18  102 19  101 20  100 21  99  22  98  23  97  24  96  25  95  26
94  27  93  28  92  29  91  30  90  31  89  32  88  33  87  34  86
35  85  36  84  37  83  38  82  39  81  40  80  41  79  42  78  43
77  44  76  45  75  46  74  47  73  48  72  49  71  50  70  51  69
52  68  53  67  54  66  55  65  56  64  57  63  58  62  59  61  60

And if we extract the period map now we get:

2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
2  2  2  2  2  2  2  2  -1 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2

So now it is periodic again. By inverting the matrix in the solver, we can solve these more difficult cases.
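A sketch of the inversion step, reproducing the second matrix and its period map from the first matrix. The period map uses the same reconstructed definition as before (position difference between consecutive numbers). `repeat_share` is a deliberately crude, hypothetical stand-in that only counts repeated periods; it is NOT AZdecrypt's logarithmic periodic redundancy.

```python
def invert(matrix: list[int]) -> list[int]:
    """Inverse permutation: slot v - 1 gets the 1-based position of v in matrix."""
    inverse = [0] * len(matrix)
    for index, value in enumerate(matrix):
        inverse[value - 1] = index + 1
    return inverse


def period_map(matrix: list[int]) -> list[int]:
    """Position difference between each number and the next one."""
    pos = {value: index for index, value in enumerate(matrix)}
    return [pos[v + 1] - pos[v] for v in range(1, len(matrix))]


def repeat_share(periods: list[int]) -> float:
    """Crude share of repeated periods (not AZdecrypt's redundancy measure)."""
    return 1 - len(set(periods)) / len(periods)


matrix = list(range(1, 120, 2)) + list(range(118, 0, -2))
inverse = invert(matrix)
print(inverse[:6])                      # [1, 119, 2, 118, 3, 117]
print(sorted(set(period_map(inverse)))) # [-2, -1, 2]
print(repeat_share(period_map(matrix)), round(repeat_share(period_map(inverse)), 3))  # 0.0 0.975
```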

As of AZdecrypt 1.21, Jarl has made a tool for analyzing transposition matrices: Stats, Transposition matrix analysis.

Also, under Format, Convert, there are tools to extract period maps, invert matrices, etc.

Enter the first matrix (the one that starts with 1 3 5 ...) and click Transposition matrix analysis.

Output should be:

AZdecrypt transposition matrix stats for: test.txt
---------------------------------------------------------
Locality (linear distance between numbers): 98.31%, 0.83%
Locality (distance from natural positions): 33.89%, 33.89%

Periodic stepping: redundancy% (inverted, standard)
--------------------------------------------------
1: 84.57%, 0%
2: 83.62%, 85.44%
3: 82.67%, 0%
4: 81.70%, 85.39%
5: 80.71%, 0%
6: 79.71%, 85.33%

It shows that the logarithmic periodic redundancy at stepping 1 inverted is 84.57%, meaning that it should be easy to solve.

More difficult cases require higher stepping levels to solve.

Simple transposition

The same solver as Substitution + simple transposition, but without the substitution step.


Settings

Under "Settings → Solver Settings":

[General] CPU Threads

[General] Thread wait

[General] Entropy weight

[General] Iterations

The total number of iterations performed during the search by the main hill climber (also called the "inner" hill climber; see "Hill climber iterations" below). Note: when AZdecrypt writes results to the Output directory, it will write intermediate output files after 60 seconds without output. This is more likely to happen when the solver takes a while to run, such as when iterations are high.

List of solvers that use only the inner hill climber (for these, the "Hill climber iterations" setting has no additional effect):

[General] Iterations factor

[General] Hill climber iterations

Some solvers use two hill climbers: an "inner" hill climber and an "outer" hill climber. The "Iterations" setting is for the inner hill climber (the substitution solver, etc.), and the "Hill climber iterations" setting is for a secondary hill climber on top of the substitution hill climber. For example, solvers that use a secondary hill climber show the following at the top of the Output window: Restart: 1 Hill climber: 90/5000 @ 500000. Here, 5000 is the number of hill climber iterations and 500000 is the number of iterations.
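The nesting can be illustrated with a toy problem. In this hypothetical sketch, `hc_iterations` plays the role of "Hill climber iterations" and `iterations` the role of "Iterations": every candidate the outer climber proposes is scored by a complete inner run. The scoring function and names are invented for illustration and bear no relation to AZdecrypt's n-gram scoring.

```python
import random


def inner_hill_climb(x: int, iterations: int, rng: random.Random, work: list[int]) -> int:
    """Toy inner climber: tune y for a fixed outer state x; return the best score."""
    def score(y: int) -> int:
        return -(x - 5) ** 2 - (y - 3) ** 2

    y, best = 0, score(0)
    for _ in range(iterations):
        work[0] += 1  # count every inner iteration
        candidate = y + rng.choice((-1, 1))
        if score(candidate) > best:
            y, best = candidate, score(candidate)
    return best


def outer_hill_climb(hc_iterations: int, iterations: int, seed: int = 0):
    """Toy outer climber: every candidate x is scored by a full inner run."""
    rng = random.Random(seed)
    work = [0]
    x = 0
    best = inner_hill_climb(x, iterations, rng, work)
    for _ in range(hc_iterations):
        candidate = x + rng.choice((-1, 1))
        candidate_score = inner_hill_climb(candidate, iterations, rng, work)
        if candidate_score > best:
            x, best = candidate, candidate_score
    return x, best, work[0]


x, best, work = outer_hill_climb(hc_iterations=200, iterations=100)
print(x, best, work)
```

In this sketch the total inner work is (hc_iterations + 1) × iterations, here 20100, which is why raising either setting multiplies the run time.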

List of solvers that use both the inner and outer hill climbers:

[General] Hill climber iterations factor

[General] N-gram factor

[General] Multiplicity weight

This value will "punish" solutions that are produced by increasing the multiplicity (ratio of unique symbols to cipher length) of the cipher text. Defaults to 0 which means this punishment will not occur. A value of 1 is a good starting point for including the punishment.

Jarl says: It follows this calculation: score/=1+((unique_symbols/cipher_length)*multiplicity_weight). The higher the multiplicity, the more the score will be reduced. It may prevent some of the solvers from inflating the score through changes that increase multiplicity. They can still do that, but now there is a trade-off. Secondly, it can also be used as a filter. For example, when using the Substitution + units solver to remove the spaces as nulls, the key size (number of null symbols) will increase with every hill climber iteration; eventually the multiplicity will become very high, and the solver may find non-solutions that exceed the score of the underlying plaintext. In that case, the multiplicity weight will punish these, and the best solution (according to the multiplicity weight trade-off) will still be on top, sort of.
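Jarl's formula written out as a small function. The function name is hypothetical and the numbers are illustrative (a 340-character cipher with 63 unique symbols).

```python
def apply_multiplicity_weight(score: float, unique_symbols: int,
                              cipher_length: int, multiplicity_weight: float) -> float:
    """score /= 1 + (unique_symbols / cipher_length) * multiplicity_weight"""
    multiplicity = unique_symbols / cipher_length
    return score / (1 + multiplicity * multiplicity_weight)


print(apply_multiplicity_weight(1000.0, 63, 340, 0))            # 1000.0 (weight 0: no penalty)
print(round(apply_multiplicity_weight(1000.0, 63, 340, 1), 2))  # 843.67
```

With a weight of 1, a candidate that pushes the multiplicity higher is divided by a larger factor, so it needs a proportionally better raw score to stay on top.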

[General] Output to file

[General] Output to batch

[General] Output scores over

[General] Output improvements only

[General] Output additional stats

[General] Overwrite existing solver output

[General] Add PC-cycles to file output format

[General] Restarts

[General] Temperature

[General] Enable memory checks

[General] Enable screen size checks

[General] 8-gram memory limit

[General] 8-gram caching

[General] Add spaces to output

Note: For this feature to work, n-gram stats that consist only of the letters A through Z must be loaded. For example, any of the languages in the Wortschatz folder meet this condition.

[General] Add spaces to output iterations

[General] Letter n-gram log value cut-off

[General] Homophone weight

When this is set to 0 (the default value), the substitution solver may sometimes allow a plaintext letter to be assigned to more than one ciphertext letter. If you want this to be more strict, where only zero or one ciphertext letter assignments are permitted per plaintext letter, change the Homophone weight to 1.


Build steps

  1. Unzip the 3 folders to the root of the C: drive (AZdecrypt, FbEdit and FreeBASIC-1.09.0-winlibs-gcc-9.3.0).
  2. Open FbEdit from the FbEdit folder.
  3. Open AZdecrypt.bas.
  4. Click the green play icon to compile and run.

Next to the green play icon is a "Windows GCC" option, which can be changed to "Windows GAS" for much faster compilation. However, GAS builds a slower program.

Use Options, Build Options to change compiler flags.

Use Options, Path Options to change paths.

The substitution solver is implemented in sub azdecrypt_234567810g(byval tn_ptr as any ptr).

Note: To tune the compiled binary to a specific CPU architecture, use the -march compiler option, for example -march=skylake. To optimize for whatever CPU you are using, use -march=native.

Stats

Unigrams

Jarl demonstrates how to extract periodicities from a transposition matrix entered in the input window, and explains Entropy vs Normalized Entropy.

Symbol n-grams

Encoding

Observations

Alphabets

Hafer shifts

Output graphs

Periodic analysis

Transposition matrix analysis

Compare input and output

Find vigenere keyword length

Find omnidirectional repeats

Find plaintext direction

Find encoding direction

Find perfect n-symbol cycles

Find n-symbol cycle types

Find n-symbol cycle patterns

Find sequential homophonic randomizations

Find post sequential homophonic row or columnar rearrangement


Notes, tips and tricks

Pangrams

AZDecrypt was not able to solve this cipher with default settings:

ABC DEFGH IJKLM NKO PEQRS KTCJ U VUWX YKZ RUGH QX IKO LFAB NFTC YKWCM VFDEKJ PEZS

The plaintext consists of two pangrams:

THE QUICK BROWN FOX JUMPS OVER A LAZY DOG PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS

Jarl's fix, under "Options -> Solvers":

  • Boost Entropy weight to 10
  • Click the "Normalize n-gram factor" button

Now it can solve the cipher.
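A quick check of the example: the key can be read off the cipher/plaintext pair, and it is a one-to-one simple substitution over all 26 letters. Because the plaintext is two pangrams, its letter distribution is close to uniform (roughly 4.59 bits per letter, against a maximum of log2(26) ≈ 4.70). That flat distribution is offered only as a plausible reason why raising the Entropy weight helps; the README does not explain the fix. Function names are hypothetical.

```python
import math
from collections import Counter

CIPHER = "ABC DEFGH IJKLM NKO PEQRS KTCJ U VUWX YKZ RUGH QX IKO LFAB NFTC YKWCM VFDEKJ PEZS"
PLAIN = "THE QUICK BROWN FOX JUMPS OVER A LAZY DOG PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS"


def derive_key(cipher: str, plain: str) -> dict[str, str]:
    """Read a cipher -> plain key off an aligned pair; reject inconsistent symbols."""
    cipher_letters = cipher.replace(" ", "")
    plain_letters = plain.replace(" ", "")
    if len(cipher_letters) != len(plain_letters):
        raise ValueError("texts have different lengths")
    key: dict[str, str] = {}
    for c, p in zip(cipher_letters, plain_letters):
        if key.setdefault(c, p) != p:
            raise ValueError(f"{c} maps to both {key[c]} and {p}")
    return key


def entropy_bits(text: str) -> float:
    """Shannon entropy, in bits per letter, of the letter frequencies in text."""
    counts = Counter(ch for ch in text if ch.isalpha())
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


key = derive_key(CIPHER, PLAIN)
print(len(key), len(set(key.values())))                          # 26 26
print("".join(key.get(ch, ch) for ch in CIPHER) == PLAIN)        # True
print(round(entropy_bits(PLAIN), 2), round(math.log2(26), 2))    # 4.59 4.7
```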

Batch mode

In batch mode (processing multiple ciphers), under default settings, AZDecrypt will not write output files if newly cracked ciphers' scores are not higher than previously cracked ciphers' scores. To change this, set "Output improvements only" to "No" in "Options -> Solvers".

In v1.20, "Batch ciphers (non-substitution)" has been added to the File menu.

Solving "Ambiguous Caesar shift" ciphers (aka Hafer ciphers or Hafer homophonic ciphers)

Instructions and details on configuring AZDecrypt to solve these kinds of ciphers can be found here (old broken link)

The description of the encipherment system is here.

Jarlve posted a new version of AZDecrypt that can solve Hafer ciphers here.

Polyphone solver improvements

Jarl has added improvements to the polyphone solver. Details here.

Compute score for a given plaintext

Select the "Non-substitution" solver.

Before that feature was available, this could be done as follows: use the "Substitution + crib grid" solver, click Show cipher, and then type in the letters of the plaintext. Note that you have to leave one letter uncribbed for it to work.

Homophonic substitution with spaces

Sometimes a homophonic substitution cipher will encode spaces with one or more symbols.

To solve it: Go to File, Load n-grams, navigate to AZdecrypt/N-grams/Spaces and select "5-grams_english+spaces_jarlve_reddit".

Before that method was available, one way was to select the Substitution + units solver with the standard settings, Unit: symbol, Mode: Remove, and perhaps Multiplicity weight: 1 (see thread).

Using AZDecrypt to auto-insert spaces in input

Sometimes you have a plaintext that lacks spaces and want to automatically add spaces between words (or what might seem like words). Steps:

Note: For this feature to work, n-gram stats that consist only of the letters A through Z must be loaded. For example, any of the languages in the Wortschatz folder meet this condition.

Release History

1.24 - Jan 1, 2024

https://drive.google.com/file/d/1QNhckwpBdijDViCyW16H9uO_Smd30M41/view?usp=sharing (.exe only)

New features:

Additional n-gram downloads:

7-gram beijinghouse v7: https://drive.proton.me/urls/K4QQGAVKEC#SlyrIbgegH13 (0.90 GB download, 9.2 GB RAM)
8-gram beijinghouse v7: https://drive.proton.me/urls/A4X7P9X990#uZKoxvda51Fy (3.50 GB download, 10.5 GB RAM)
9-gram beijinghouse v7: https://drive.proton.me/urls/E3FXCCVK5M#vZxvy1ZK94XN (15.4 GB download, 88.8 GB RAM)

1.23 - Apr 13, 2023

https://drive.google.com/file/d/1zenlhOqsPs6s7XKMdk9IhL7IK6nO7vUQ/view?usp=share_link

New features:

1.22 - Feb 20, 2023

https://drive.google.com/file/d/12ngl8-_hd7EvofHRKkNB7VtBNBceB2Tl/view?usp=share_link

New features:

1.21 - Apr 23, 2022

1.20 - Mar 28, 2022

https://drive.google.com/file/d/1gfdgnbuntedRPyG472QekZo-bImGR3D6/view?usp=sharing

Added a whole bunch of transposition solvers that operate on non-substitution ciphers:

  • Non-substitution (scores your plaintext + detailed n-gram stats)
  • Columnar transposition (keyed)
  • Columnar rearrangement (keyed)
  • Grid rearrangement (keyed)
  • Periodic transposition (can solve a whole variety of transposition ciphers, keyed or unkeyed, as long as the transposition can be summarized by a limited set of periodic rules)
  • Simple transposition (the same solver as Substitution + simple transposition, but does not perform substitution)

Also, File, Batch ciphers (non-substitution) has been added.

Additional n-gram downloads:

7-gram beijinghouse v6: https://drive.google.com/file/d/1lvh3Ih_P9OShWzQVub7wsTk8f1YtLQg8/view?usp=sharing (420 MB download)
8-gram beijinghouse v6: https://drive.google.com/file/d/1v9xvmKQoARyerV2lIK3tphdeGiw_JmZ7/view?usp=sharing (5 GB download)

1.19 - Nov 11, 2020

https://drive.google.com/file/d/1_lP82NAvj5-vzd8O33e5aggWViHd-THJ/view?usp=sharing

What's new?

Example output with the new word spacing (not perfect but pretty good):

Score: 24636.42 IOC: 0.0589 Multiplicity: 0.1882 Seconds: 0.12
Repeats: ANALYS CIPHER COULD MEAN TOBE LESS AND YST THE
PC-cycles: 490

ALTERNATIVELY AND FAR LESS GLORIOUSLY THE 
CIPHERS COULD BE THE BACKLASH OF A LUCKY 
LOW DOWN CRIMINAL WITH A TEXTBOOK ON HOW 
TO BEAT FREQUENCY ANALYSIS THE FIRST CIPHER 
WAS SOLVED BY A HUSBAND AND WIFE TEAM OF 
A MATEUR CRYSTANALYSTS OUT OF THEIR HOME 
AN ANNOYED KILLER COULD HAVE TAKEN THE RECIPE 
FOR CODE MAKING AND BE GUN CONVOLUTING IT 
UNTIL IT BECAME MEANING LESS ENOUGH TO BE 
UNBREAKABLE I FIND THE LATTER SOME

1.18 - Jul 12, 2020

https://drive.google.com/file/d/1v0nyazUTqFGKse8qAoi2FYeQASqRz152/view?usp=sharing

I made a huge effort to simplify the program in many ways and the overlapping n-gram systems have been removed. And thus, old n-grams may no longer work and 7-grams have been removed in favor of 8-grams. N-gram sizes 2 to 6 use the default system and n-gram size 8 uses beijinghouse's system.

Download 6-grams_english_beijinghouse_v6: https://drive.google.com/file/d/1aXzSQoBcQ9MXD5fcH3jvq6UbqpU3GXaQ/view?usp=sharing (200 MB)
Download 8-grams_english_beijinghouse_v6: https://drive.google.com/file/d/1v9xvmKQoARyerV2lIK3tphdeGiw_JmZ7/view?usp=sharing (5 GB)
Download 8-grams_english_jarlve_reddit: https://drive.google.com/file/d/1V1N0dp8iMoT0f7fz62eAvYAaOJVvLq52/view?usp=sharing (1.2 GB)

1.17 - Dec 25, 2019

https://drive.google.com/open?id=1Sw0P9N9svMlx4QNtObZ56vEtDJHk6sZ8

What's new?

Important note: I changed all normal n-grams (non-beijinghouse) to 1-byte log values. If you have any old n-grams from sizes 2 to 6 then you need to change the n-gram .ini file from "N-gram size=5" to "N-gram size=old5" to get it to work again.

1.16 - Oct 8, 2019

https://drive.google.com/open?id=1vB1G8IAeelsz6mZU6chQ7azthSv8seky

What's new?

1.15 - May 25, 2019

https://drive.google.com/open?id=1YOBOXIz6ElHd5ej48E-FA7z7FYJsVuVg

Again a huge update; I will illustrate some examples of the new functionality over the next few days.

beijinghouse's new 7-grams: https://drive.google.com/open?id=1NFV-Ph6xJUsfwA8f3dGMzR35wlL47CFC

1.14 - Mar 24, 2019

What has changed?

First of all a big thank you to beijinghouse for his code contributions:

Is there a way to rearrange several columns at the same time? For example, using an argument such as 3,2,4,7,8,10,17,1,5,6,9,11,12,13,14,15,16.
The text in the input window is now transposed correctly. If you now click on "Solve", the cipher will still be solved. Apparently the previously loaded cipher is still stored somewhere. If you copy the transposed cipher and insert it again, "Solve" does not lead to any result anymore (as desired). Something is obviously not updated correctly.
I've added this transposition to my solver for the next AZdecrypt release. It has solved your cipher and no results on the 340 so far. It can pick any set of dimensions and then make a horizontal or vertical split at any offset of which each part could have its own transposition (none, mirrored, flipped, columnars, diagonals).

Updated the readme.txt with added names of the people that have helped me over the years plus some links to other people's work. Let me know if I have forgotten you!

1.13 - Jan 1, 2019

What is new?

The statistics may use sigmas here and there, that is, the number of standard deviations something is away from the mean. However, these will likely not convert properly to the odds usually associated with them; they are there only to give a quick indication.
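For readers unfamiliar with the term, a sigma value is the usual z-score. This sketch assumes the population standard deviation; which reference distribution and which standard deviation AZdecrypt uses for each statistic is not stated, and the function name is hypothetical.

```python
import statistics


def sigma(observed: float, samples: list[float]) -> float:
    """How many standard deviations `observed` lies from the mean of `samples`."""
    mean = statistics.mean(samples)
    spread = statistics.pstdev(samples)  # population standard deviation (assumption)
    return (observed - mean) / spread


print(sigma(9, [2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```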

1.12 - Nov 03, 2018

What is new?

1.11 - Apr 7, 2018

What is new?

5-gram solver performance mode enabled, supported:
--------------------------------------------------------
- Substitution
- Substitution + columnar rearrangement
- Substitution + columnar transposition
- Substitution + period + nulls & skips
- Substitution + transposition
- Batch ciphers (substitution)

And the ngram file should have a [PM] next to it:

Task: none
[PM] 5-grams_english_practicalcryptography_wortschatz.txt
--------------------------------------------------------
Can be auto-enabled via Options, Solver, set (General) Use performance mode ngrams if applicable to 1.

1.09 - Dec 5, 2017

What is new?

1.08 - Sep 7, 2017

What is new?

1.07 - Aug 5, 2017

What is new?

1.06 - Jul 20, 2017

What is new?

There is a new setting (match weight) under the solver options menu that lets you control how much polyalphabetism the solver targets. This is indicated in the solutions the solver returns. For example, the solver might return a solution that includes "Match 0.80123". This means that 80.123% of the plaintext correctly matches the ciphertext and that the other 19.877% is freely assigned by the solver. Increasing the match weight (under the solver options menu) forces the solver toward higher match ratios, while decreasing it allows for more polyalphabetism.

1.05 - Jun 15, 2017

What is new?

1.04 - Apr 29, 2017

What is new?

The substitution + vigenère solver can solve 63-symbol, 340-character homophonic substitution + vigenère ciphers with keywords up to a length of 10 (and probably much longer) without any problems, but it may take a while. With homophonic substitution, it is assumed that the vigenère is applied at the plaintext level. The solver is very susceptible to nulls.

1.03 - Apr 17, 2017

What is new?

The number of polyphones (letters per symbol) for the new substitution + polyphones solver has to be set through the symbols menu (under functions).

The new create transposition matrix function (under functions) requires you to first open a cipher with the dimensions of the matrix you want to create. Then left-click on the button grid to set the numbers one by one; the number increments automatically. A right-click draws (when possible) a horizontal, vertical or diagonal line between the last number and your new position.

1.0c - Mar 7, 2017

1.0b - Jan 27, 2017

If there are any issues please let me know and I will fix them asap. Marclean, your transposition idea can be found under Functions -> Transposition -> Offset rectangular chain.