MartinThoma / hwrt

A toolset for handwriting recognition
MIT License

Segmentation #21

Open MartinThoma opened 9 years ago

MartinThoma commented 9 years ago

This is an example of segmentation:

Each symbol is in one color, different colors mean different symbols.

Features

A classifier that has to decide, for two given strokes, whether they belong to the same symbol could use the following features:
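As a minimal sketch of what such pairwise features could look like (the feature names and the `(x, y, t)` point format are assumptions for illustration, not hwrt's actual feature set):

```python
# Hypothetical feature extractor for a stroke-pair classifier.
# A stroke is a list of (x, y, t) points.

def bounding_box(stroke):
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return min(xs), min(ys), max(xs), max(ys)

def pair_features(s1, s2):
    x1min, y1min, x1max, y1max = bounding_box(s1)
    x2min, y2min, x2max, y2max = bounding_box(s2)
    # Horizontal / vertical gap between the bounding boxes (0 if they overlap)
    dx = max(0.0, max(x1min, x2min) - min(x1max, x2max))
    dy = max(0.0, max(y1min, y2min) - min(y1max, y2max))
    # Distance from the pen-up point of s1 to the pen-down point of s2
    ex, ey = s1[-1][0], s1[-1][1]
    sx, sy = s2[0][0], s2[0][1]
    pen_dist = ((ex - sx) ** 2 + (ey - sy) ** 2) ** 0.5
    # Time between pen-up of s1 and pen-down of s2
    time_gap = s2[0][2] - s1[-1][2]
    return [dx, dy, pen_dist, time_gap]
```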

MartinThoma commented 9 years ago

Segmentation by strokes seems to be possible for mathematical context.

Be aware that strokes do not necessarily arrive in symbol order (e.g. a fraction stroke may be enlarged later).

MartinThoma commented 9 years ago

For http://www.martin-thoma.de/write-math/view/?raw_data_id=151305 the classifier predicted the segmentation [[0,4],[1,2,5,6],[3]]:

The yellow and black segmentation does not make sense, as it would mean another symbol lies between the strokes of one symbol. This should be penalized in the score of a segmentation.
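Such a penalty term could be sketched like this (with the caveat noted above that stroke indices are not guaranteed to be in symbol order):

```python
def interleaving_pairs(segmentation):
    """Count symbols whose stroke-index span contains strokes of
    another symbol, e.g. symbol [0, 4] interrupted by stroke 3."""
    owner = {}
    for sym_id, strokes in enumerate(segmentation):
        for s in strokes:
            owner[s] = sym_id
    count = 0
    for sym_id, strokes in enumerate(segmentation):
        span = range(min(strokes), max(strokes) + 1)
        if any(owner[s] != sym_id for s in span):
            count += 1
    return count
```

For the predicted segmentation `[[0, 4], [1, 2, 5, 6], [3]]` this flags two symbols as interleaved; a sane segmentation like `[[0, 1], [2, 3]]` scores zero.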

MartinThoma commented 9 years ago

153892

Real segmentation: [[0, 2, 16], [1], [3, 4], [5, 6], [7, 8], [9, 10, 11], [12, 13], [14, 15]] (got at place -1) Predict segmentation: [[0, 4, 6], [1, 2, 3, 5], [7], [8, 10, 11], [9, 15], [12, 13, 14], [16]] (0.00000000)

MartinThoma commented 9 years ago
[157, 2, 498, 59, 14, 33, 4, 4, 3, 1, 4, 14, 13, 0, 0, 2, 11, 277, 1, 1, 1, 1, 5, 4, 74, 1, 1, 1, 4, 2, 14, 18, 97, 1, 12, 2, 0, 1, 11, 2, 49, 1, 197, 12, 49, 2, 13, 360, 6, 5, 14, 21, 5, 15, 3, 7, 0, 9, 326, 123, 39, 7, 1, 3, 23, 4, 1, 12, 264, 63, 14, 4, 1, 2, 3, 6, 146, 8, 0, 13, 406, 3, 0, 1, 1, 3, 157, 0, 0, 158, 46, 6, 1, 33, 2, 1, 1, 0, 0, 4, 1, 4, 1, 0, 1, 5, 3, 4, 3, 0, 10, 1, 73, 11, 0, 1, 26, 9, 3, 0, 240, 1, 2, 27, 7, 1, 4, 8, 105, 0, 306, 98, 0, 1, 10, 57, 122, 0, 12, 2, 0, 485, 0, 29, 2, 17, 1, 2, 1, 312, 1, 44, 10, 13, 0, 23, 1, 37, 406, 4, 0, 2, 2, 4, 3, 0, 4, 3, 9, 0, 0, 2, 10, 20, 4, 21, 105, 0, 3, 0, 19, 0, 88, 3, 1, 3, 0, 1, 1, 13, 1, 20, 0, 1, 4, 5, 22, 1, 1, 36, 7, 0, 1, 2, 85, 38, 1, 9, 62, 2, 3, 1, 25, 6, 0, 1, 1, 0, 1, 0, 26, 9, 0, 127, 0, 0, 0, 2, 1, 10, 0, 9, 0, 7, 1, 0, 12, 2, 0, 60, 15, 0, 155, 0, 0, 14, 7, 2, 1, 0, 1, 0, 1, 15, 0, 0, 3, 12, 11, 23, 34, 0, 1, 1, 0, 2, 1, 189, 84, 0, 10, 0, 4, 3, 0, 0, 450, 0, 12, 163, 10, 3, 7, 1, 0, 4, 0, 113, 3, 49, 0, 1, 26, 1, 0, 4, 3, 49, 11, 1, 0, 0, 1, 95, 1, 76, 76, 1, 1, 1, 4, 12, 0, 0, 0, 0, 30, 17, 2, 0, 16, 15, 497, 13, 360, 181, 0, 1, 49, 54, 13, 1, 4, 2, 45, 1, 7, 1, 4, 0, 0, 2, 2, 4, 114, 8, 1, 3, 199, 1, 1, 0, 7, 0, 2, 212, 9, 10, 178, 1, 31, 1, 1, 41, 0, 0, 207, 1, 1, 173, 4, 6, 4, 1, 4, 1, 65, 0, 0, 3, 1, 184, 1, 8, 1, 5, 4, 0, 13, 10, 1, 1, 148, 2, 42, 2, 7, 1, 16, 0, 49, 1, 38, 1, 289, 3, 14, 37, 0, 0, 24, 133, 14, 4, 0, 8, 0, 0, 47, 1, 4, 1, 1, 3, 43, 104, 2, 0, 1, 28, 2, 379, 12, 57, 1, 79, 3, 7, 14, 499, 4, 2, 1, 241, 0, 0, 2, 0, 1, 0, 0, 3, 178, 52, 2, 0, 156, 269, 0, 0, 2, 3, 0, 4, 0, 51, 60, 166, 0, 0, 1, 11, 6, 3, 5, 77, 0, 0, 7, 0, 10, 4, 25, 0, 5, 96, 0, 317, 85, 7, 101, 0, 19, 332, 23, 13, 116, 37, 13, 1, 13, 0, 1, 26, 192, 7, 4, 50, 1, 2, 0, 0, 1, 86, 2, 127, 347, 0, 16, 0, 3, 7, 223, 309, 13, 36, 0, 1, 1, 1, 1, 1, 3, 4, 258, 0, 384, 0, 11, 1, 495, 317, 4, 2, 9, 202, 9, 87, 3, 265, 6, 146, 4, 173, 12, 1, 4, 103, 21, 33, 3, 13, 0, 352, 330, 51, 334, 3, 6, 0, 34, 1, 0, 0, 6, 
1, 8, 8, 9, 9, 9, 0, 1, 1, 191, 1, 1, 0, 10, 22, 8, 175, 0, 1, 0, 14, 376, 1, 24, 17, 1, 26, 164, 1, 0, 5, 27, 1, 196, 0, 0, 31, 11, 494, 0, 0, 4, 3, 0, 13, 10, 47, 3, 0, 386, 11, 3, 0, 0, 0, 3, 3, 8, 0, 1, 2, 0, 30, 4, 10, 3, 10, 304, 33, 0, 1, 1, 1, 12, 0, 39, 1, 0, 0, 0, 0, 0, 1, 0, 3, 36, 0, 3, 20, 11, 4, 0, 0, 194, 347, 98, 0, 140, 445, 0, 3, 0, 3, 2, 182, 0, 1, 2, 4, 20, 1, 167, 13, 1, 0, 1, 0, 200, 0, 4, 12, 0, 68, 14, 3, 0, 0, 0, 50, 0, 4, 10, 35, 49]
2015-04-12 19:49:54,171 INFO mean: 37.82
2015-04-12 19:49:54,172 INFO median: 3.00
2015-04-12 19:49:54,172 INFO TOP-1: 0.16
2015-04-12 19:49:54,172 INFO TOP-3: 0.33
2015-04-12 19:49:54,173 INFO TOP-10: 0.48
2015-04-12 19:49:54,173 INFO TOP-20: 0.56
2015-04-12 19:49:54,173 INFO TOP-50: 0.63
2015-04-12 19:49:54,173 INFO Out of order: 72
2015-04-12 19:49:54,173 INFO Total: 943
MartinThoma commented 9 years ago

Idea:

Analyze data

Search for single-stroke symbols which are not "prefixes" of other symbols. For example, o is a prefix of \oplus, l is a prefix of t, and 1 and 3 are prefixes of B. Those symbols should also be "stable" in the sense that their variance in stroke count is low. Let B be the group of those basic symbols. Examples of this group might be \sum, \int, (, ), \infty, 2, 6, 8, 9, 0, a, c, e, ...

Results

91 of 374 symbols have a mean stroke count in [1,1.1] with a variance < 0.3. The variance is very likely caused by wild points.
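The selection criterion above could be sketched as follows; the input format (symbol → list of stroke counts per recording) is an assumption:

```python
# Select "basic" symbols whose mean stroke count lies in [1, 1.1]
# and whose (population) variance is below 0.3.

def basic_symbols(stroke_counts, mean_range=(1.0, 1.1), max_var=0.3):
    basic = []
    for symbol, counts in stroke_counts.items():
        n = len(counts)
        mean = sum(counts) / n
        var = sum((c - mean) ** 2 for c in counts) / n
        if mean_range[0] <= mean <= mean_range[1] and var < max_var:
            basic.append(symbol)
    return basic
```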

Segment

Let L be the list of strokes which should be partitioned into a list of list of strokes, where each sublist is one symbol.

  1. In order to segment a recording, analyze each stroke as a single-stroke symbol. Try to find out if a stroke is a member of B. Those members of B are called L_B ⊆ L. Let L' = L \ L_B.
  2. Hard chunking ("hard" because there is nothing probabilistic):
    1. Find the biggest symbol (by bounding box, separately for width and height)
    2. Strokes whose bounding boxes are more than α × max_width apart horizontally or more than α × max_height apart vertically belong to separate chunks. α should be about 2, from my gut feeling.
    3. Does any chunk have more than 8 strokes? → chunk it again (how? with a smaller α?)
  3. Now each chunk is guaranteed to have at most 8 strokes. For each chunk:
    1. The 2-stroke classifier (belongs to the same symbol or not) can use new features to find a segmentation of L':
      • "symbols in between"
      • mean / median / min / max dimensions (width / height) of symbols in L_B
      • width / height of the bounding box the two strokes would form if they belonged to the same symbol
    2. Segment the chunks as before
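The hard-chunking step (2.1–2.2) could be sketched like this; strokes are lists of (x, y) points, and the re-chunking of chunks with more than 8 strokes (step 2.3) is left out:

```python
# Strokes whose bounding boxes are farther than alpha * max_width
# (or alpha * max_height) from every stroke of a chunk start a new
# chunk. alpha = 2 as suggested above.

def bbox(stroke):
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return min(xs), min(ys), max(xs), max(ys)

def gap(b1, b2):
    """Horizontal / vertical gap between two boxes (0 if overlapping)."""
    dx = max(0.0, max(b1[0], b2[0]) - min(b1[2], b2[2]))
    dy = max(0.0, max(b1[1], b2[1]) - min(b1[3], b2[3]))
    return dx, dy

def hard_chunks(strokes, alpha=2.0):
    boxes = [bbox(s) for s in strokes]
    max_w = max(b[2] - b[0] for b in boxes)
    max_h = max(b[3] - b[1] for b in boxes)
    chunks = []  # list of lists of stroke indices
    for i, b in enumerate(boxes):
        for chunk in chunks:
            # join an existing chunk if close enough to any of its strokes
            if any(gap(b, boxes[j])[0] <= alpha * max_w and
                   gap(b, boxes[j])[1] <= alpha * max_h
                   for j in chunk):
                chunk.append(i)
                break
        else:
            chunks.append([i])
    return chunks
```

A stroke that bridges two existing chunks is simply added to the first one here; a real implementation would merge the chunks.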
MartinThoma commented 9 years ago

Idea: Use MST as described in http://www.ai.mit.edu/projects/natural-log/papers/matsakis-MEng-99.pdf#page=37
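A minimal sketch of that idea, assuming strokes are reduced to their center points: build a minimum spanning tree with Prim's algorithm, then cut edges longer than a threshold to obtain candidate symbol groups (the cut step is not shown):

```python
# Prim's algorithm over 2D points; returns (distance, i, j) edges.

def mst_edges(points):
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = ((points[i][0] - points[j][0]) ** 2 +
                     (points[i][1] - points[j][1]) ** 2) ** 0.5
                if best is None or d < best[0]:
                    best = (d, i, j)
        edges.append(best)
        in_tree.add(best[2])
    return edges
```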

MartinThoma commented 9 years ago

Ideas:

MartinThoma commented 9 years ago

Other problems:

MartinThoma commented 9 years ago

Some first results with the condition `prob_sum > 0.95 and not any([el for el in bbintersections[i]]) and len(stroke) > 2 and predictions[0]['semantics'].split(';')[1] != '-'`:

2015-05-10 20:00:41,780 INFO TOP-1: 0.27
2015-05-10 20:00:41,782 INFO Over-segmented: 0.21
2015-05-10 20:00:41,782 INFO Under-segmented: 0.66
2015-05-10 20:00:41,782 INFO overUnder-segmented: 0.14
2015-05-10 20:00:41,782 INFO Out of order: 113
2015-05-10 20:00:41,782 INFO Total: 1856

2015-05-10 20:20:14,914 INFO TOP-1: 0.25
2015-05-10 20:20:14,915 INFO Over-segmented: 0.20
2015-05-10 20:20:14,915 INFO Under-segmented: 0.69
2015-05-10 20:20:14,916 INFO overUnder-segmented: 0.13
2015-05-10 20:20:14,916 INFO Out of order: 113
2015-05-10 20:20:14,916 INFO Total: 1856

2015-05-10 20:26:53,781 INFO TOP-1: 0.20
2015-05-10 20:26:53,782 INFO Over-segmented: 0.16
2015-05-10 20:26:53,782 INFO Under-segmented: 0.76
2015-05-10 20:26:53,782 INFO overUnder-segmented: 0.12
2015-05-10 20:26:53,782 INFO Out of order: 113
2015-05-10 20:26:53,783 INFO Total: 1856
MartinThoma commented 9 years ago

Single Stroke combinational recognizer

Classify symbols stroke-by-stroke. Let's say there are n symbols in total.

  1. Classify each stroke. Remember the scaling factor and the x- and y-translation factors (3 factors)
  2. Train a two-stroke combination recognizer which takes 2·(n+3) input features and gives the probabilities for n classes. Problem: there is no way to remember which strokes were already drawn. Good: stroke enhancements are probably not that bad.
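The input layout of that recognizer could be sketched as follows (the function name and tuple layout are illustrative):

```python
# For each stroke: its n class probabilities plus the 3 normalization
# factors (scale, x-translation, y-translation); concatenated for both
# strokes this gives 2 * (n + 3) input features.

def combination_input(probs1, norm1, probs2, norm2):
    assert len(norm1) == len(norm2) == 3   # (scale, dx, dy)
    assert len(probs1) == len(probs2)      # n class probabilities each
    return list(probs1) + list(norm1) + list(probs2) + list(norm2)
```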
MartinThoma commented 9 years ago

Stream segmentation

Now one of two things could have happened:

For new symbols, it is more likely that it is at the very right side of the bounding box of complete_data. It is also very likely that the new stroke did not cross any old stroke.

If the new stroke belongs to an existing symbol, then it has to be "next" to that symbol. "Next" in this context means there is no other symbol in between (though one might be around it, e.g. \sqrt).
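The new-symbol heuristic above could be sketched like this; the `right_fraction` threshold is an assumption, not a tuned value:

```python
# A new stroke more likely starts a new symbol if it lies at the right
# edge of the bounding box of everything drawn so far (complete_data)
# and crosses no old stroke.

def likely_new_symbol(new_box, total_box, crosses_old_stroke,
                      right_fraction=0.8):
    """new_box / total_box are (xmin, ymin, xmax, ymax)."""
    if crosses_old_stroke:
        return False
    width = total_box[2] - total_box[0]
    # Does the new stroke start in the rightmost part of what was drawn?
    return new_box[0] >= total_box[0] + right_fraction * width
```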

MartinThoma commented 9 years ago
## 325119
  Real segmentation:    [[0, 1], [2, 3]] (got at place -1)
  Predict segmentation: [[0], [1], [2, 3]] (1.00000000)
  Segmentation took 0.0561 seconds.
  over-segmented
## 325266
  Real segmentation:    [[0], [1, 2], [3], [4, 5]] (got at place -1)
  Predict segmentation: [[0], [1, 2], [3], [4], [5]] (1.00000000)
  Segmentation took 0.0890 seconds.
  over-segmented
## 325283
  Real segmentation:    [[0, 1], [2]] (got at place -1)
  Predict segmentation: [[0, 2], [1]] (1.00000000)
  Segmentation took 0.0451 seconds.
  over-segmented
  under-segmented
## 325540
  Real segmentation:    [[0], [1, 2], [3, 4], [5], [6]] (got at place -1)
  Predict segmentation: [[0, 6], [1, 2, 3], [4], [5]] (1.00000000)
  Segmentation took 0.0978 seconds.
  over-segmented
  under-segmented
## 325545
  Real segmentation:    [[0], [1], [2, 3], [4], [5], [6, 7], [8]] (got at place -1)
  Predict segmentation: [[0], [1, 2], [3], [4], [5], [6], [7, 8]] (1.00000000)
  Segmentation took 0.1346 seconds.
  over-segmented
  under-segmented
## 2800 ################################################################################
## 325682
  Real segmentation:    [[0, 1], [2]] (got at place -1)
  Predict segmentation: [[0], [1], [2]] (1.00000000)
  Segmentation took 0.0481 seconds.
  over-segmented
## 325689
  Real segmentation:    [[0, 1], [2, 3], [4], [5, 6]] (got at place -1)
  Predict segmentation: [[0, 1, 2, 3], [4], [5], [6]] (1.00000000)
  Segmentation took 0.1113 seconds.
  over-segmented
  under-segmented
## 325792
  Real segmentation:    [[0], [1, 2], [3], [4, 5], [6], [7], [8, 9], [10]] (got at place -1)
  Predict segmentation: [[0], [1], [2], [3], [4, 5], [6], [7, 10], [8, 9]] (1.00000000)
  Segmentation took 0.1735 seconds.
  over-segmented
  under-segmented
## 325915
  Real segmentation:    [[0, 1], [2]] (got at place -1)
  Predict segmentation: [[0], [1], [2]] (1.00000000)
  Segmentation took 0.0672 seconds.
  over-segmented
2015-06-07 19:39:39,710 INFO mean: 0.00
2015-06-07 19:39:39,711 INFO median: 0.00
2015-06-07 19:39:39,711 INFO TOP-1: 0.20
2015-06-07 19:39:39,711 INFO TOP-3: 0.20
2015-06-07 19:39:39,712 INFO TOP-10: 0.20
2015-06-07 19:39:39,712 INFO TOP-20: 0.20
2015-06-07 19:39:39,712 INFO TOP-50: 0.20
2015-06-07 19:39:39,712 INFO Over-segmented: 0.12
2015-06-07 19:39:39,712 INFO Under-segmented: 0.76
2015-06-07 19:39:39,712 INFO overUnder-segmented: 0.08
2015-06-07 19:39:39,712 INFO Out of order: 172
2015-06-07 19:39:39,713 INFO Total: 2838
MartinThoma commented 9 years ago

Top 2 > 0.95 instead of top 3:

2015-06-08 20:55:56,451 INFO mean: 0.00
2015-06-08 20:55:56,465 INFO median: 0.00
2015-06-08 20:55:56,466 INFO TOP-1: 0.18
2015-06-08 20:55:56,466 INFO TOP-3: 0.18
2015-06-08 20:55:56,466 INFO TOP-10: 0.18
2015-06-08 20:55:56,466 INFO TOP-20: 0.18
2015-06-08 20:55:56,467 INFO TOP-50: 0.18
2015-06-08 20:55:56,467 INFO Over-segmented: 0.09
2015-06-08 20:55:56,467 INFO Under-segmented: 0.79
2015-06-08 20:55:56,467 INFO overUnder-segmented: 0.07
2015-06-08 20:55:56,467 INFO Out of order: 172
2015-06-08 20:55:56,468 INFO Total: 2838
MartinThoma commented 9 years ago

Top 4 > 0.95 instead of top 3:

2015-06-08 21:01:28,281 INFO mean: 0.00
2015-06-08 21:01:28,282 INFO median: 0.00
2015-06-08 21:01:28,282 INFO TOP-1: 0.22
2015-06-08 21:01:28,282 INFO TOP-3: 0.22
2015-06-08 21:01:28,282 INFO TOP-10: 0.22
2015-06-08 21:01:28,282 INFO TOP-20: 0.22
2015-06-08 21:01:28,283 INFO TOP-50: 0.22
2015-06-08 21:01:28,283 INFO Over-segmented: 0.14
2015-06-08 21:01:28,283 INFO Under-segmented: 0.73
2015-06-08 21:01:28,283 INFO overUnder-segmented: 0.09
2015-06-08 21:01:28,283 INFO Out of order: 172
2015-06-08 21:01:28,283 INFO Total: 2838
MartinThoma commented 9 years ago

Top 5 > 0.95 instead of top 3:

2015-06-08 21:07:36,789 INFO mean: 0.00
2015-06-08 21:07:36,790 INFO median: 0.00
2015-06-08 21:07:36,790 INFO TOP-1: 0.23
2015-06-08 21:07:36,791 INFO TOP-3: 0.23
2015-06-08 21:07:36,791 INFO TOP-10: 0.23
2015-06-08 21:07:36,791 INFO TOP-20: 0.23
2015-06-08 21:07:36,791 INFO TOP-50: 0.23
2015-06-08 21:07:36,791 INFO Over-segmented: 0.16
2015-06-08 21:07:36,791 INFO Under-segmented: 0.71
2015-06-08 21:07:36,791 INFO overUnder-segmented: 0.09
2015-06-08 21:07:36,792 INFO Out of order: 172
2015-06-08 21:07:36,792 INFO Total: 2838
MartinThoma commented 9 years ago

Top 6 > 0.95 instead of top 3:

2015-06-08 21:14:14,411 INFO mean: 0.00
2015-06-08 21:14:14,412 INFO median: 0.00
2015-06-08 21:14:14,412 INFO TOP-1: 0.24
2015-06-08 21:14:14,412 INFO TOP-3: 0.24
2015-06-08 21:14:14,412 INFO TOP-10: 0.24
2015-06-08 21:14:14,413 INFO TOP-20: 0.24
2015-06-08 21:14:14,413 INFO TOP-50: 0.24
2015-06-08 21:14:14,413 INFO Over-segmented: 0.17
2015-06-08 21:14:14,413 INFO Under-segmented: 0.69
2015-06-08 21:14:14,413 INFO overUnder-segmented: 0.10
2015-06-08 21:14:14,413 INFO Out of order: 172
2015-06-08 21:14:14,413 INFO Total: 2838
MartinThoma commented 9 years ago

Top 10 > 0.95 instead of top 3:

2015-06-08 21:19:49,930 INFO mean: 0.00
2015-06-08 21:19:49,931 INFO median: 0.00
2015-06-08 21:19:49,931 INFO TOP-1: 0.25
2015-06-08 21:19:49,931 INFO TOP-3: 0.25
2015-06-08 21:19:49,932 INFO TOP-10: 0.25
2015-06-08 21:19:49,932 INFO TOP-20: 0.25
2015-06-08 21:19:49,932 INFO TOP-50: 0.25
2015-06-08 21:19:49,932 INFO Over-segmented: 0.19
2015-06-08 21:19:49,932 INFO Under-segmented: 0.67
2015-06-08 21:19:49,932 INFO overUnder-segmented: 0.10
2015-06-08 21:19:49,932 INFO Out of order: 172
2015-06-08 21:19:49,933 INFO Total: 2838
MartinThoma commented 9 years ago

Top 20 > 0.95 instead of top 3:

2015-06-08 22:33:33,301 INFO mean: 0.00
2015-06-08 22:33:33,302 INFO median: 0.00
2015-06-08 22:33:33,302 INFO TOP-1: 0.26
2015-06-08 22:33:33,302 INFO TOP-3: 0.26
2015-06-08 22:33:33,303 INFO TOP-10: 0.26
2015-06-08 22:33:33,303 INFO TOP-20: 0.26
2015-06-08 22:33:33,303 INFO TOP-50: 0.26
2015-06-08 22:33:33,303 INFO Over-segmented: 0.20
2015-06-08 22:33:33,303 INFO Under-segmented: 0.66
2015-06-08 22:33:33,303 INFO overUnder-segmented: 0.11
2015-06-08 22:33:33,303 INFO Out of order: 172
2015-06-08 22:33:33,304 INFO Total: 2838
./segmentation.py  284,03s user 3,10s system 96% cpu 4:56,26 total
MartinThoma commented 9 years ago

All instead of cropping at top 3:

2015-06-08 22:46:18,547 INFO mean: 0.00
2015-06-08 22:46:18,548 INFO median: 0.00
2015-06-08 22:46:18,549 INFO TOP-1: 0.27
2015-06-08 22:46:18,549 INFO TOP-3: 0.27
2015-06-08 22:46:18,549 INFO TOP-10: 0.27
2015-06-08 22:46:18,549 INFO TOP-20: 0.27
2015-06-08 22:46:18,550 INFO TOP-50: 0.27
2015-06-08 22:46:18,550 INFO Over-segmented: 0.20
2015-06-08 22:46:18,550 INFO Under-segmented: 0.65
2015-06-08 22:46:18,550 INFO overUnder-segmented: 0.11
2015-06-08 22:46:18,550 INFO Out of order: 172
2015-06-08 22:46:18,552 INFO Total: 2838
./segmentation.py  299,81s user 3,90s system 95% cpu 5:18,39 total
MartinThoma commented 9 years ago

top 1 > 0.95 instead of top 3:

2015-06-08 23:04:08,941 INFO mean: 879844.96
2015-06-08 23:04:08,942 INFO median: 1000000.00
2015-06-08 23:04:08,942 INFO TOP-1: 0.12
2015-06-08 23:04:08,943 INFO TOP-3: 0.12
2015-06-08 23:04:08,943 INFO TOP-10: 0.12
2015-06-08 23:04:08,943 INFO TOP-20: 0.12
2015-06-08 23:04:08,943 INFO TOP-50: 0.12
2015-06-08 23:04:08,944 INFO Over-segmented: 0.01
2015-06-08 23:04:08,944 INFO Under-segmented: 0.88
2015-06-08 23:04:08,944 INFO overUnder-segmented: 0.01
2015-06-08 23:04:08,944 INFO Out of order: 172
2015-06-08 23:04:08,944 INFO Total: 2838
MartinThoma commented 9 years ago

All

2015-06-08 23:12:56,249 INFO mean: 734672.30
2015-06-08 23:12:56,250 INFO median: 1000000.00
2015-06-08 23:12:56,250 INFO TOP-1: 0.27
2015-06-08 23:12:56,250 INFO TOP-3: 0.27
2015-06-08 23:12:56,251 INFO TOP-10: 0.27
2015-06-08 23:12:56,251 INFO TOP-20: 0.27
2015-06-08 23:12:56,251 INFO TOP-50: 0.27
2015-06-08 23:12:56,251 INFO Over-segmented: 0.20
2015-06-08 23:12:56,251 INFO Under-segmented: 0.65
2015-06-08 23:12:56,251 INFO overUnder-segmented: 0.11
2015-06-08 23:12:56,252 INFO Out of order: 172
2015-06-08 23:12:56,252 INFO Total: 2838
X-Ryl669 commented 6 years ago

What about using a varying-size bounding box? Physically and historically, symbols are separated because otherwise the writing would become unreadable. I think it is a good idea to compute a bounding box while a symbol is being drawn, whose size is adjusted by the current stroke length. When the pen is lifted, one has to decide whether to "start a new symbol" or "append to the existing symbol". This decision can be based on a bounding box whose shape can be trained, but whose size is dynamic. For example, when drawing the integral symbol, the bounding box is a horizontally thin and vertically tall rectangle that would be recognized by the network, but the size itself is extracted from the current symbol's size. When a new stroke is produced and it intersects a previous stroke's bounding box, you can likely concatenate it with the previous symbol.
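The intersection test in that suggestion could be sketched as follows (a minimal version without the trained shape or the dynamic padding):

```python
# Boxes are (xmin, ymin, xmax, ymax).

def boxes_intersect(b1, b2):
    return not (b1[2] < b2[0] or b2[2] < b1[0] or
                b1[3] < b2[1] or b2[3] < b1[1])

def assign_stroke(new_box, symbol_boxes):
    """Return the index of the symbol the new stroke joins, or None
    if it should start a new symbol."""
    for i, box in enumerate(symbol_boxes):
        if boxes_intersect(new_box, box):
            return i
    return None
```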