paul90 closed this issue 6 years ago
Good point. I've thought about this on occasion but haven't been annoyed enough to act. Both client and server use the same code, so a good place to apply a limit would be the return statement in synopsis.coffee.
return synopsis.substring 0, 140*4
I chose 140*4 (560 characters) as a reasonable limit since that is twice Twitter's newly generous upper bound of 280.
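As a rough illustration of the clipping described above, here is a minimal sketch in plain JavaScript rather than the project's CoffeeScript. The names SYNOPSIS_LIMIT and clipSynopsis are made up for this example and are not taken from synopsis.coffee; only the 140*4 limit and the use of substring come from the comment above.

```javascript
// Hypothetical names for illustration; not the actual synopsis.coffee code.
const SYNOPSIS_LIMIT = 140 * 4; // 560, twice Twitter's 280-character bound

function clipSynopsis(synopsis) {
  // substring(0, n) returns the string unchanged when it is shorter than n,
  // so short synopses pass through untouched.
  return synopsis.substring(0, SYNOPSIS_LIMIT);
}

console.log(clipSynopsis("A short synopsis.").length); // 17
console.log(clipSynopsis("x".repeat(1000)).length);    // 560
```

One caveat with this approach: substring counts UTF-16 code units, so a clip could in principle land in the middle of a surrogate pair.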
Here is a neat Unix command that computes a distribution of synopsis lengths using 10-character bins. I apply it to my own sites, where I try to set a good example.
curl -s $site/system.sitemap.json | jq '.[].synopsis|length/10|floor' | sort -n | uniq -c
For site=ward.asia.wiki.org we find nearly all fit within one half of this limit.
Note: each line is a page count followed by a bin number, so "4 2" means 4 pages had a synopsis length in the range 20-29 characters.
4 2
2 3
5 4
4 5
3 6
10 7
17 8
13 9
12 10
18 11
11 12
19 13
15 14
24 15
18 16
16 17
15 18
13 19
8 20
18 21
8 22
12 23
8 24
7 25
6 26
3 27
8 28
5 29
3 30
3 31
1 32
3 33
1 34
2 38
1 39
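For readers without jq at hand, the binning performed by the pipeline above can be sketched in JavaScript. This is an approximation under stated assumptions: the function name synopsisHistogram and the sample data are invented for this example, and the sitemap is assumed to be an array of objects with a synopsis field, as the jq filter implies.

```javascript
// Count synopsis lengths in fixed-width bins, mirroring
//   jq '.[].synopsis|length/10|floor' | sort -n | uniq -c
// Hypothetical helper; not part of the wiki code base.
function synopsisHistogram(sitemap, binWidth = 10) {
  const counts = new Map();
  for (const page of sitemap) {
    // Assumption: a missing synopsis counts as length 0 (jq's length of null is 0).
    const length = (page.synopsis || "").length;
    const bin = Math.floor(length / binWidth);
    counts.set(bin, (counts.get(bin) || 0) + 1);
  }
  // Order by bin number (like sort -n) and emit [count, bin] pairs (like uniq -c).
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([bin, count]) => [count, bin]);
}

// Made-up sample data, not taken from any real sitemap.
const sample = [
  { synopsis: "x".repeat(25) },
  { synopsis: "x".repeat(29) },
  { synopsis: "x".repeat(140) },
];
for (const [count, bin] of synopsisHistogram(sample)) {
  console.log(`${count} ${bin}`);
}
// prints "2 2" then "1 14"
```

The results can differ slightly from jq for text outside the Basic Multilingual Plane, since jq counts Unicode code points while JavaScript's length counts UTF-16 code units.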
For site=ward.bay.wiki.org I seem to run on a bit longer on occasion, with one synopsis clipped.
1 2
3 3
2 4
2 5
4 7
1 8
10 9
6 10
5 11
6 12
11 13
9 14
17 15
24 16
13 17
11 18
21 19
16 20
23 21
14 22
13 23
13 24
15 25
14 26
11 27
14 28
10 29
9 30
15 31
11 32
14 33
6 34
5 35
6 36
7 37
5 38
8 39
1 40
2 41
1 42
2 43
2 44
1 45
5 46
1 47
1 48
1 49
1 50
2 51
1 52
1 58
Closed by fedwiki/wiki-client#201
Currently the first paragraph is used as the synopsis in the sitemap. There appear to be some authors who create pages with a long first paragraph that contains the entire page.
We should protect against sitemaps becoming overly large by limiting the length of the synopsis.
Long term, we probably should remove the synopsis and move to a different search mechanism.