Closed albbas closed 13 years ago
Date: 2011-02-19 09:33:03 +0100
From: Trond Trosterud
~/freecorpus$ccat -l sme -r converted/sme/ |preprocess --abbr=~/gtsvn/gt/sme/bin/abbr.txt |usme|lookup2cg|vislcg3 -g ~/gtsvn/gt/sme/bin/sme-dis.bin > ~/gtsvn/gt/sme/dev/analyse/free.1491.dis
VISL CG-3 Disambiguator version 0.9.7.6599
Codepage: default UTF-8, input UTF-8, output UTF-8, grammar UTF-8
Info: Binary grammar detected.
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
Parsing grammar took 0.268409 seconds.
Grammar has 17 sections, 0 templates, 3527 rules, 3930 sets, 951 c-tags, 8719 s-tags.
34 rules cannot be skipped by index.
Malformed UTF-8 character (fatal) at /Users/trond/gtsvn/gt/script/preprocess line 162, <> line 212692.
~/freecorpus$
Date: 2011-02-19 09:37:35 +0100
From: Trond Trosterud
The freecorpus contains 7.7 million words; the analysis halted at about 6.2 million.
~/freecorpus$ccat -l sme -r converted/sme/ | wc -w
7733158
~/freecorpus$cat ~/gtsvn/gt/sme/dev/analyse/free.1491.dis|grep '^\"'|wc -l
6246523
Date: 2011-04-22 16:51:45 +0200
From: Trond Trosterud
Obsolete report.
Date: 2011-04-26 09:11:21 +0200
From: Sjur Nørstebø Moshagen
It isn't obsolete. It is either fixed or not. Reopened until a test is provided documenting that it is fixed.
Don't close a bug before discussing it with the assignee or other stakeholders, or before providing tests that document that it has been fixed. Being old doesn't mean being irrelevant or obsolete.
Date: 2011-04-30 09:44:10 +0200
From: Trond Trosterud
This bug has been marked as a duplicate of bug #969
Date: 2011-04-30 09:57:08 +0200
From: Trond Trosterud
This bug has been marked as a duplicate of bug #878
Date: 2011-05-02 12:42:49 +0200
From: Børre Gaup
This is fixed. convert2xml.pl has a check that guards against invalid UTF-8.
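For illustration only, a guard of this kind can be sketched as below. This is not the code in convert2xml.pl (which is Perl and is not quoted in this report); it is a Python stand-in, and the names invalid_utf8_lines and is_valid_utf8 are hypothetical. It assumes the input is read as raw bytes and that strict decoding is an acceptable validity test.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def invalid_utf8_lines(stream: BinaryIO) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, byte_offset) for each line that is not valid UTF-8.

    Hypothetical helper: line numbers are 1-based, and byte_offset is the
    position within the line where strict decoding first fails.
    """
    for line_number, raw in enumerate(stream, start=1):
        try:
            raw.decode("utf-8")  # strict decoding raises on malformed input
        except UnicodeDecodeError as err:
            yield line_number, err.start


def is_valid_utf8(stream: BinaryIO) -> bool:
    """Return True only if every line in the stream decodes as UTF-8."""
    return next(invalid_utf8_lines(stream), None) is None


# Made-up sample data: the third line ends in a truncated multi-byte sequence.
sample = io.BytesIO("okta\nčáhci\n".encode("utf-8") + b"golbma \xe1\n")
print(list(invalid_utf8_lines(sample)))
```

Run on a converted file opened in binary mode, a check like this would report the offending lines before the text reaches preprocess, instead of the pipeline aborting partway through the corpus.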
This issue was created automatically with bugzilla2github
Bugzilla Bug 946
Date: 2011-02-19T09:33:03+01:00
From: Trond Trosterud
To: Børre Gaup
CC: ciprian.gerstenberger, sjur.n.moshagen, tomi.k.pieski, trond.trosterud
Last updated: 2011-05-02T12:42:49+02:00