giellalt / bugzilla-dummy


Boundcorpus does not convert (Bugzilla Bug 969) #11

Closed albbas closed 13 years ago

albbas commented 13 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 969

Date: 2011-03-13T10:59:22+01:00 From: Trond Trosterud To: Børre Gaup CC: ciprian.gerstenberger, sjur.n.moshagen, trond.trosterud

Last updated: 2011-05-02T13:46:55+02:00

albbas commented 13 years ago

Comment 3817

Date: 2011-03-13 10:59:22 +0100 From: Trond Trosterud

Last week, I needed sma text, but found that the sma boundcorpus had many conversion errors. My version was, however, old, so I hoped for a newer and better conversion and did "make clean". The result, for boundcorpus, was that I am now left with nothing. Here is my victorio:

Oppdatert til revisjon 2913.
boundcorpus$rm -f tmp/
boundcorpus$time make
convert2xml.pl --shallow goldstandard/orig
Processing files
|||||||||||||||||||||||||||||||||||||||||||||| 46
Processing finished
46 files processed, 46 errors among them
The errors were distributed like this:
add_error_markup 0 0% of errors
checkxml_after_faulty 0 0% of errors
convert2xml 0 0% of errors
xsl 1 2% of errors
character_encoding 0 0% of errors
intermediate 12 26% of errors
faulty_chars 32 70% of errors
text_categorization 0 0% of errors
checkxml_after_checklang 1 2% of errors
checklang 0 0% of errors
To find which files caused the errors, do the command grep "Conversion failed" tmp/.log
convert2xml.pl --shallow orig
Processing files
||||.||||||||||||make: *** [all] Error 1

real 5m12.031s
user 5m0.341s
sys 0m3.518s
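
The grep step that the log suggests can also be sketched in Python. This is a minimal sketch, assuming the conversion logs are `*.log` files sitting directly in one directory; the function name `find_failed_logs` and that layout are hypothetical and not taken from convert2xml.pl. Only the marker string "Conversion failed" comes from the log above.

```python
from pathlib import Path


def find_failed_logs(log_dir, marker="Conversion failed"):
    """Return the sorted names of *.log files in log_dir that contain marker.

    Assumption: logs live directly in log_dir and end in .log.
    """
    failed = []
    for log_path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going even if a log holds
        # bytes that are not valid UTF-8.
        text = log_path.read_text(encoding="utf-8", errors="replace")
        if marker in text:
            failed.append(log_path.name)
    return failed
```

Calling `find_failed_logs("tmp")` from the corpus directory would then list the log files that record a failed conversion, provided the logs are stored the way this sketch assumes.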

albbas commented 13 years ago

Comment 3893

Date: 2011-04-22 17:31:46 +0200 From: Trond Trosterud

Obsolete (though by no means resolved). Moving to fresher bug reports.

albbas commented 13 years ago

Comment 3935

Date: 2011-04-26 09:19:19 +0200 From: Sjur Nørstebø Moshagen

If it isn't resolved, why close it? At least create a super-bug that contains all other corpus conversion bugs, including this one and all newer ones. Then it is easier to follow the development. Reopened until the super-bug is created.

albbas commented 13 years ago

Comment 3955

Date: 2011-04-28 23:47:26 +0200 From: Trond Trosterud

I reported that boundcorpus did not convert at all. Now boundcorpus has roughly 21,000 files:

boundcorpus$ll converted// converted/// converted////*|wc -l
21606
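
Counting lines of `ll` output can include lines that are not files (for example directory headers or "total" lines, depending on what the arguments match), so the figure is approximate. A minimal sketch of an exact count, assuming the converted files sit somewhere below a single root directory; the function name `count_files` is hypothetical:

```python
import os


def count_files(root):
    """Count non-directory entries below root, recursing into subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        # os.walk lists every non-directory entry of a directory in filenames.
        total += len(filenames)
    return total
```

Run from the boundcorpus directory, `count_files("converted")` would give the number of converted files without the extra lines.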

I thus suggest the assignee closes it.

albbas commented 13 years ago

Comment 3991

Date: 2011-04-30 09:44:10 +0200 From: Trond Trosterud

Bug #946 has been marked as a duplicate of this bug.

albbas commented 13 years ago

Comment 4010

Date: 2011-05-02 13:46:55 +0200 From: Børre Gaup

boundcorpus/orig/sma now gets converted as it should.