giellalt / bugzilla-dummy

Language identification ignores xml:lang value (Bugzilla Bug 1061) #150

Closed albbas closed 10 years ago

albbas commented 13 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1061

Date: 2011-06-16T17:14:27+02:00 From: Sjur Nørstebø Moshagen <> To: Børre Gaup <> CC: ciprian.gerstenberger, sjur.n.moshagen, tomi.k.pieski, trond.trosterud

Last updated: 2014-03-28T15:11:47+01:00

albbas commented 13 years ago

Comment 4530

Date: 2011-06-16 17:14:27 +0200 From: Sjur Nørstebø Moshagen <>

I'm trying to use our file-specific xsl to override language detection and identification, but to no avail. Here's the xsl template in the file-specific xsl file:

This is correctly applied to the temporary file before language detection:

I Orohagat

But in the final xml file there is still the wrong language:

I Orohagat

This is quite a serious bug: it makes the whole xsl language specification useless. Fortunately the language guesser is mostly right, so in practice the problem is not very big at the moment, at least not for the stable corpus. I am setting the priority accordingly.
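
The expected behaviour described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the converter's actual code: `guess_language`, `assign_languages`, the sample document, and the language codes `sme` and `nob` are all hypothetical choices. The assumption is that a paragraph which already carries an explicit `xml:lang` (for instance one set by the file-specific xsl) should be left alone, and that the guesser should only fill in paragraphs without a declaration.

```python
import xml.etree.ElementTree as ET

# Attribute key that ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def guess_language(text):
    """Hypothetical stand-in for the real language guesser."""
    # Toy rule for illustration only: always answer "nob".
    return "nob"


def assign_languages(root, guesser=guess_language):
    """Set xml:lang on every <p>, keeping explicit declarations."""
    for para in root.iter("p"):
        if para.get(XML_LANG):
            # Explicitly declared upstream: do not overrule it.
            continue
        para.set(XML_LANG, guesser("".join(para.itertext())))
    return root


# Hypothetical input: one declared paragraph, one undeclared.
doc = ET.fromstring(
    "<document><body>"
    '<p xml:lang="sme">Explicitly tagged paragraph</p>'
    "<p>Undeclared paragraph</p>"
    "</body></document>"
)
assign_languages(doc)
langs = [p.get(XML_LANG) for p in doc.iter("p")]
print(langs)
```

With this ordering the declared paragraph keeps its code and only the undeclared one receives the guesser's answer.
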
albbas commented 13 years ago

Comment 4531

Date: 2011-06-16 17:15:21 +0200 From: Sjur Nørstebø Moshagen <>

The file tested was:

$GTFREE//orig/sme/admin/depts/other_files/OTP200620070025000SE_12.html

albbas commented 13 years ago

Comment 4617

Date: 2011-06-22 21:56:02 +0200 From: Trond Trosterud <>

Is this a duplicate of #1048?

albbas commented 13 years ago

Comment 4620

Date: 2011-06-22 22:37:55 +0200 From: Sjur Nørstebø Moshagen <>

(In reply to comment #2)

> Is this a duplicate of #1048?

No, it is not, although they resemble each other. This bug is about adhering to explicit language coding in the xml document when doing language recognition and identification. Bug #1048 is about language codes being assigned that should not be allowed at all, because the assigned language code is not listed among the valid languages for the document.

Related somehow, but not the same.
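
The difference between the two bugs can be illustrated with two separate checks. This is a minimal sketch in Python under stated assumptions: both function names, the representation of the valid languages as a plain list, and the use of a fallback code are hypothetical, and neither function is taken from the converter.

```python
def restrict_to_valid(guessed, valid_languages, fallback):
    """Bug #1048 side: never assign a code outside the valid list.

    The fallback argument is an assumption made for this sketch.
    """
    return guessed if guessed in valid_languages else fallback


def resolve_language(explicit, guessed):
    """This bug's side: an explicit declaration wins over the guesser."""
    return explicit if explicit else guessed


# A guess outside the valid list is replaced by the fallback.
print(restrict_to_valid("fin", ["sme", "nob"], fallback="sme"))
# An explicit declaration is kept even if the guesser disagrees.
print(resolve_language("nob", guessed="sme"))
# With no declaration the guess is used.
print(resolve_language(None, guessed="sme"))
```

The two checks are independent: fixing one does not fix the other, which is why the bugs are related but not duplicates.
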

albbas commented 12 years ago

Comment 5710

Date: 2012-02-03 16:28:38 +0100 From: Sjur Nørstebø Moshagen <>

Børre, I think this one is fixed now, isn't it? Close?

albbas commented 12 years ago

Comment 6587

Date: 2012-08-16 18:14:15 +0200 From: Trond Trosterud <>

Close?

albbas commented 12 years ago

Comment 6635

Date: 2012-08-27 12:23:36 +0200 From: Børre Gaup <>

This is not fixed. When language detection is on, the language detector happily overrules anything that has been set earlier.

albbas commented 10 years ago

Comment 9224

Date: 2014-03-28 07:33:39 +0100 From: Sjur Nørstebø Moshagen <>

(In reply to comment #6)

> This is not fixed. When language detection is on, the language detector happily overrules anything that has been set earlier.

That was in 2012. I believe this has been fixed since then. If so, can this bug be closed?

albbas commented 10 years ago

Comment 9225

Date: 2014-03-28 07:52:53 +0100 From: Ciprian Gerstenberger <>

(In reply to comment #7)

First, I would test it separately before closing the bug. I remember that not long ago, about 6 months back, I was trying to convert something from Romanian (a pdf), and despite an explicit language declaration in the xsl file the conversion fell back to the sme default. Yet this might be a slightly different problem: as far as I know there is no language detection for ron, because there is no language model for it.
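
The Romanian case could be tested against something like the sketch below. It is a hypothetical Python illustration, not the converter's logic: `final_language`, the `models` mapping, the toy scoring functions, and the `sme` default are assumptions. The point it illustrates is that a declared language would be kept even when no model exists for it, instead of falling back to the default.

```python
def final_language(declared, text, models, default="sme"):
    """Pick the language for a document in this hypothetical sketch.

    models maps a language code to a scoring function for text.
    """
    if declared:
        # A declaration is kept even when no model exists for it.
        return declared
    if not models:
        # Nothing declared and nothing to detect with.
        return default
    # No declaration: take the best-scoring model.
    return max(models, key=lambda code: models[code](text))


# Toy scoring functions, for illustration only.
models = {
    "sme": lambda text: text.count(" ja "),
    "nob": lambda text: text.count(" og "),
}
print(final_language("ron", "un text oarecare", models))
print(final_language(None, "hest og hund", models))
```
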

albbas commented 10 years ago

Comment 9228

Date: 2014-03-28 15:11:47 +0100 From: Børre Gaup <>

This has been fixed in r91325