Closed (albbas closed 10 years ago)
Date: 2014-01-24 16:25:17 +0100
From: Ciprian Gerstenberger
In the current state the output from the analysis is used both for compiling the corpus for KORP
and for linguists' work using cat, ccat, etc.
Yet the XML-entities in the
XSLtemplate 1.19 ; file-specific xsl $Revision: 1.1 $; common.xsl $Revision: 68074 $; "<Sámi_Radio>" "Sámi_Radio" MWE N <sme> Prop Sem/Org Sg Gen "<ođđasat>" :
This is not useful.
The
Google with "output CDATA xsl xml" for more info. For instance: http://www.w3schools.com/xsl/el_output.asp
This way, Børre doesn't need to maintain two different pipelines, and the old data format can easily be extracted from the XML files enriched with metadata.
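A minimal sketch of such an extraction step, assuming the analysis is stored in an element named `analysis` (the name used later in this thread). The surrounding document layout, the sample string, and the helper name `extract_analysis` are hypothetical and only serve as illustration; this is not the actual pipeline script.

```python
import xml.etree.ElementTree as ET

# Hypothetical analysed file: only the <analysis> element name comes from
# this thread; the rest of the layout is an assumption.
SAMPLE = (
    "<document>"
    "<header><title>example</title></header>"
    "<body>"
    '<analysis>"&lt;Sámi_Radio&gt;" "Sámi_Radio" MWE N &lt;sme&gt; '
    "Prop Sem/Org Sg Gen</analysis>"
    "</body>"
    "</document>"
)


def extract_analysis(xml_text: str) -> str:
    """Return the concatenated text of all <analysis> elements.

    The XML parser decodes entities such as &lt; and &gt;, so the
    returned string is in the plain, pre-XML format.
    """
    root = ET.fromstring(xml_text)
    return "".join(el.text or "" for el in root.iter("analysis"))


if __name__ == "__main__":
    print(extract_analysis(SAMPLE))
```

Because the parser does the decoding, the same step should work whether the element content is written as escaped text or as a CDATA section.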
Date: 2014-01-31 11:48:45 +0100
From: Børre Gaup
I think the best way to extract this data is to pull out the text in
Something along these lines should be possible?
Date: 2014-01-31 15:44:17 +0100
From: Trond Trosterud
And why do you think that is better than using CDATA?
Date: 2014-01-31 16:18:49 +0100
From: Ciprian Gerstenberger
No, it isn't. I have talked with Børre and explained the problem. Actually, the solution is already in the script here:
It is just a matter of changing the pipeline from the analysis, a very minimal change.
Date: 2014-02-03 19:47:48 +0100
From: Børre Gaup
I think we'll just keep it the way it is. Debugging is not hampered by the fact that the analysis element contains text instead of CDATA.
Date: 2014-02-03 20:27:00 +0100
From: Trond Trosterud
To repeat the issue:
Cip: we have XML entities instead of < and >: "<Sámi_Radio>" "Sámi_Radio" MWE N <sme> Prop Sem/Org Sg Gen
Cip: this will be rendered correctly if we define it as CDATA instead of as text.
Cip: Bonus: one pipeline, not two.
Børre: I think text is better than CDATA.
Trond: I am missing some explicit pros and cons here:
1 - Is it true that content-as-text gives two pipelines, but content-as-CDATA gives one?
2 - Is ccat as fast as cat?
3 - Are there issues other than 1 and 2 that are relevant for the choice of text vs. CDATA?
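As background for question 1, the sketch below illustrates how the two encodings differ. It assumes an element named `analysis` and uses Python's standard XML parser. The constants and helper names are hypothetical, and it says nothing about the speed of cat or ccat. It shows that the raw file contents differ (entities are visible only in the text variant), while an XML parser returns the same string in both cases.

```python
import xml.etree.ElementTree as ET

# The reading as a linguist would want to see it.
READING = '"<Sámi_Radio>" "Sámi_Radio" MWE N <sme> Prop Sem/Org Sg Gen'

# Variant 1: content as escaped text (what the thread calls "text").
AS_TEXT = (
    "<analysis>"
    '"&lt;Sámi_Radio&gt;" "Sámi_Radio" MWE N &lt;sme&gt; Prop Sem/Org Sg Gen'
    "</analysis>"
)

# Variant 2: content wrapped in a CDATA section.
AS_CDATA = "<analysis><![CDATA[" + READING + "]]></analysis>"


def parsed_text(xml_text: str) -> str:
    """Text content of the root element as seen by an XML parser."""
    return ET.fromstring(xml_text).text or ""


def raw_has_entities(xml_text: str) -> bool:
    """True if the raw file content contains &lt; or &gt; entities."""
    return "&lt;" in xml_text or "&gt;" in xml_text


if __name__ == "__main__":
    for name, doc in (("text", AS_TEXT), ("CDATA", AS_CDATA)):
        print(name, "| raw entities:", raw_has_entities(doc),
              "| parsed equals reading:", parsed_text(doc) == READING)
```

So a tool that reads the raw file sees entities only in the text variant, whereas any tool that goes through an XML parser gets identical output from both.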
Date: 2014-02-03 20:33:22 +0100
From: Ciprian Gerstenberger
Yes, I agree. I already have a script to convert the data as needed, so a small extra step in my pipeline is no problem. I do a proper test anyhow, so it is quite ok with me. Børre, you can just close this bug.
(In reply to comment #4)
I think we'll just keep it the way it is. Debugging is not hampered by the fact that the analysis element contains text instead of CDATA.
Date: 2014-02-03 20:36:25 +0100
From: Børre Gaup
And if you run ccat on the analysed files, it spits out clean text versions of them.
This issue was created automatically with bugzilla2github
Bugzilla Bug 1802
Date: 2014-01-24T16:25:17+01:00 From: Ciprian Gerstenberger
To: Børre Gaup
CC: ciprian.gerstenberger, lene.antonsen, sjur.n.moshagen, trond.trosterud
Last updated: 2014-02-03T20:36:25+01:00