clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora

CZ: audio file path does not correspond to AudioPSP 24.01 #849

Closed: matyaskopp closed this issue 7 months ago

matyaskopp commented 7 months ago

The current value is url="2013ps/audio/2016/10/27/2016102714281442.mp3"

but it should be url="audio/psp/2016/10/27/2016102714281442.mp3"

so that the data from this record can be used:

Kopp, Matyáš, 2024, AudioPSP 24.01: Audio recordings of proceedings of the Chamber of Deputies of the Parliament of the Czech Republic, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University,
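The requested change is a pure prefix rewrite of the path. Here is a minimal Python sketch. It assumes the old prefix is always a run of digits followed by `ps/audio/`, which is the same pattern the XSLT template in this issue uses. The function name `to_audiopsp_url` is hypothetical and is not part of ParCzech or ParlaMint.

```python
import re

# Assumed old prefix: optional digits (e.g. "2013") + "ps/audio/",
# mirroring the pattern used in the ParCzech XSLT replace().
OLD_PREFIX = re.compile(r"^[0-9]*ps/audio/")


def to_audiopsp_url(url: str) -> str:
    """Hypothetical helper: map a ParlaMint-CZ recording path to the AudioPSP 24.01 layout.

    Paths that do not start with the old prefix (including already-fixed
    ones) are returned unchanged, so running it twice is harmless.
    """
    return OLD_PREFIX.sub("audio/psp/", url, count=1)


if __name__ == "__main__":
    print(to_audiopsp_url("2013ps/audio/2016/10/27/2016102714281442.mp3"))
    # audio/psp/2016/10/27/2016102714281442.mp3
```

The pattern is anchored at the start of the string and requires the literal `ps/audio/`. An already converted `audio/psp/...` path therefore never matches.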

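This XSLT template fixes it in ParCzech:

  <xsl:template match="tei:recording[@type='audio']/tei:media/@url">
    <!-- replace the leading "<digits>ps/audio/" with "audio/psp/"; relies on the rest of the stylesheet copying everything else -->
    <xsl:attribute name="url" select="replace(.,'^[0-9]*ps/audio/','audio/psp/')"/>
  </xsl:template>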

but I believe it is safe to use a plain regex on the XML here: s/url="[0-9]*ps\/audio\//url="audio\/psp\//
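The XSLT and the regex should give the same result on ParlaMint-CZ files, provided no other `url` attribute starts with `<digits>ps/audio/`. The following minimal Python sketch applies the sed-style substitution to the raw text and then re-reads the result with ElementTree to check the audio media URLs. Several parts are assumptions for illustration only: the sample TEI fragment is hypothetical and not the real ParlaMint-CZ layout, and the function names `fix_audio_urls` and `audio_media_urls` are invented here.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Text-level equivalent of s/url="[0-9]*ps\/audio\//url="audio\/psp\//
URL_PREFIX_RE = re.compile(r'url="[0-9]*ps/audio/')

# Hypothetical minimal TEI fragment (not the actual ParlaMint-CZ structure).
SAMPLE = (
    f'<TEI xmlns="{TEI_NS}"><teiHeader><recordingStmt>'
    '<recording type="audio">'
    '<media mimeType="audio/mp3" url="2013ps/audio/2016/10/27/2016102714281442.mp3"/>'
    '</recording></recordingStmt></teiHeader></TEI>'
)


def fix_audio_urls(xml_text: str) -> str:
    """Hypothetical helper: rewrite every double-quoted url="<digits>ps/audio/..." prefix."""
    return URL_PREFIX_RE.sub('url="audio/psp/', xml_text)


def audio_media_urls(xml_text: str) -> list[str]:
    """Hypothetical helper: collect url values the XSLT template would target."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    path = ".//tei:recording[@type='audio']/tei:media"
    return [media.get("url") for media in root.findall(path, ns)]


if __name__ == "__main__":
    fixed = fix_audio_urls(SAMPLE)
    print(audio_media_urls(fixed))
    # ['audio/psp/2016/10/27/2016102714281442.mp3']
```

The text regex is broader than the XSLT match. It rewrites any double-quoted `url` attribute with that prefix, not only audio `media` elements, and it ignores single-quoted attributes. A structural pass like `audio_media_urls` is therefore a cheap way to confirm that every audio URL was actually converted.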

@TomazErjavec, should I do it and insert the fix to my tantra-home? Or will you process it yourself?

TomazErjavec commented 7 months ago

@TomazErjavec, should I do it and insert the fix to my tantra-home?

Yes please, and let me know when done and what I should do.

matyaskopp commented 7 months ago

I have used data from


and placed the result here:


The Czech folders can be overwritten in Source-TEI.

TomazErjavec commented 7 months ago

Done! Will process it as soon as the queue empties.