chozelinek / europarl

Toolkit to compile a comparable/parallel corpus from European Parliament proceedings
MIT License

clean role attribute for element intervention #1

Closed chozelinek closed 7 years ago

chozelinek commented 7 years ago

After running proceedings_xml.py, some values of the role attribute on intervention elements contain more than the relevant information: trailing punctuation, dashes, and language codes. Examples:

<intervention speaker_id="photo_generic" name="Algirdas Šemeta" is_mep="True" mode="spoken" role="Member of the Commission.">
<intervention speaker_id="photo_generic" name="László Kovács" is_mep="True" mode="spoken" role="Member of the Commission. −">
<intervention speaker_id="photo_generic" name="Vladimír Špidla" is_mep="True" mode="spoken" role="Member of the Commission. – (CS)">

Solution?

Modify proceedings_xml.py to clean s_intervention['role'] just before line 341, where self.intervention_to_xml(x_section, s_intervention) is called.
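
A minimal sketch of what such a cleaning step could look like, based only on the three examples above. The helper name clean_role and the regular expression are assumptions for illustration, not code from the repository. It treats trailing periods, dash-like characters, whitespace, and a parenthesised two-letter language code such as (CS) as noise.

```python
import re

# Hypothetical pattern: a run of trailing noise at the end of the string.
# Noise is whitespace, basic punctuation, dash-like characters
# (hyphen, U+2010..U+2015, U+2212 minus sign), or a two-letter
# language code in parentheses such as "(CS)".
_TRAILING_NOISE = re.compile(
    r"(?:[\s.,;:\-\u2010-\u2015\u2212]|\([A-Z]{2}\))+$"
)


def clean_role(role):
    """Return role without trailing punctuation, dashes, or language codes.

    Hypothetical helper; None is passed through unchanged so that
    interventions without a role are not affected.
    """
    if role is None:
        return None
    return _TRAILING_NOISE.sub("", role).strip()
```

Assuming s_intervention always has a 'role' key at that point, it could be applied as s_intervention['role'] = clean_role(s_intervention['role']) right before the call to self.intervention_to_xml. Hyphens inside a role (for example "Vice-President") are left alone because only the end of the string is matched.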