Closed Natkeeran closed 7 years ago
@Natkeeran this is a workable solution, however I'm concerned that if an administrator views the xml datastream that they would find the CDATA elements confusing. How about using something like an html entity encoder? http://php.net/manual/en/function.html-entity-decode.php. Just a suggestion.
@kimpham54 I'd say the approach that @Natkeeran has taken is preferred given that we can't make too many assumptions about what the kind of characters will be added to the transcript cues. For example, I could imagine the module being used for preserving technical lectures in computer science. I would hope that only more advanced users would be editing the datastream directly and thus would recognize CDATA sections. We could document this in order to make it clearer if confusion is a major concern.
@MarcusBarnes @Natkeeran I think documenting is fine, CDATA is a sound approach.
However I am a little concerned because I can imagine a scenario where you may have multiple transcript editors. Some editors may want to use software to create transcripts, some may be editing the file through the Transcript UI interface. If this is happening simultaneously, or if this requires someone to export the transcript, edit it, then replace the datastream, we need to make sure the transcript being accessed will be consistent across the board.
When I tested the following scenario:
<cue>
<speaker><![CDATA[Test Speaker one ]]></speaker>
<start><![CDATA[2]]></start>
<end><![CDATA[4]]></end>
<transcript><![CDATA[ and there are potatoes to be looked after.]]></transcript>
<translation_fr><![CDATA[et il y a des pommes de terre à soigner.]]></translation_fr>
<translation_ch><![CDATA[有土豆要照顾。]]></translation_ch>
<translation_hi><![CDATA[और आलू की देखभाल करने के बाद।]]></translation_hi>
</cue>
<cue>
<speaker>Test Speaker 6</speaker>
<start>6</start>
<end>8</end>
<transcript>and there are potatoes to be looked after.</transcript>
<translation_fr>t il y a des pommes de terre à soigner.</translation_fr>
<translation_ch>有土豆要照</translation_ch>
<translation_hi>और आलू की देखभाल करने के बाद।</translation_hi>
</cue>
Result: Before step 5, the cue without the CDATA remained without CDATA, so what I gather is that transcripts will only have CDATA elements applied if they're edited through the interface. Should we be concerned that not all transcripts will have CDATA elements? One option is to apply CDATA elements on ingest, not just if they were edited through the interface, then we'd make sure all transcripts ingested in the repository will look consistent. Either way, I'm fine if this behaviour is documented and am fine with merging this PR.
@kimpham54 CDATA is a standard way to handle special characters in XML files. We should recommend that users use CDATA to put in values. Generally, XML libraries and tools will know how to handle CDATA.
If the user edits a transcript via the interface, all the xml tier values will have CDATA, i.e below:
<?xml version="1.0" encoding="utf-8"?>
<cues>
<cue>
<speaker><![CDATA[Different Speaker]]></speaker>
<start><![CDATA[0.000]]></start>
<end><![CDATA[30.0]]></end>
<transcript><![CDATA[aasd&&sss]]></transcript>
<translation><![CDATA[This is the annotation content.]]></translation>
</cue>
<cue>
<speaker><![CDATA[Speaker2]]></speaker>
<start><![CDATA[30.0]]></start>
<end><![CDATA[60]]></end>
<transcript><![CDATA[speaker 2 transcript]]></transcript>
<translation><![CDATA[speakder 2 transcript translation]]></translation>
</cue>
<cue>
<speaker><![CDATA[Speaker 3]]></speaker>
<start><![CDATA[60.0]]></start>
<end><![CDATA[105]]></end>
<transcript><![CDATA[speaker 3 transcript]]></transcript>
<translation><![CDATA[speadker 3 transcript translation]]></translation>
</cue>
</cues>
Yes, if some files are edited via interface and some are not. There will be some inconsistency. But, I don't think this is a major issue, because the xml would still parsable in a standard compliant way.
@MarcusBarnes noted that we can document our approach, and if users pose specific issues, we can refine the solution.
I suggest we merge. @kimpham54 Thumbs up or down?
The solution works for me. Thanks
What does this Pull Request do?
This PR addresses this issue: https://github.com/digitalutsc/islandora_solution_pack_oralhistories/issues/58. It saves the values as CDATA, thus ensuring that a valid xml element is formed.
How should this be tested?