BMR59 / Portu-WhatsApp

Communications between Portuguese L2 Speakers and Native Speakers via WhatsApp

Reserved Characters #5

Open BMR59 opened 6 years ago

BMR59 commented 6 years ago

I'm back with another question! After uploading part of our corpus to the website, we realized that letters with accents (õ or ê, for example) do not show up correctly and need to be replaced with HTML codes such as: & otilde ; . The issue is that oXygen flags these HTML codes and says: "The entity "ecirc" was referenced, but not declared." Do I need to put something in the header or elsewhere so that these are recognized? Did I explain the situation well enough?

ghbondar commented 6 years ago

@BMR59 All of these characters, including many emojis(!), can be displayed in HTML by entering their Unicode code points as character references, similar to how we sometimes need to use escape characters in XML. See here: https://www.w3schools.com/charsets/ref_utf_latin1_supplement.asp

BMR59 commented 6 years ago

@ghbondar Should I not be coding these directly into my TEI? For example, I've replaced all the ã with " & atilde ; " but they are flagged as being referenced but not declared. This does not happen with the codes for &, <, and >.

ghbondar commented 6 years ago

@BMR59 At any rate, they will need to be output into your HTML. @ebeshero will know how to use them with TEI.

ebeshero commented 6 years ago

@BMR59 @ghbondar In most cases you don't really need to use those entity codes. Instead, try getting a fresh copy of the character from either a word-processed document or the web. For example, when I enter this in an XML file: <stuff>õ</stuff>, I'm not seeing the problem you described, nor do I see it when I open a TEI file (like one from the Dickinson project) and plug in:

<author>Emily õ Dickinson</author>

That's because I've picked up a screen copy of the actual Unicode character.
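
As a rough illustration of why the literal character works while the named code gets flagged: XML predefines only five named entities (amp, lt, gt, apos, quot), so a name like otilde is undeclared unless a DTD supplies it, whereas the literal character and a numeric character reference are both fine. The sketch below uses Python's standard xml.etree.ElementTree parser rather than oXygen, and the helper names `parse_text` and `is_well_formed` are made up for this example.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_source: str) -> str:
    """Parse a one-element XML document and return its text content."""
    return ET.fromstring(xml_source).text


def is_well_formed(xml_source: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


literal = "<stuff>õ</stuff>"         # the character pasted in directly
numeric = "<stuff>&#245;</stuff>"    # numeric reference to U+00F5 (õ)
named = "<stuff>&otilde;</stuff>"    # HTML entity name, not declared in XML

# The literal character and the numeric reference parse to the same text.
print(parse_text(literal) == parse_text(numeric))

# The HTML entity name is rejected because nothing declares it.
print(is_well_formed(named))
```

So the pasted-in character is the simplest choice in the TEI, and a numeric reference is a fallback if a character ever cannot be typed.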

The issue with the website has to do with whether it is set to display UTF-8 characters. @ghbondar is right that this is a problem with the transformation to HTML, but what you need is very simple. Check that you have a <meta> declaration set in the <head> element of your HTML that reads:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

You can produce this line in your XSLT output, I think, with this statement in your XSLT file:

<xsl:output method="xhtml" encoding="utf-8" doctype-system="about:legacy-compat"
    omit-xml-declaration="yes"/>

Does that help?

ebeshero commented 6 years ago

@BMR59 To be clear, I don't think you need or want to replace characters like õ in your XML file. I don't think that's the problem. I think the problem is whether your web browser is reading the output of the XSLT transformation as UTF-8, and to control that you want that meta statement in place.
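
Here is a small sketch of what that kind of misreading looks like, assuming the page bytes are UTF-8 but get decoded as Latin-1 (one common fallback; the actual fallback depends on the browser and server settings). The function name `misread_as_latin1` is just for illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Latin-1 by mistake."""
    return text.encode("utf-8").decode("latin-1")


def read_as_utf8(text: str) -> str:
    """Encode text as UTF-8 and decode it with the matching encoding."""
    return text.encode("utf-8").decode("utf-8")


# õ is two bytes in UTF-8 (0xC3 0xB5); read as Latin-1 they become two characters.
print(misread_as_latin1("õ"))   # Ãµ

# With the correct charset the character survives unchanged.
print(read_as_utf8("õ"))        # õ
```

If the accented letters on the site show up as pairs like these, that points to the charset declaration rather than to the characters in the XML.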

BMR59 commented 6 years ago

@ebeshero Ok thank you! We will try adding one of those declarations to the html and see how it works out @pab124 @ttb11