PerseusDL / canonical

This will be the base repo for all text and annotation data published in the PDL

review Smith et al for missing entities #32

Open lcerrato opened 11 years ago

lcerrato commented 11 years ago

0000666: adding Unicode character entities

Description: I'm working in Smith's realia and noticed that some of the things the data entry people thought were images are actually symbols and notations for which there are now Unicode equivalents (even if most people don't have fonts to view them, or we are still in the UTF-8 charset). Most of these are either not marked at all or are marked with placeholder entities that seem to be particular to this document or to Perseus. Will it cause any problems with the transformation of these texts if I add the hexadecimal codes for the entities I recognize?

For example, we use <!ENTITY triseme SDATA "[triseme]" > but the triseme has a Unicode equivalent now (&#x23d7;, i.e. U+23D7) and the placeholder entity doesn't seem to mean anything.
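
To make the proposal concrete, here is a minimal Python sketch of rewriting a placeholder SDATA declaration as a numeric character reference. The mapping table, the function name `upgrade_declaration`, and the target declaration form are all assumptions for illustration; only the triseme entry comes from the example above, and nothing here is the actual Perseus transformation code.

```python
import re

# Hypothetical mapping from placeholder entity names to Unicode code points.
# Only "triseme" is taken from the issue; add others after checking each symbol.
ENTITY_TO_CODEPOINT = {
    "triseme": 0x23D7,  # U+23D7
}

# Matches declarations of the form: <!ENTITY name SDATA "[placeholder]" >
SDATA_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+SDATA\s+"[^"]*"\s*>')


def upgrade_declaration(dtd_text: str) -> str:
    """Replace known SDATA placeholder declarations with hex character references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        codepoint = ENTITY_TO_CODEPOINT.get(name)
        if codepoint is None:
            # Unknown entity: leave the original declaration untouched.
            return match.group(0)
        return f'<!ENTITY {name} "&#x{codepoint:04X};">'

    return SDATA_DECL.sub(replace, dtd_text)


if __name__ == "__main__":
    print(upgrade_declaration('<!ENTITY triseme SDATA "[triseme]" >'))
```

Entities that are not in the table are left as they are, so the change could be applied incrementally as symbols are identified. Whether the downstream transformation accepts the rewritten declarations is exactly the open question in this issue.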