One of the benefits of Devilution for me is that it will be way easier to translate the game to other languages (particularly Latin-based ones such as Spanish, French and Portuguese) - no more of the hex editing that brave translators had to deal with before.

But, well, there was and there is still a problem with some special characters. They don't show up properly.

This may not sound high priority, but it doesn't look very hard - I just don't know the right "button" to press to solve it all. It's important to me as a Brazilian fan of Diablo and author of a Diablo 2 translation to Portuguese, so I'm willing to work hard on this (I already have been); I just need a little guidance.

Here's some of what I have so far:

The font files for Diablo 1 (inside the game's .mpq) have 256 slots and do contain the special characters I need, and in a sequence that matches the UTF-8 encoding.
However, typing those characters in the game strings from the source code results in "unaccented" versions in-game. For instance, "Coração" becomes "Coracao" (I have to change the encoding of the files with accented characters to ISO-8859-1 for this to happen; otherwise the accented characters are transformed into some weird two-character thing, which is also "simplified" in-game, such as in this real example: "Ancião" becoming "AnciÃ£o" and being shown as "AnciAUo" in-game).
With all that said, I'm led to believe that somewhere in the process something is purposefully "simplifying" the characters (maybe doing things like this one: https://stackoverflow.com/questions/4015879/removing-diacritic-symbols-from-utf8-string-in-c ). I suspect that this is happening either in the extremely confusing functions PrintUString and CPrintString, in the .bin files for the Diablo font inside the .mpq (hex editing them didn't reveal anything to me), or somewhere else entirely (like something in the compiling step).
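
The "weird two-character thing" mentioned above matches how UTF-8 stores accented Latin letters. The following is a minimal sketch, not code from the game; the function name utf8_from_latin1 is made up for illustration. It shows that "ã" (code point 0xE3) becomes the two bytes 0xC3 0xA3 in UTF-8, which read as "Ã" and "£" when each byte is treated as one ISO-8859-1 character.

```cpp
#include <cassert>
#include <string>

// Encode one code point in the range U+0000..U+00FF as UTF-8.
// Hypothetical helper for illustration only; not part of the game source.
std::string utf8_from_latin1(unsigned int cp)
{
    assert(cp <= 0xFF); // only the first 256 code points are handled here
    std::string out;
    if (cp < 0x80) {
        // ASCII is stored as a single byte.
        out.push_back(static_cast<char>(cp));
    } else {
        // Everything from U+0080 to U+00FF needs two bytes.
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));   // lead byte
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F))); // continuation byte
    }
    return out;
}
```

Calling utf8_from_latin1(0xE3) yields a two-byte string, so any code that looks up one glyph per byte sees two unrelated characters instead of one "ã".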

Anyway, I would be very glad if any of the contributors could help me a little bit, especially with some insight into the PrintUString and CPrintString functions. I don't have much experience with C/C++, which is also somewhat of a barrier.

The main issue is that the font does not include graphic representations for all characters of the latin alphabet.

Specifically, the file data/bigtgold.cel contains graphic representations for 55 characters; namely A-Z, 0-9 and special characters. The same is true for data/medtexts.cel.

The file ctrlpan/smaltext.cel contains graphic representations for 67 characters; namely A-Z, 0-9 and special characters (including a few additional special characters).

In general, you do these steps:

ASCII character code -> font index
Font index -> frame number
Font index -> frame width

For reference, these are the main global variables to keep in mind:

ASCII character code -> font index

http://sanctuary.github.io/notes/#address/0x47954C

/// address: 0x47954C
///
/// font_index_from_ascii maps ASCII character code to font index, as used by the
/// small, medium and large sized fonts; which corresponds to smaltext.cel,
/// medtexts.cel and bigtgold.cel respectively.
int8_t font_index_from_ascii[256];

smaltext.cel

Font index -> frame number

http://sanctuary.github.io/notes/#address/0x479424

/// address: 0x479424
///
/// smaltext_frame_from_font_index maps from font index to smaltext.cel frame
/// number.
int8_t smaltext_frame_from_font_index[127];

Font index -> frame width

http://sanctuary.github.io/notes/#address/0x4794A4

/// address: 0x4794A4
///
/// smaltext_character_width_from_frame maps from smaltext.cel frame number to
/// character width. Note, the character width may be distinct from the frame
/// width, which is 13 for every smaltext.cel frame.
int8_t smaltext_character_width_from_frame[68];

medtexts.cel

Font index -> frame number

http://sanctuary.github.io/notes/#address/0x47F078

/// address: 0x47F078
///
/// medtexts_frame_from_font_index maps from font index to medtexts.cel frame
/// number.
int8_t medtexts_frame_from_font_index[127];

Font index -> frame width

http://sanctuary.github.io/notes/#address/0x47F0F8

/// address: 0x47F0F8
///
/// medtexts_character_width_from_frame maps from medtexts.cel frame number to
/// character width. Note, the character width may be distinct from the frame
/// width, which is 22 for every medtexts.cel frame.
int8_t medtexts_character_width_from_frame[56];

bigtgold.cel

Font index -> frame number

http://sanctuary.github.io/notes/#address/0x47A48C

/// address: 0x47A48C
///
/// bigtgold_frame_from_font_index maps from font index to bigtgold.cel
/// frame number.
int8_t bigtgold_frame_from_font_index[127];

Font index -> frame width

http://sanctuary.github.io/notes/#address/0x47A50C

/// address: 0x47A50C
///
/// bigtgold_character_width_from_frame maps from bigtgold.cel frame number to
/// character width. Note, the character width may be distinct from the frame
/// width, which is 46 for every bigtgold.cel frame.
int8_t bigtgold_character_width_from_frame[56];
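
To make the lookup chain concrete, here is a minimal sketch of how the three tables could be combined for one font. The FontTables and Glyph types and the lookup_glyph function are hypothetical names introduced for illustration, and the tables are stored as unsigned char here for simplicity; how the game's own drawing functions combine them is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bundle of the three lookup tables for one font.
struct FontTables {
    const unsigned char *index_from_char;  // 256 entries, e.g. font_index_from_ascii
    const unsigned char *frame_from_index; // e.g. smaltext_frame_from_font_index
    std::size_t frame_from_index_len;
    const unsigned char *width_from_frame; // e.g. smaltext_character_width_from_frame
    std::size_t width_from_frame_len;
};

struct Glyph {
    unsigned char frame; // CEL frame number
    unsigned char width; // character width in pixels
};

// character code -> font index -> frame number -> character width
Glyph lookup_glyph(const FontTables &font, unsigned char ch)
{
    const unsigned char index = font.index_from_char[ch];
    assert(index < font.frame_from_index_len);
    const unsigned char frame = font.frame_from_index[index];
    assert(frame < font.width_from_frame_len);
    return Glyph{frame, font.width_from_frame[frame]};
}
```

The asserts make the dependency between the table sizes explicit: adding a glyph means every value stored in one table must stay a valid index into the next one.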

@mewmew you made my day! I'm going to try to "expand" those .cel files and update the respective codes. I'll let you know about my progress.

Just for the record, I was looking at the wrong font files to begin with (the .pcx ones in ui_art/ inside the .mpq, which I can see now are only used in, well, the UI, which indeed supports special characters as I've seen in some translations for the game). The good news is that even though they were "wrong" for what I was looking for, they will be a great source to extract missing characters in my attempt to expand smaltext.cel, medtexts.cel and bigtgold.cel since they all seem to be similar renderings of the Exocet typeface.

In order to support extended characters, especially Kanji, we'd need to expand text interpretation to a short instead of byte. The Playstation port contains language settings for French, Spanish, German, Swedish, and Japanese IIRC. The code works a little different, fetching the string based on ID to support them. It also contains the graphic subsets. Perhaps we could rip the data from there?

Just for the record, I was looking at the wrong font files to begin with (the .pcx ones in ui_art/ inside the .mpq, which I can see now are only used in, well, the UI, which indeed supports special characters as I've seen in some translations for the game). The good news is that even though they were "wrong" for what I was looking for, they will be a great source to extract missing characters in my attempt to expand smaltext.cel, medtexts.cel and bigtgold.cel since they all seem to be similar renderings of the Exocet typeface.

Wow, that is just brilliant to hear! Then the graphical representation for the missing characters can actually be added to the game.

It also contains the graphic subsets. Perhaps we could rip the data from there?

Would be cool, do you know what format it is stored in?

There are so many interesting file formats on the PSX release, I wonder how many have been reversed so far.

[ ] .amp
[ ] .bin
[ ] .bnk
[ ] .bof
[ ] .cnf
[ ] .dat
[ ] .def
[ ] .dir
[ ] .dun
[x] .eng
[x] .fre
[x] .ger
[ ] .gfx
[ ] .hdr
[ ] .inf
[x] .jap
[ ] .lgh
[ ] .map
[ ] .min
[ ] .mov
[ ] .out
[ ] .pak
[ ] .raw
[ ] .sol
[x] .swe
[ ] .sym
[ ] .til
[ ] .tp
[ ] .vag
[ ] .zzz

The files I checked are the raw text files for the corresponding language. There are raw UTF-16 header files stored on the PSX beta release, that contain definitions and strings for all the languages. I figured I should upload this along with the symbolic info

There's also a few header source files for monster and quest information. Juicy stuff!

To extract the frames of the .cel files, install Go, and run:

export GOPATH=/tmp/go
export PATH=$GOPATH/bin:$PATH
go get -u github.com/sanctuary/formats/...
go get -u github.com/sanctuary/mpq
mkdir output
cd output
cp /path/to/diabdat.mpq .
mpq -dir diabdat -m diabdat.mpq
cel_dump ctrlpan/smaltext.cel data/bigtgold.cel data/medtexts.cel
# cel_dump: Converting "ctrlpan/smaltext.cel".
# cel_dump: Converting "data/bigtgold.cel".
# cel_dump: Converting "data/medtexts.cel".

You can also use the cel_dump tool to dump all CEL files. Prior to that, patch the broken files using mpqfix as follows.

go get -u github.com/mewrnd/blizzconv/cmd/mpqfix
mpqfix -mpqdump diabdat
# Patching "levels/l1data/banner2.dun".
# Patching "monsters/darkmage/dmagew.cl2".
# Patching "monsters/unrav/unravw.cel".
cel_dump -a

In order to support extended characters, especially Kanji, we'd need to expand text interpretation to a short instead of byte. The Playstation port contains language settings for French, Spanish, German, Swedish, and Japanese IIRC. The code works a little different, fetching the string based on ID to support them. It also contains the graphic subsets. Perhaps we could rip the data from there?

Sounds very promising! Especially after what happened in my attempt with the cel files:

I managed to create an extended smaltext.cel (just added the character "ã", using TDG and the oldest MS Paint I could find), inserted it into a corresponding "ctrlpan/" folder inside the Patch_rt.mpq.

And, of course, I've updated the global variables accordingly (boring details ahead): mapped "ã", the 228th entry in font_index_from_ascii, to number 128; increased the size of smaltext_frame_from_font_index to 128 in both control.cpp and control.h and mapped the 128th element to the newly added frame 68. Then I increased the size of smaltext_character_width_from_frame to 69, and added an entry with the desired width for character 68 (which was 8, similar to "a"); tl;dr: I doubt the problem was in the code.

Yes, there was a problem :/.

The game loads ok, then nothing wrong with the main menu as well. But as soon as I start/load an actual game, this shows up:

Here is my modified smaltext.cel. smaltext.zip

I don't know if I can post the original for comparison, but, as a reminder, it can be found in diabdat.mpq/ctrlpan

Viewing the hex code of both files reveals very different codes. Way more different than I imagine they should be. I don't know what to look for now :/

@mewmew do you think that extracting the frames using your instructions could help with this issue somehow? What format are they extracted to?

For reference, here is some technical information about the .cel file format, in case it offers some insight:

I also found another project that handles Diablo 1 file formats that could help as well: https://github.com/doggan/diablo-file-formats . I see @mewmew helped this project so that's great news! The thing I need is kind of the opposite process: getting some "convenient data structure" and parsing it into a .cel file that really works in-game. Doesn't sound impossible.

@maristane The issue you're receiving has to do with a newer MPQ format being used. You likely added the file using MPQ editor 32/64? Use WinMPQ instead, it has support for the older format.

@mewmew do you think that extracting the frames using your instructions could help with this issue somehow? What format are they extracted to?

They are extracted to PNG. To convert them back to CEL, either find a tool that does this, or write one based on the CEL format, as documented in https://github.com/sanctuary/formats/blob/79024aeec0bf00480c47995ce7ad430474545a3f/image/cel/cel.go#L5
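
For anyone writing such a converter, the core of it is a per-row run-length encoder. The sketch below is based on an assumed layout of the frame data (a control byte of 0x01..0x7F is followed by that many palette indices; a control byte above 0x80 skips 256 minus that value transparent pixels) and should be checked against the format documentation linked above before relying on it. The Pixel type and both function names are made up for illustration, and the file header, frame offsets and row order are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pixel of a paletted image; opaque == false marks a transparent pixel.
struct Pixel {
    bool opaque;
    std::uint8_t color; // palette index, ignored when transparent
};

// Encode one row as run-length chunks (ASSUMED layout, see the text above):
//   control byte 0x01..0x7F : that many palette indices follow
//   control byte 0x81..0xFF : skip (256 - value) transparent pixels
std::vector<std::uint8_t> encode_row(const std::vector<Pixel> &row)
{
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < row.size()) {
        const bool opaque = row[i].opaque;
        std::size_t run = 0;
        // Runs are capped at 127 so they always fit in one control byte.
        while (i + run < row.size() && row[i + run].opaque == opaque && run < 127)
            ++run;
        if (opaque) {
            out.push_back(static_cast<std::uint8_t>(run));
            for (std::size_t k = 0; k < run; ++k)
                out.push_back(row[i + k].color);
        } else {
            out.push_back(static_cast<std::uint8_t>(256 - run));
        }
        i += run;
    }
    return out;
}

// Decode `width` pixels from `data`; the inverse of encode_row.
std::vector<Pixel> decode_row(const std::vector<std::uint8_t> &data, std::size_t width)
{
    std::vector<Pixel> row;
    std::size_t pos = 0;
    while (row.size() < width) {
        assert(pos < data.size());
        const std::uint8_t control = data[pos++];
        if (control < 0x80) {
            assert(pos + control <= data.size());
            for (std::size_t k = 0; k < control; ++k)
                row.push_back(Pixel{true, data[pos++]});
        } else {
            for (std::size_t k = 0; k < 256u - control; ++k)
                row.push_back(Pixel{false, 0});
        }
    }
    return row;
}
```

Round-tripping every row of an extracted frame through encode_row and decode_row is a cheap way to test a converter before trying the result in-game.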

Just as a heads up: https://github.com/doggan/diablo-file-formats is based on https://github.com/mewrnd/blizzconv which works, but uses a heuristic (e.g. [1]) to determine the CEL frame types.

The blizzconv project has been superseded by https://github.com/sanctuary/formats which determines the CEL frame types based on information contained within the MIN files (e.g. [2] and [3]), and is thus known to be correct.

@maristane The issue you're receiving has to do with a newer MPQ format being used. You likely added the file using MPQ editor 32/64? Use WinMPQ instead, it has support for the older format.

Oh, that did the trick, hahaha. But now the game just crashes whenever the font smaltext.cel was supposed to appear. Actually there are some cases when it seems to be using the unaltered smaltext.cel from diabdat.mpq and doesn't crash, but I'm still trying to figure out how.

Anyway, there seems to be something really wrong with my .cel file created with TDG (TheDarkGraphics). I'll run some more tests with the exporting options and try some other programs like Cel Maker (it seemed very weird and buggy at first but I'll give it another try). If everything fails, I'll take a deeper dive into the .cel format to try and make my own converter. @mewmew thanks for the information, it'll be gold if/when I take this path of creating a converter to .cel.

Aaaand there you go:

That's officially a new, 68th frame for smaltext.cel, not a replacement, officially mapped by extended and adjusted font-mapping arrays.

In other words, we did it guys!!! I'm SO EXCITED I need to drop a Brazilian mistyped laugh: ahusahsuahusas. Now it's just a lot of manual work to get the rest of the characters (for Latin languages at least). I can see that positioning and tweaking the characters will be an issue, but that's included in the "manual work" part. I still can't believe it's done :,).

Just for the record, here's what I did: I took the .cel frames, including the new one, and instead of combining them with TDG, I used Cel Maker to "compile a new animation". I tested literally all exporting options combinations, the one that worked was 1) unchecked "has header" and 2) hitting the "remove frame headers" button. Cel Maker seems to work very well for decompiling/compiling .cels. But for generating editable .bmps the TDG did better for me.

Wow @maristane! That's really great work! Very happy the information turned out to be helpful.

Cheers, /u

Thank you so much guys!

I've finished the work on the "smaltext" font. It now supports everything in UTF-8 that is not currency or math related.

For anyone who wants to test it, here is the updated smaltext.cel file: utf8-smaltext_v2.03.zip

Here are the mappings and the char widths I'm using in control.cpp:

unsigned char fontframe[161] =
{
    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
    0,   0,   0,  54,  44,  57,  58,  56,  55,  47,
   40,  41,  59,  39,  50,  37,  51,  52,  36,  27,
   28,  29,  30,  31,  32,  33,  34,  35,  48,  49,
   60,  38,  61,  53,  62,   1,   2,   3,   4,   5,
    6,   7,   8,   9,  10,  11,  12,  13,  14,  15,
   16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
   26,  42,  63,  43,  64,  65,   0,   1,   2,   3,
    4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
   14,  15,  16,  17,  18,  19,  20,  21,  22,  23,
   24,  25,  26,  40,  66,  41,  67,  68,  69,  70, 
   71,  72,  73,  74,  75,  76,  77,  78,  79,  80,
   81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
   91,  92,  93,  94,  95,  96,  97,  98,  99, 100,
   101
};
unsigned char fontkern[102] =
{
    8,  10,   7,   9,   8,   7,   6,   8,   8,   3,
    3,   8,   6,  11,   9,  10,   6,   9,   9,   6,
    9,  11,  10,  13,  10,  11,   7,   5,   7,   7,
    8,   7,   7,   7,   7,   7,  10,   4,   5,   6,
    3,   3,   4,   3,   6,   6,   3,   3,   3,   3,
    3,   2,   7,   6,   3,  10,  10,   6,   6,   7,
    4,   4,   9,   6,   6,  12,   3,   7,   3,   6,
    10,  10, 10,  10,  10,  10,  10,   9,   7,   7,
    7,   7,   3,   3,   4,   4,   8,   9,  10,  10,
    10, 10,  10,  10,  11,  11,  11,  11,  10,  10,
    7,   8
};
unsigned char fontidx[256] =
{
    0,   1,   1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,  32,  33,  34,  35,  36,  37,  38,  39,
   40,  41,  42,  43,  44,  45,  46,  47,  48,  49,
   50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
   60,  61,  62,  63,  64,  65,  66,  67,  68,  69,
   70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
   80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
   90,  91,  92,  93,  94,  95,  96,  97,  98,  99,
  100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
  110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
  120, 121, 122, 123, 124, 125, 126,   1,  67, 117,
  101,  97,  97,  97,  97,  99, 101, 101, 101, 105,
  105, 105,  65,  65,  69,  97,  65, 111, 111, 111,
  117, 117, 121,  79,  85,  99,  76,  89,  80, 102,
   97, 127, 111, 117, 110,  78,  97, 111,  63,   1,
    1,   1,   1,  33,  60,  62, 111,  43,  50,  51,
   39, 117,  80,  46,  44,  49,  48,  62,   1,   1,
    1, 128, 129, 130, 131, 132, 133, 134, 135, 136,
  137, 138, 139, 140, 141, 142, 143, 144, 145, 146,
  147, 148, 149, 150, 151,  88, 152, 153, 154, 155,
  156, 157, 158, 160, 129, 130, 131, 132, 133, 134,
  135, 136, 137, 138, 139, 140, 141, 142, 143, 144,
  145, 146, 147, 148, 149, 150, 151,  47, 152, 153,
  154, 155, 156, 157, 159, 158
};

And here are some snapshots I've taken to showcase the new characters (the repeated ones are just the "lowercase" versions being mapped to "uppercase" versions, as all characters are in that font for the game)

The .pcx fonts didn't help much for this smaltext.cel since the smallest .pcx font was still considerably bigger than the size I needed. I think they will be more useful for the next fonts. For now I had to rely a lot on my pixel art skills xD (those childhood days of RPG Maker have finally been put to some use).

With these new characters the game will support most Western languages. And the process I've followed can be replicated by anyone who wants to add support for Eastern or other languages that can be expressed with 256 symbols. Anything bigger than that would be another problem entirely, but not a very hard one.

It's important to note that you'll have to change the encoding of whatever file in which you use those characters to ISO-8859-1 (Sublime Text is good for that). There are other ways to make it work but that's just the process I've been using.

I'm considering changing the code so that all strings are read from a .tbl file (like in Diablo 2), making it waaay easier for other translators. Also, the encoding changes would only need to happen on that file, not on the source code. I'm a bit noob in GitHub, so what do you guys think would be the most adequate way of creating this "translation-optimized devilution"? As a new project entirely?

@maristane really cool you made it work! Haha, love the pixel art skills. I'm sure this will be very useful to translate Diablo to further languages.

As for whether to do this as a dedicated translation-optimized devilution as a new project, or incorporate this into Devilution somehow, I'll leave up to @galaxyhaxz.

Personally, I'd wish to keep Devilution as close to the original as possible (even make it byte-identical one day #11), but I also understand the community's desire to add on, improve and extend it from day one. We have to find a good approach for this. The issue tracking this decision for forks vs. branches etc for community extensions and work on Devilution is #39 as far as I can tell.

Oh, I see @mewmew , I'll wait for further definitions then :3

Well, I'm having an unexpected problem with medtexts.cel now.

I've extended it with some new characters, inserted data\medtexts.cel into Patch_rt.mpq and updated mfontframe and mfontkern accordingly. But with or without the updated mapping, the special characters are still somehow "simplified" (not exactly, though). For instance, "ÀÁÂÃ" becomes "UEAA" in-game.

I've changed InitQuestText in minitext.cpp to read from a "MedTextS2.CEL" and renamed the one I created to match and check if the game was really reading it and not the original one, and indeed it read my new one, apparently without problems.

So I think this "simplifying" must be happening somewhere along the way. Maybe in DrawQText or PrintQTextChr.

What do you guys think?

It would also be good if you document the new code and changes, and keep them backed up in a file. That way you can stay in sync with the latest fixes from the master, and continue to document your work on multi-lingual character sets. That way when Devilution is fully stable, it can be integrated into future mods.

Perfect, @galaxyhaxz ! I'll do that!

Speaking of changes, I had to make a very specific one in order to get medtexts.cel to work properly. I'll explain it here for the record, in case anyone has trouble with that font in the future.

Basically, DrawQText was doing a very strange thing: it was mapping characters through fontidx, as usual... and then mapping them again through fontidx. For example, position 192 in fontidx contains 129; DrawQText gets that, but then reads the value at position 129 of fontidx (in my table, 117, the letter "u") and uses that to determine the frame, which is why "À" was being shown as "U". Oddly enough, that works perfectly for characters above 32 and below 127 (all regular numbers and letters are contained within this interval), since their value in fontidx is equal to their index in the array. This seems like a quick, although very imprecise/"random", way to map strange characters to basic, "below-127" ones. But, well, I had to bypass that to get my new characters.

The simplest and safest way I found to work around this was changing this line:

v10 = mfontframe[fontidx[v8]];

To this:

v10 = mfontframe[v8];
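
The effect of the double lookup can be reproduced with a few entries copied from the fontidx table listed in this thread. This is a minimal sketch: the helper names are made up for illustration, unlisted entries are filled with 1 purely as placeholders, and drawn_letter relies on the observation made earlier in the thread that the fonts render lowercase letters as capitals.

```cpp
#include <array>
#include <cassert>

// Excerpt of the fontidx table from this thread; entries not listed here are
// filled with 1 as a placeholder.
std::array<unsigned char, 256> make_fontidx_excerpt()
{
    std::array<unsigned char, 256> t;
    t.fill(1);
    for (unsigned c = 32; c <= 126; ++c)
        t[c] = static_cast<unsigned char>(c); // printable ASCII maps to itself
    t[129] = 117; t[130] = 101; t[131] = 97; t[132] = 97;   // 'u', 'e', 'a', 'a'
    t[192] = 129; t[193] = 130; t[194] = 131; t[195] = 132; // À, Á, Â, Ã
    return t;
}

// What the workaround effectively relies on: one pass through the table.
unsigned char single_lookup(const std::array<unsigned char, 256> &t, unsigned char ch)
{
    return t[ch];
}

// What the original line did: the result is fed through the table again.
unsigned char double_lookup(const std::array<unsigned char, 256> &t, unsigned char ch)
{
    return t[t[ch]];
}

// The fonts only contain capital letters, so a lowercase font index is drawn
// as its capital.
char drawn_letter(unsigned char font_index)
{
    assert(font_index >= 'a' && font_index <= 'z');
    return static_cast<char>(font_index - 'a' + 'A');
}
```

With these entries, the bytes for "ÀÁÂÃ" (192 to 195) come out of double_lookup as 'u', 'e', 'a', 'a', which matches the "UEAA" seen in-game, while plain ASCII is unaffected because its entries map to themselves.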

Works like a charm! And I've already finished creating the new medtexts.cel with the new characters. But I'll document and explain everything better when I finish the last font, bigtgold.cel. And I'll come for help if something weird happens with that font too, hahah. When I finish bigtgold I guess we'll be able to close this issue.

It's done!

Below I present to you how each font renders the characters ¡¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

smaltext

medtexts

bigtgold

And now some "real-world" examples for each one, from my Brazilian Portuguese translation in the making:

smaltext

medtexts

bigtgold

I'll document here:

1. The process for "installing" the new characters, useful for people wanting to translate the game to Western languages or to create localized mods.
2. The process I used to update the font files, for anyone who could have similar needs in the future (like someone wanting to replace the game fonts entirely, or to add support for other characters).

Let's get to it!

Installing the new characters

Download the updated .cel files: diablo1-utf8-fonts.zip
Add them to Patch_rt.mpq using WinMPQ (using newer MPQ editors for this can cause the game to crash when loading the assets). smaltext.cel goes into a ctrlpan\ folder, and the other two into a data\ folder (WinMPQ will prompt you for these folders when you insert each file).
Change the overall character indexing and the font mapping for smaltext.cel in control.cpp (remember to update control.h accordingly) to these:

unsigned char fontframe[161] =
{
    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
    0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
    0,   0,   0,  54,  44,  57,  58,  56,  55,  47,
   40,  41,  59,  39,  50,  37,  51,  52,  36,  27,
   28,  29,  30,  31,  32,  33,  34,  35,  48,  49,
   60,  38,  61,  53,  62,   1,   2,   3,   4,   5,
    6,   7,   8,   9,  10,  11,  12,  13,  14,  15,
   16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
   26,  42,  63,  43,  64,  65,   0,   1,   2,   3,
    4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
   14,  15,  16,  17,  18,  19,  20,  21,  22,  23,
   24,  25,  26,  40,  66,  41,  67,  68,  69,  70, 
   71,  72,  73,  74,  75,  76,  77,  78,  79,  80,
   81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
   91,  92,  93,  94,  95,  96,  97,  98,  99, 100,
   101
};
unsigned char fontkern[102] =
{
    8,  10,   7,   9,   8,   7,   6,   8,   8,   3,
    3,   8,   6,  11,   9,  10,   6,   9,   9,   6,
    9,  11,  10,  13,  10,  11,   7,   5,   7,   7,
    8,   7,   7,   7,   7,   7,  10,   4,   5,   6,
    3,   3,   4,   3,   6,   6,   3,   3,   3,   3,
    3,   2,   7,   6,   3,  10,  10,   6,   6,   7,
    4,   4,   9,   6,   6,  12,   3,   7,   3,   6,
    10,  10, 10,  10,  10,  10,  10,   9,   7,   7,
    7,   7,   3,   3,   4,   4,   8,   9,  10,  10,
    10, 10,  10,  10,  11,  11,  11,  11,  10,  10,
    7,   8
};

unsigned char fontidx[256] =
{
    0,   1,   1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
    1,   1,  32,  33,  34,  35,  36,  37,  38,  39,
   40,  41,  42,  43,  44,  45,  46,  47,  48,  49,
   50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
   60,  61,  62,  63,  64,  65,  66,  67,  68,  69,
   70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
   80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
   90,  91,  92,  93,  94,  95,  96,  97,  98,  99,
  100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
  110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
  120, 121, 122, 123, 124, 125, 126,   1,  67, 117,
  101,  97,  97,  97,  97,  99, 101, 101, 101, 105,
  105, 105,  65,  65,  69,  97,  65, 111, 111, 111,
  117, 117, 121,  79,  85,  99,  76,  89,  80, 102,
   97, 127, 111, 117, 110,  78,  97, 111,  63,   1,
    1,   1,   1,  33,  60,  62, 111,  43,  50,  51,
   39, 117,  80,  46,  44,  49,  48,  62,   1,   1,
    1, 128, 129, 130, 131, 132, 133, 134, 135, 136,
  137, 138, 139, 140, 141, 142, 143, 144, 145, 146,
  147, 148, 149, 150, 151,  88, 152, 153, 154, 155,
  156, 157, 159, 160, 129, 130, 131, 132, 133, 134,
  135, 136, 137, 138, 139, 140, 141, 142, 143, 144,
  145, 146, 147, 148, 149, 150, 151,  47, 152, 153,
  154, 155, 156, 157, 159, 158
};

Change the font mapping for medtexts.cel in minitext.cpp (remember to update minitext.h accordingly) to this:

unsigned char mfontframe[161] =
{
0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
0,   0,   0,  37,  49,  38,   0,  39,  40,  47,
42,  43,  41,  45,  52,  44,  53,  55,  36,  27,
28,  29,  30,  31,  32,  33,  34,  35,  51,  50,
48,  46,  49,  54,   0,   1,   2,   3,   4,   5,
6,   7,   8,   9,  10,  11,  12,  13,  14,  15,
16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
26,  42,   0,  43,   0,   0,   0,   1,   2,   3,
4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
14,  15,  16,  17,  18,  19,  20,  21,  22,  23,
24,  25,  26,  48,   0,  49,   0,  56,  57,  58,
59,  60,  61,  62,  63,  64,  65,  66,  67,  68,
69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
79,  80,  81,  82,  83,  84,  85,  86,  87,  88,
89
};
unsigned char mfontkern[90] =
{
5,  15,  10,  13,  14,  10,   9,  13,  11,   5,
5,  11,  10,  16,  13,  16,  10,  15,  12,  10,
14,  17,  17,  22,  17,  16,  11,   5,  11,  11,
11,  10,  11,  11,  11,  11,  15,   5,  10,  18,
15,   8,   6,   6,   7,  10,   9,   6,  10,  10,
5,   5,   5,   5,  11,  12,   5,  11,  15,  15,
15,  15,  15,  15,  16,  13,  10,  10,  10,  10,
6,    5,   6,   6,  14,  13,  16,  16,  16,  16,
16,  16,  17,  17,  17,  17,  16,  16,  10,  10
};

There is some extra work for medtexts.cel. In DrawQText, you'll have to change the line v10 = mfontframe[fontidx[v8]]; to this: v10 = mfontframe[v8];

Change the font mapping for bigtgold.cel in gmenu.cpp (remember to update gmenu.h accordingly) to this:

unsigned char lfontframe[161] =
{
0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
0,   0,   0,  37,  49,  38,   0,  39,  40,  47,
42,  43,  41,  45,  52,  44,  53,  55,  36,  27,
28,  29,  30,  31,  32,  33,  34,  35,  51,  50,
0,  46,   0,  54,   0,   1,   2,   3,   4,   5,
6,   7,   8,   9,  10,  11,  12,  13,  14,  15,
16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
26,  42,   0,  43,   0,   0,   0,   1,   2,   3,
4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
14,  15,  16,  17,  18,  19,  20,  21,  22,  23,
24,  25,  26,  20,   0,  21,   0,  56,  57,  58,
59,  60,  61,  62,  63,  64,  65,  66,  67,  68,
69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
79,  80,  81,  82,  83,  84,  85,  86,  87,  88,
89
};
unsigned char lfontkern[90] =
{
18,  33,  21,  26,  28,  19,  19,  26,  25,  11,
12,  25,  19,  34,  28,  32,  20,  32,  28,  20,
28,  36,  35,  46,  33,  33,  24,  11,  23,  22,
22,  21,  22,  21,  21,  21,  32,  10,  20,  36,
31,  17,  13,  12,  13,  18,  16,  11,  20,  21,
11,  10,  12,  11,  21,  23,  10,  21,  33,  33,
33,  33,  33,  33,  34,  26,  19,  19,  19,  19,
14,  11,  12,  12,  28,  28,  32,  32,  32,  32,
32,  32,  36,  36,  36,  36,  33,  33,  20,  21
};

And there you go, the game is now ready to support some weird Western characters. Just remember that for every source file in which you use some of those new characters, you'll have to change the file's encoding to ISO-8859-1 or similar (Windows-1252, for instance, from what I can recall). Sublime Text's "Save with encoding..." works perfectly for that. Notepad++ has some good options for this as well. "Why not change to UTF-8?" you may ask. I have no idea. Even though what I did was add UTF-8-supported characters, changing the source files' encoding to UTF-8 (with or without BOM) just didn't work for me; it leads to greatly misinterpreted characters.
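
A likely explanation for the UTF-8 problem is that fontidx has 256 entries and is indexed with one byte per character, while UTF-8 stores every accented letter as two bytes. If keeping the sources in UTF-8 is preferred, one option is to convert strings to one byte per character before they reach the font tables. The following is a minimal sketch of such a conversion; latin1_from_utf8 is a hypothetical helper, not something that exists in the code base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert UTF-8 text to one byte per character (ISO-8859-1).
// Code points above U+00FF and malformed input become '?'.
// Hypothetical helper for illustration only.
std::string latin1_from_utf8(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {
            // Plain ASCII is already a single byte.
            out.push_back(static_cast<char>(b0));
            i += 1;
        } else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < in.size()
            && (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence covering U+0080..U+00FF.
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            out.push_back(static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F)));
            i += 2;
        } else {
            // Anything else does not fit in one byte.
            out.push_back('?');
            i += 1;
            while (i < in.size() && (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80)
                i += 1; // skip the rest of the unsupported sequence
        }
    }
    assert(out.size() <= in.size()); // the output never grows
    return out;
}
```

Where such a call would belong (at string load time, or inside the text drawing functions) is a design decision this sketch does not try to answer.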

My workflow for updating the fonts

First I extract and convert the .cel file frames to .bmp files using TheDarkGraphics. Refer to the program's help file for more details, but basically I uncheck "header", open the font's .cel file, check "Loop", then click "Save BMP". The .bmp files are then created in the same folder as the .cel file.
I then manually create copies of the .bmp files that most closely correspond to the characters I want to create. For instance, I copy the file for the letter "A" as many times as there are new characters with that letter as a base (À, Á, Ã, etc.). Creating new .bmps from scratch can lead to wrong colors and even game crashes, so it's better to edit over copies of "original" .bmps.
For the editing, I used the oldest MS Paint I could find. All other programs I've tried (newer MS Paint and Photoshop) led to problems with the resulting .cel file. I didn't try Gimp though. I only used Photoshop for changing the .pcx font images to the closest I could get in color and size to the font I was updating, so I could copy and paste parts of them to Paint. Then I do some almost "pixel by pixel" color matching with the pencil and bucket tools so the new parts only have colors already present in the original character.
I then convert all .bmps to .cel files, again using TheDarkGraphics. Uncheck "header", click "open bmp" and open the first file from the sequence. Then check "loop" and click "save cel". The .cel files are then created in the same folder as the .bmps.
I use Cel Maker to decompile the original .cel file into separate .cel frames by going to Animation>Decompile CEL Animation. When doing this, I check "Include Frame Header".
There are many ways to do this, but, in short, I then compile a new CEL animation in Cel Maker (Animation>New Cel Animation) by adding the original frames, decompiled on the previous step, and then I add the new .cel frames generated in TheDarkGraphics for the new characters. In this process I find it useful to rename the new .cel frames to match the sequence and format of the previous frames names, but I don't think it's necessary. Before hitting compile, I check "has header" and then click on "remove frame headers". Forgetting to do this will lead to "corrupted" files in this specific workflow.
And it's done for the .cel file: I can just add the resulting .cel "animation" to Patch_rt.mpq using WinMPQ and it will work fine.
Will it? Not yet, I have to update the font mappings in the source code for them to work. fontidx was updated just once: I basically checked this UTF-8 table and compared it with the array to understand which character was what and to what position in fontframe/mfontframe/lfontframe it maps, and changed it to my needs. Updating fontframe/mfontframe/lfontframe is easy too, just a matter of pointing to the corresponding .cel frame for each position. And then there is fontkern/mfontkern/lfontkern, in which, for the new chars, I basically copy the widths from the "most similar" original characters, with some adjustments when necessary (some accented versions of the letter "i", for instance, required a slightly larger width than that of the original "i").
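
Since the three tables are edited by hand, a small consistency check can catch out-of-range entries before they show up as a crash. This is a minimal sketch; indices_in_range is a hypothetical helper, not part of the code base.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when every value stored in `from` is a valid index into a
// table of `to_len` entries. Hypothetical helper for illustration only.
bool indices_in_range(const unsigned char *from, std::size_t from_len, std::size_t to_len)
{
    assert(from != nullptr);
    for (std::size_t i = 0; i < from_len; ++i) {
        if (from[i] >= to_len)
            return false; // this entry would read past the end of the next table
    }
    return true;
}
```

For the small font, with the sizes listed above, the checks would be indices_in_range(fontidx, 256, 161) and indices_in_range(fontframe, 161, 102); the same pair applies to mfontframe/mfontkern and lfontframe/lfontkern with their own sizes.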

Notes:

Never save the .bmps with the width changed. You can change it during editing, but save with the original width (13px for smaltext, 22px for medtexts and 46px for bigtgold) or else the game will crash when reading the .cel file. Unless, of course, you have changed the source code to support a different width (very risky).
The height can be changed at will (or at least I haven't hit a limit with what I did), but here is how it works: for every pixel you increase in height, the image will go "up" 1px in-game (as if the frames are "aligned at bottom", if that makes sense to you). If you then move the character image 1px down while keeping the new height, it will appear in-game at the same position as if you hadn't changed anything, because you moved it "up" by increasing the height, but then down by directly moving it down. And that's how I got all the new characters properly aligned vertically. If I had to add a 5px accent above some letter, all I had to do was increase the file height by 5px, then move the letter down 5px (so it would be at the same "position" as the regular letter), and then place/draw the accent above it.
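
The height rule can be written down as simple arithmetic. The sketch below assumes, as observed in the note above, that frames are drawn with their bottom row at a fixed screen row; screen_row is a hypothetical helper used only to illustrate the calculation.

```cpp
#include <cassert>

// Screen row of the pixel stored at `row_in_frame` (0 = top row of the frame),
// ASSUMING the frame's bottom row is drawn at screen row `bottom_y`.
int screen_row(int bottom_y, int frame_height, int row_in_frame)
{
    assert(row_in_frame >= 0 && row_in_frame < frame_height);
    return bottom_y - (frame_height - 1 - row_in_frame);
}
```

Growing a frame from 11 to 16 rows and moving the letter down by 5 rows leaves it where it was, since screen_row(100, 11, 3) and screen_row(100, 16, 8) are the same row; the 5 new rows on top are then free for the accent.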

That's it guys, once again thank you very much for all the help! I'll keep on working on my Portuguese translation for now. In time I may start that "translation-optimized-diablo1" idea, but it won't be my main focus for a while. Cheers!

Thanks for documenting this @maristane! Very happy to see that it worked out so well. Happy translation :)

diasurgical / devilution

Support for accented characters #32
