Closed Meerkov closed 2 years ago
Worth noting that the API actually returns break types other than just LINE_BREAK, including EOL_SURE_SPACE, which indicates that the text continues on the next line and should be treated as a space rather than a new line.
Currently the code implicitly treats all linebreaks as \n. When I tested treating EOL_SURE_SPACE as a space instead, it sometimes resulted in better translations, and other times it caused words to disappear from the translation entirely. This seems like it might be a quirk of the translation API.
Here is my experimental code. You can play with changing what each type of linebreak does.
bool GameLogicComponent::ReadFromParagraph(const cJSON *paragraph, TextArea &textArea)
{
    //get words
    const cJSON *words = cJSON_GetObjectItemCaseSensitive(paragraph, "words");
    const cJSON *word;
    string finalText;
    CL_Vec2f lastVerts[4];
    for (int i = 0; i < 4; i++)
    {
        lastVerts[i] = CL_Vec2f(0, 0);
    }
    string lineText;
    string finalTextRaw;
    CL_Rect rectOfLastLine;
    bool bRectSet = false;
    int wordsProcessed = 0;
    float isDialogFuzzyLogic = 0;
    CL_Rect totalRect;
    vector<WordInfo> wordInfo;
    cJSON_ArrayForEach(word, words)
    {
        // LogMsg("Got a word");
        const cJSON *symbols = cJSON_GetObjectItemCaseSensitive(word, "symbols");
        const cJSON *symbol;
        //get the bounding box of this word
        const cJSON *boundingBox = cJSON_GetObjectItemCaseSensitive(word, "boundingBox");
        const cJSON *vertices = cJSON_GetObjectItemCaseSensitive(boundingBox, "vertices");
        const cJSON *vert;
        CL_Vec2f verts[4];
        int vertCount = 0;
        cJSON_ArrayForEach(vert, vertices)
        {
            float x, y;
            cJSON *tempObj = cJSON_GetObjectItem(vert, "x");
            if (tempObj)
            {
                x = tempObj->valuedouble;
            }
            else
            {
                x = 0;
            }
            tempObj = cJSON_GetObjectItem(vert, "y");
            if (tempObj)
            {
                y = tempObj->valuedouble;
            }
            else
            {
                y = 0;
            }
            verts[vertCount] = CL_Vec2f(x, y);
            assert(verts[vertCount] >= 0);
            vertCount++;
        }
        if (!bRectSet)
        {
            //LogMsg("Setting rect for first word");
            if (wordsProcessed == 0)
            {
                rectOfLastLine = CL_Rectf(verts[0].x, verts[0].y, verts[2].x, verts[2].y);
                totalRect = rectOfLastLine;
            }
            else
            {
                rectOfLastLine = CL_Rectf(verts[0].x, verts[0].y, verts[2].x, verts[2].y);
                totalRect.bounding_rect(rectOfLastLine);
            }
            bRectSet = true;
        }
        else
        {
            CL_Rect newWord = CL_Rectf(verts[0].x, verts[0].y, verts[2].x, verts[2].y);
            rectOfLastLine.bounding_rect(newWord);
            totalRect.bounding_rect(newWord);
        }
        for (int i = 0; i < 4; i++)
        {
            lastVerts[i] = verts[i];
        }
        if (!lineText.empty())
        {
            //lineText += " ";
        }
        cJSON_ArrayForEach(symbol, symbols)
        {
            const cJSON *text = cJSON_GetObjectItemCaseSensitive(symbol, "text");
            lineText += text->valuestring;
            //what about the exact rect of this text?
            const cJSON *boundingBox2 = cJSON_GetObjectItemCaseSensitive(symbol, "boundingBox");
            vertices = cJSON_GetObjectItemCaseSensitive(boundingBox2, "vertices");
            vertCount = 0;
            cJSON_ArrayForEach(vert, vertices)
            {
                float x, y;
                cJSON *tempObj = cJSON_GetObjectItem(vert, "x");
                if (tempObj)
                {
                    x = tempObj->valuedouble;
                }
                else
                {
                    x = 0;
                }
                tempObj = cJSON_GetObjectItem(vert, "y");
                if (tempObj)
                {
                    y = tempObj->valuedouble;
                }
                else
                {
                    y = 0;
                }
                //assert(x >= 0 && y >= 0);
                verts[vertCount] = CL_Vec2f(x, y);
                vertCount++;
            }
            if (vertCount == 4)
            {
                WordInfo w;
                w.m_rect = CL_Rectf(verts[0].x, verts[0].y, verts[2].x, verts[2].y);
                w.m_word = text->valuestring;
                wordInfo.push_back(w);
            }
            const cJSON* symbolProperty = cJSON_GetObjectItemCaseSensitive(symbol, "property");
            const cJSON* linebreak = cJSON_GetObjectItemCaseSensitive(symbolProperty, "detectedBreak");
            if (linebreak != NULL) {
                const cJSON* detectedBreak = cJSON_GetObjectItemCaseSensitive(linebreak, "type");
                string space("SPACE"), eolspace("EOL_SURE_SPACE");
                if (space.compare(detectedBreak->valuestring)==0) {
                    lineText += " ";
                }
                else {
                    if (eolspace.compare(detectedBreak->valuestring)==0) {
                        lineText += "\n"; // Seems better if EOL space is treated as a newline? (old behavior)
                    }
                    else {
                        lineText += "\n";
                    }
                    LineInfo lineInfo;
                    lineInfo.m_lineRect = rectOfLastLine;
                    lineInfo.m_words = wordInfo;
                    wordInfo.clear();
                    lineInfo.m_text = lineText;
                    textArea.m_lines.push_back(lineInfo);
                    textArea.m_lineStarts.push_back(rectOfLastLine.get_top_left());
                    finalText += lineText;
                    finalTextRaw += lineText;
                    lineText = "";
                    bRectSet = false;
                }
            }
        }
        wordsProcessed++;
    }
    textArea.text += finalText;
    textArea.rawText += finalTextRaw;
    utf8::utf8to16(finalTextRaw.begin(), finalTextRaw.end(), back_inserter(textArea.wideText));
    textArea.m_rect = totalRect;
    // The original is the same from here on...
^ This is treating EOL_SURE_SPACE as a linebreak (existing behavior)
Below, this is treating it as a space for the purpose of Dialog translation (i.e. I only changed "\n" to " " in the above code)
As you can see, for some reason the words after the space appear to get dropped. This might be because something is being encoded incorrectly in my simple experiment; perhaps I need to be using a wide-char string? I'm not exactly sure, so I invite some experimentation with that.
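To make the experiment easier to reproduce outside the app, here is a minimal standalone sketch of the same separator decision. It assumes the symbols have already been pulled out of the JSON; SymbolBreak, SeparatorFor, and BuildText are hypothetical names for illustration and are not part of the project. As in the experimental code, only SPACE becomes a space and every other break type falls through to a newline, except that EOL_SURE_SPACE can be switched with a flag.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one OCR symbol: its text plus the
// detectedBreak type string ("" when the symbol carries no break).
struct SymbolBreak {
    std::string text;
    std::string breakType;
};

// Returns the separator to append after a symbol.
// eolSureSpaceAsSpace toggles the experiment: true joins wrapped lines
// with a space, false keeps the old newline behavior.
std::string SeparatorFor(const std::string &breakType, bool eolSureSpaceAsSpace) {
    if (breakType.empty()) return "";   // no break after this symbol
    if (breakType == "SPACE") return " ";
    if (breakType == "EOL_SURE_SPACE") return eolSureSpaceAsSpace ? " " : "\n";
    return "\n";                        // LINE_BREAK and any other type
}

// Concatenates symbol texts, appending the chosen separator after each.
std::string BuildText(const std::vector<SymbolBreak> &symbols, bool eolSureSpaceAsSpace) {
    std::string out;
    for (const SymbolBreak &s : symbols) {
        out += s.text;
        out += SeparatorFor(s.breakType, eolSureSpaceAsSpace);
    }
    return out;
}
```

Calling BuildText on the same symbols with the flag on and off gives the two strings to send to the translator, which makes it easy to compare them byte for byte before blaming the encoding.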
Image you can use to experiment with
Ok, I put this in (31badf5) and did a bit of tweaking. I think it's working; at least I don't really see a difference in the output when comparing results to the old version, so I guess that's good.
BTW, the weird thing about the translation issue in your pics is it WAS sending it correctly, Google just decided to translate it shorter when it thought it was a single sentence, I guess. DeepL translated both (with or without the linebreak change) the same.
I'm going to be going through the other issues and test things for a few days before doing a real release. Appreciate the help and input btw, kind of gets me off my ass to work on things instead of just play games. :)
Thanks, glad I could be of help 😃
So, as it turns out, I don't think this is necessary. You can instead check for a property called detectedBreak on the symbol. That will let you know if that symbol is the end of the line. I think fixing this may make translation more reliable when spanning multiple lines.
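As a rough illustration of using detectedBreak to find line ends, the sketch below groups symbols into lines. It assumes each symbol has been reduced to a (text, break type) pair, with an empty break type when detectedBreak is absent; EndsLine and SplitIntoLines are hypothetical helper names. Treating HYPHEN and SURE_SPACE the way it does is my reading of the API's break types, not something the project code does.

```cpp
#include <string>
#include <utility>
#include <vector>

// True when a detectedBreak type marks the last symbol on a visual line.
bool EndsLine(const std::string &breakType) {
    return breakType == "EOL_SURE_SPACE" ||
           breakType == "LINE_BREAK" ||
           breakType == "HYPHEN";
}

// symbols: (text, breakType) pairs in reading order.
// Returns one string per detected line, with spaces restored between words.
std::vector<std::string> SplitIntoLines(
    const std::vector<std::pair<std::string, std::string>> &symbols) {
    std::vector<std::string> lines;
    std::string current;
    for (const auto &sym : symbols) {
        current += sym.first;
        if (sym.second == "SPACE" || sym.second == "SURE_SPACE") {
            current += " ";            // word boundary inside a line
        }
        if (EndsLine(sym.second)) {
            lines.push_back(current);  // this symbol closes the line
            current.clear();
        }
    }
    if (!current.empty()) {
        lines.push_back(current);      // trailing text without a final break
    }
    return lines;
}
```

Keeping the line split separate from the text sent for translation means the per-line rectangles can still be tracked while the translator receives one joined sentence.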