Misalignment between English and Japanese files

Hi Ken,

I am currently trying to implement the editor for your translation, but I have a misalignment between the Japanese and English rvtext files. Looking in the database files, I have for instance this in the Japanese version:

<<items/545/name>>
***ダミー

<<items/601/description>>
【転職用】所持していれば「商人」に転職できる

while I have an additional entry for the English version:

<<items/545/name>>
***ダミー

<<items/600/name>>
***ダミー

<<items/601/description>>
[Job] Allows holder to become a merchant.

It is not just in the wrong place, it is really absent from the Japanese version. And it is not the only case: in the Japanese version I have 19848 entries, while in the English version I have 19859 entries (11 more in English). I did not check if the English version is the only one to add entries. It may be that the Japanese version also has entries which are not in the English version, so more than 11 differences may occur. It depends on how you generate your files.
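To find out whether the English version is really the only one adding entries, the two key lists can be compared directly. The following is a minimal sketch, not the editor's actual code: it assumes an rvtext entry is a `<<key>>` line followed by its text up to the next key line, and the names `parse_entries` and `key_differences` are made up for illustration.

```python
import re

# Assumption: a key line consists only of "<<...>>"; everything up to the
# next key line is the text of that entry.
KEY_LINE = re.compile(r"^<<(.+)>>$")

def parse_entries(text):
    """Return a list of (key, content) pairs in file order."""
    entries = []
    key, lines = None, []
    for line in text.splitlines():
        match = KEY_LINE.match(line.strip())
        if match:
            if key is not None:
                entries.append((key, "\n".join(lines).strip()))
            key, lines = match.group(1), []
        elif key is not None:
            lines.append(line)
    if key is not None:
        entries.append((key, "\n".join(lines).strip()))
    return entries

def key_differences(japanese_text, english_text):
    """Return (keys only in Japanese, keys only in English), in file order."""
    ja_keys = [k for k, _ in parse_entries(japanese_text)]
    en_keys = [k for k, _ in parse_entries(english_text)]
    ja_set, en_set = set(ja_keys), set(en_keys)
    only_ja = [k for k in ja_keys if k not in en_set]
    only_en = [k for k in en_keys if k not in ja_set]
    return only_ja, only_en

japanese = """<<items/545/name>>
***ダミー

<<items/601/description>>
【転職用】所持していれば「商人」に転職できる
"""
english = """<<items/545/name>>
***ダミー

<<items/600/name>>
***ダミー

<<items/601/description>>
[Job] Allows holder to become a merchant.
"""
print(key_differences(japanese, english))  # ([], ['items/600/name'])
```

Run on the full files, the lengths of the two returned lists would show whether the 11 extra entries are the whole story or whether the Japanese file also has entries of its own.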

So here are my questions:

Should the Japanese and English versions have strictly the same entries?

If yes, do you plan to fix it? Otherwise the answer should be no. Maybe I could help you if you want to fix it but you are not sure how.

If no, how should I treat the extra entries? If the Japanese version is the reference one, I may simply ignore any additional entry coming from the English version. But we should be sure that the English version is the only one adding extra entries.

Are the entries supposed to be strictly in the same order? It seems they are (I could not check fully because of the misalignment), but if you confirm it then I can rely on it as an assumption to get good performance.
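The ordering question can be checked despite the extra entries by looking only at the keys both files share. This is a rough sketch under the assumption that keys are unique within a file; `first_order_mismatch` is a hypothetical helper name, and it works on plain lists of keys.

```python
def first_order_mismatch(ja_keys, en_keys):
    """Return the first pair of shared keys that disagree in position, or None.

    Keys present in only one list are ignored, so extra entries on either
    side do not count as an ordering problem.
    """
    ja_set, en_set = set(ja_keys), set(en_keys)
    shared_ja = [k for k in ja_keys if k in en_set]
    shared_en = [k for k in en_keys if k in ja_set]
    for ja_key, en_key in zip(shared_ja, shared_en):
        if ja_key != en_key:
            return ja_key, en_key
    return None

ja = ["items/545/name", "items/601/description"]
en = ["items/545/name", "items/600/name", "items/601/description"]
print(first_order_mismatch(ja, en))  # None: same relative order
```

A `None` result means the shared entries are in the same relative order, which is the condition a single-pass comparison would rely on.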

The Japanese file is a fresh dump of the latest version, whereas the English one has gone through a few update cycles to bring it up to standard.

Let's see: "***ダミー" are dummy entries used as placeholders. The main developer adds and removes these every so often, but they can be ignored to save time. Nothing about them is actually useful, but the system pulls them regardless.

<<items/601/description>> I don't see anything wrong with that one, it looks to match up.

1) Ideally yes, the Japanese and English should have matching entries. For the Database there are 11 entries only in the English and 0 entries only in the Japanese. I'll clean this up in a bit, thanks for bringing it up.

1 Yes) I do plan to fix it, but it's one of those bugs that are more of an annoyance. One day all the data will look nice.

1 No) For now don't worry about any entries with "**ダミー" or "*ダミー", as they don't contain real data. Also, any entries that are only in the English version can be completely ignored, as they are remnants of upgrading. Anything that is only in the Japanese should be addressed, as that was a missed entry.

2) As for sorting, I do sort the data, it is very hard to compare unsorted... Fun times... But as the Japanese gets generated first, the English could be a bit more sorted. If I notice they are misaligned I'll adjust them to match again.

Thank you, now I've got to delete some entries, though I haven't looked at whether this also happens with Dialogue or ScriptText.

(Sent by email in reply to Sazaju HITOKAGE's message of Sun, Aug 23, 2015, 10:36 AM.)

> Let's see: ***ダミー are dummy entries used as placeholders. The main developer adds and removes these every so often, but they can be ignored to save time. Nothing about them is actually useful, but the system pulls them regardless.

That one I could guess (ダミー = dummy), and my point was more about "should I expect other surprises like that with actual data". I may filter them out, but it means I will need more computation time. Too bad.

> <<items/601/description>> I don't see anything wrong with that one, it looks to match up.

Not with this one, but with the one in between: <<items/600/name>>. I gave the full context (1 before + 1 after) to show the concrete difference.

> 1) Ideally yes, the Japanese and English should have matching entries. For the Database there are 11 entries only in the English and 0 entries only in the Japanese. I'll clean this up in a bit, thanks for bringing it up.

Do you think that you can make it so that it will not happen anymore? The point is not to fix the difference, but to fix what causes it. Otherwise I may assume that the data is good, and then my editor may mess things up because the difference comes back. Can you make it so that it does not occur anymore?

> 1 Yes) I do plan to fix it, but it's one of those bugs that are more of an annoyance. One day all the data will look nice.

For me it is more than an annoyance. If the data is clean, parsing and computing go really fast. Otherwise I need to go over all the parsed data again to clean it, which may have a significant impact because you have really big files (20k entries is a lot). This is why I prefer to help you do it rather than count on manual cleaning or wait for "one day all the data will look nice". The advantage of an extractor is that you run it once and you have the files, so you can wait some minutes without a problem and then forget the extractor until the next version. An editor is something you open every time you want to translate, so having to wait even 20s every time you run it is really frustrating.

> 1 No) For now don't worry about any entries with ***ダミー or **ダミー, as they don't contain real data. Also, any entries that are only in the English version can be completely ignored, as they are remnants of upgrading. Anything that is only in the Japanese should be addressed, as that was a missed entry.

OK, so I take the Japanese version as the reference and I ignore dummy entries. If they have the same order, it should remain fast.
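As a rough illustration of why a shared order keeps this fast, here is a sketch of a single pass over both entry lists with the Japanese version as the reference. It is not the editor's real implementation. It assumes entries are already parsed into `(key, text)` pairs, that shared keys appear in the same order, and that a dummy is any text equal to `ダミー` after stripping leading asterisks; `is_dummy` and `align` are hypothetical names.

```python
def is_dummy(text):
    # Assumption: a dummy is "ダミー" preceded by any number of asterisks
    # (including zero), e.g. "***ダミー".
    return text.lstrip("*").strip() == "ダミー"

def align(ja_entries, en_entries):
    """Pair each non-dummy Japanese entry with its English text, or None.

    Walks both lists once, assuming shared keys appear in the same order.
    """
    ja_keys = {key for key, _ in ja_entries}
    aligned = []
    j = 0
    for key, ja_text in ja_entries:
        # Skip English-only entries (remnants of upgrading).
        while j < len(en_entries) and en_entries[j][0] not in ja_keys:
            j += 1
        en_text = None
        if j < len(en_entries) and en_entries[j][0] == key:
            en_text = en_entries[j][1]
            j += 1
        # A Japanese entry without an English match keeps en_text = None.
        if not is_dummy(ja_text):
            aligned.append((key, ja_text, en_text))
    return aligned

ja = [
    ("items/545/name", "***ダミー"),
    ("items/601/description", "【転職用】所持していれば「商人」に転職できる"),
]
en = [
    ("items/545/name", "***ダミー"),
    ("items/600/name", "***ダミー"),
    ("items/601/description", "[Job] Allows holder to become a merchant."),
]
for row in align(ja, en):
    print(row)
```

Each list is traversed once, so the cost grows linearly with the number of entries. If the order assumption does not hold, some entries would come out with `None` as their English text even though a translation exists, which is why the ordering has to be guaranteed rather than checked by hand.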

> 2) As for sorting, I do sort the data, it is very hard to compare unsorted... Fun times... But as the Japanese gets generated first, the English could be a bit more sorted. If I notice they are misaligned I'll adjust them to match again.

Here too, the point is not to do it manually, but to make it so that you will not need to check, because it will be ensured by the program.

> Thank you, now I've got to delete some entries, though I haven't looked at whether this also happens with Dialogue or ScriptText.

Now I can continue my work, so I will let you know if I find some. But as much as possible, the point is to have the machine do all that error-prone work rather than doing it manually. Otherwise I cannot rely on it, and I have to check everything every time I parse to avoid issues if the problem comes back. So what is your plan? So far your answers can be interpreted both ways: just fix the current problems, or fix their causes.

It probably wouldn't be possible to completely remove human error from the project unless a more automated method was made to sync the Japanese to the English version on every new release. Unfortunately an error rate will exist for as long as the project is ongoing. If only a few entries in thousands are erroneous, that is actually pretty good, though it falls short of perfection. Some effort has gone into automation, but it is still a heavily manual task: the last update shifted a lot of map names, removed a lot of those dummy entries, and added new content which is still being sorted through. Simply removing any entry that doesn't match the Japanese would be faster, but it also increases the likelihood of lost work, and there is a lot of data. The data has to be heavily scrubbed on any official version change.

The current plan is to keep going through the data, as that is the method currently available. There really isn't much of an alternative at the moment... Neither extraction method seems to handle the upgrade process well. The current tools are RPG Maker along with Notepad++ and WinMerge, but neither of those two handles comparison that well. Language File System doesn't really handle sorting the data. I had to add that, and those algorithms could use improvement. And then there are the efforts of a small team (and that is including you, else it's a 2-man party at the moment).

Good luck on your end ♪

(Sent by email in reply to Sazaju HITOKAGE's message of Sun, Aug 23, 2015, 8:27 PM.)

May I provide you with a program to do it? Once you have extracted your files, you can apply it to do all the cleaning, so you can ensure that the data is clean before pushing it to your repo. As far as I understand, all of that is doable programmatically (there is no ambiguity). So basically, what I would otherwise do every time I run the editor, I put in a separate program that you run on your side, so it is done once and for all and both of us can benefit from it. If you cannot modify the tools you are using, it would be much faster for me to do it, as I already have all the parsing facilities and structures to do it easily.
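To make the proposal concrete, here is one possible shape for such a cleaning step. It is only a sketch, not the program discussed here. It assumes entries are already parsed into `(key, text)` pairs with unique keys, and it fills entries missing from the English version with the Japanese text as an untranslated placeholder, which is a design choice of this example. `clean_english` and `to_rvtext` are hypothetical names.

```python
def clean_english(ja_entries, en_entries):
    """Rebuild the English entries on top of the Japanese key list.

    Returns (cleaned, dropped, missing):
      cleaned: English entries in Japanese order, one per Japanese key
      dropped: English-only entries, kept aside so no work is silently lost
      missing: Japanese keys that had no English entry
    """
    en_by_key = dict(en_entries)
    ja_keys = {key for key, _ in ja_entries}
    cleaned, missing = [], []
    for key, ja_text in ja_entries:
        if key in en_by_key:
            cleaned.append((key, en_by_key[key]))
        else:
            missing.append(key)
            cleaned.append((key, ja_text))  # untranslated placeholder
    dropped = [(k, t) for k, t in en_entries if k not in ja_keys]
    return cleaned, dropped, missing

def to_rvtext(entries):
    """Serialize entries as '<<key>>' lines, each followed by its text."""
    return "\n".join(f"<<{key}>>\n{text}\n" for key, text in entries)

ja = [("items/1/name", "やくそう"), ("items/2/name", "どくけし"), ("items/3/name", "ポーション")]
en = [("items/1/name", "Herb"), ("items/9/name", "Old item"), ("items/3/name", "Potion")]
cleaned, dropped, missing = clean_english(ja, en)
print(to_rvtext(cleaned))
print("dropped:", dropped)
print("missing:", missing)
```

Because the output is rebuilt from the Japanese key list, both files end up with the same entries in the same order by construction. The `dropped` list is returned instead of discarded so that translated text from removed entries can still be reviewed.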

A program to handle that process would be worthwhile. RPG Maker isn't great at mass data processing: when I tell it to start I go to sleep, and when I wake up it is probably still going.

I opened issue #6 to discuss that in more detail, so we can keep this issue for the original problem. Please don't close it before it is actually solved.

MGQ-EX / Paradox

Misalignment between English and Japanese files #3