IridiumIO / CompactGUI

Transparently compress active games and programs using Windows 10/11 APIs
GNU General Public License v3.0

Doesn't work properly on Non-English systems #40

Closed Cheet4h closed 7 years ago

Cheet4h commented 7 years ago

Hey,

I just ran the tool on a folder and got this. It hasn't done anything for about an hour now.

I think I found the problem: in the file "Compact.vb", the handler "MyProcess_OutputDataReceived" checks for English output lines. This obviously fails when compact.exe returns German lines, and it may be an issue in other languages too. I'm not sure how fully localized most of them are.

The proper translations seem to be:

"total bytes of data are stored in" -> "Datenbytes insgesamt werden in %i Bytes gespeichert"  

"The compression ratio is" -> "Das Komprimierungsverh„ltnis ist" #note: "„" should be "ä"  

"directories were uncompressed" -> "waren nicht komprimiert" #probably a bad translation on MS' part here, if this is supposed to indicate the number of files that have just been uncompressed. German means "files that were not compressed"  

" Compressing files in" -> "Komprimieren der Dateien in"  

"[OK]" -> "[OK]" #no change here  

" are not compressed." -> "nicht komprimiert." #this will also trigger on the entry for "directories were uncompressed". Maybe move that entry into a nested if-condition here?  

" Listing " ->" Auflisten von "  

I'll try to fix this on my system and create a branch if I get it working. I haven't looked through the whole source to see if there's more checks for localized output. Also, I never worked with VB before >_>

To fix this properly and support more languages, the strings should probably be put into a data file (XML or JSON?) and loaded when the program starts.
Getting the system locale should be possible, although I've only done this in C# before and don't know how to do it in VB. Otherwise you can probably hack it together by checking the copyright notice in the output.
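
A minimal sketch of that lookup idea, written in C++ rather than VB. The names `OutputMarkers` and `markersFor` and the locale keys are hypothetical; the German strings are the ones listed above, and falling back to English for an unknown locale is an assumption, not something CompactGUI is known to do.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical container for the substrings looked for in compact.exe output.
struct OutputMarkers {
    std::string totalBytes;        // line with pre- and post-compression sizes
    std::string compressionRatio;  // line that marks the end of a run
    std::string compressingFiles;  // one line per directory being compressed
    std::string listing;           // one line per directory being queried
};

// Returns the markers for a locale key such as "de".
// Assumption: unknown locales fall back to the English strings.
inline const OutputMarkers& markersFor(const std::string& locale) {
    static const std::unordered_map<std::string, OutputMarkers> table = {
        {"en", {"total bytes of data are stored in", "The compression ratio is",
                " Compressing files in", " Listing "}},
        {"de", {"Datenbytes insgesamt werden in", "Das Komprimierungsverh",
                " Komprimieren der Dateien in", " Auflisten von "}},
    };
    const auto it = table.find(locale);
    return it != table.end() ? it->second : table.at("en");
}
```

In a real program the table would be read from the XML or JSON file at startup instead of being compiled in; for example, `assert(markersFor("de").listing == " Auflisten von ");` holds with the table above.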

Makrea commented 7 years ago

You seem on the right track.

Works well on PT-PT localization.

Cheet4h commented 7 years ago

Okay, creating a branch doesn't work for me, probably my own fault. Not all too accustomed with GitHub.

I seem to have got it working with German by updating the function with German strings and nesting a condition so it doesn't trigger too often. It probably still needs another level of nesting, since it triggers all the time during analysis, but I'll check that out tomorrow.

What doesn't work, and what I currently don't see how to fix, is the compressed file count. It probably checks for things that aren't present in the German output.

Anyway, here's how I changed the function:

```vb
Private Sub MyProcess_OutputDataReceived _
    (ByVal sender As Object, ByVal e As System.Diagnostics.DataReceivedEventArgs) _
    Handles MyProcess.OutputDataReceived

    AppendOutputText(vbCrLf & e.Data) 'Sends output to the embedded console
    Try
        If e.Data.Contains("Datenbytes insgesamt werden in") Then 'Gets the output line that contains both the pre- and post-compression folder sizes
            byteComparisonRaw = e.Data
        End If

        If e.Data.Contains("Das Komprimierungsverh") Then 'Gets the output line that contains the compression ratio and forces the progress bar to 100% (indirectly due to threading)
            compressFinished = 1
            dirCountProgress = dirCountTotal
            fileCountProgress = fileCountTotal
        End If

        If e.Data.StartsWith(" Komprimieren der Dateien in") Then 'Gets each directory that is compressed. Used for the old progressbar.
            dirCountProgress += 1
        End If

        If e.Data.EndsWith("[OK]") Then 'Gets each file that was successfully compressed OR uncompressed.
            fileCountProgress += 1
        End If

        If e.Data.EndsWith(" nicht komprimiert.") Then 'Gets the output line that identifies the total number of files compressed.
            If e.Data.Contains("waren nicht komprimiert") Then 'Gets the output line that identifies that an uncompression event has finished.
                dirCountProgress = 0
                fileCountProgress = fileCountTotal
                uncompressFinished = 1
            Else
                byteComparisonRawFilesCompressed = e.Data
            End If
        End If

        If e.Data.StartsWith(" Auflisten von ") Then 'Gets the output line that identifies the query folder count
            QdirCountProgress += 1
        End If
    Catch ex As Exception
    End Try
End Sub
```
vient commented 7 years ago

Compact.exe uses the FormatMessage API with the FORMAT_MESSAGE_FROM_HMODULE flag to output messages. From what I've read on MSDN, it should be possible to resolve all messages at the beginning (run compact.exe, obtain its handle, then call FormatMessage(FORMAT_MESSAGE_FROM_HMODULE, %compact_exe_handle%, %message_id%, 0, ...)) and use them instead of hardcoded strings.

theChaosCoder commented 7 years ago

You can force EN language output with:

 [Threading.Thread]::CurrentThread.CurrentUICulture = 'en';  compact.exe ...
Iridium-IO commented 7 years ago

Thanks for all the input! For now I might resort to what @theChaosCoder suggested and force the console output strings into English. Since most of the output is numbers anyway, this shouldn't make too much of an impact, right?

Cheet4h commented 7 years ago

@theChaosCoder: Can you elaborate on that?

I looked into this and tried it yesterday. It is supposed to work in PowerShell after downloading the English help database and then forcing English output with e.g. "[Threading.Thread]::CurrentThread.CurrentUICulture = 'en'; get-help dir"

Sadly this only seems to work with PowerShell commands. Trying to change the output language of compact.exe does not work for me.

theChaosCoder commented 7 years ago

Yes it is a powershell command... should have written that.

It seems that you can not set a CultureInfo for a Process() method. At least I could not find a working solution.

Maybe calling powershell (powershell script) within your code is the easiest solution.

Iridium-IO commented 7 years ago

I've changed the code to use CMD along with the UICulture tag, but I'm honestly not sure that it will help - could you give it a go and see if anything has changed?

vient commented 7 years ago

It didn't help. I tried to execute chcp 437 in cmd, then run compact.exe /? — it used English instead of Russian, seems like the solution.

Iridium-IO commented 7 years ago

@vient one of the issues with working in English is that you're blind trying to go the other way :) Here's a version using chcp, see if it works? CompactGUI.zip

vient commented 7 years ago

Now it works fine for me.

There are some strange things like я instead of dots here:

Of 697 files within 27 directories
57 are compressed and 640 are not compressed.
251я524я489 total bytes of data are stored in 247я452я140 bytes.
The compression ratio is 1,0 to 1.

but it doesn't seem to affect your program.

Iridium-IO commented 7 years ago

@vient brilliant :) Those symbols aren't an issue, I patched it a while back to discard any non-numerical characters in those lines :)

Cheet4h commented 7 years ago

I have to mention that I tried chcp on the German version yesterday after submitting the issue. This works for Russian, Chinese and other systems with different character sets, but from what I gather German and other European languages use a character set similar to the English one, so the output is still in German.

vient commented 7 years ago

> but from what I gather german and other european languages are using a typeset similar to the english one so the output is still in german.

Yeah, there is literally the same comment from where I took this trick, didn't notice it yesterday.

So people are saying you can temporarily rename the MUI file (like in this project), or you can try to obtain the translated messages at runtime before running anything real with compact.exe, like I've written here (I don't know if it will work).

Skorpys commented 7 years ago

My system (Norwegian installation and Norwegian input) has the same problem, it won't compress anything. :)

Iridium-IO commented 7 years ago

@Skorpys you'll find that the compression should finish fine, but the GUI will hang after it reaches 100%. Just click on the "Show detailed progress" checkbox and see if it's done there. If it is, you can close the program.

wojtekmaj commented 7 years ago

Fails in 1.4.0-rc.1 as well:

[image]

Iridium-IO commented 7 years ago

Ideally I want to be able to capture the message strings from compact.exe.mui in the system32 folder for each language. However, I have no experience with DllImport and don't know how I'm going to get that working. Anyone got any ideas? If I can access the message strings in that mui file for the system language, it will fix every localisation at once, as I'll be able to replace the hard-coded English strings in CompactGUI with a lookup instead.

Right now I might just roll with RESX generation of languages, and allow users to submit their strings as pull requests.

vient commented 7 years ago

> If I can access the message strings in that mui file for the system language

Isn't it what you are asking about? I'll try it later today to see if it works.

Siegfriedmk commented 7 years ago

Could this be the same problem I have? My system is in Italian. It eventually reaches 100%, but it gets stuck there.

vient commented 7 years ago

In compact.exe it looks like this: FormatMessageW(FORMAT_MESSAGE_FROM_HMODULE, 0, dwMessageId, 0, &DisplayBuffer, 0x1000u, &Arguments) but when I create compact process and try to do the same with its handle, it returns error 1812 ERROR_RESOURCE_DATA_NOT_FOUND. WinApi is hard.

vient commented 7 years ago

Here it is, message 19 means insufficient memory for example:

```c++
#include <windows.h>
#include <iostream>
#include <cstdio>

int main(int argc, char **argv)
{
    // Load compact.exe so its message table can be queried.
    auto h = LoadLibraryW(L"compact.exe");
    if (h == NULL) {
        std::cout << "Loading compact.exe failed with error code " << std::hex << GetLastError() << '\n';
        return -1;
    }

    auto message = new wchar_t[4096];
    ZeroMemory(message, 4096 * sizeof(wchar_t));
    auto messageCode = 19;

    // Resolve the message text for messageCode from the module's resources.
    if (!FormatMessageW(FORMAT_MESSAGE_FROM_HMODULE | FORMAT_MESSAGE_IGNORE_INSERTS,
                        h, messageCode, LANG_NEUTRAL, message, 4096, NULL)) {
        wprintf(L"Format message failed with 0x%x\n", GetLastError());
        delete[] message;
        return 0;
    }

    wprintf(L"message: %s\n", message);

    // Dump the UTF-16 code units in hex, since the console may not render them.
    for (int i = 0; message[i]; ++i)
        printf("%x ", (int)message[i]);
    printf("\n");

    delete[] message;
    return 0;
}
```

 

On my system, it produces the following output:

message: ???????????? ??????.

41d 435 434 43e 441 442 430 442 43e 447 43d 43e 20 43f 430 43c 44f 442 438 2e d a

The hex values are character codes; they actually spell Недостаточно памяти.\r\n ("Not enough memory."). Hope it helps!

Iridium-IO commented 7 years ago

@vient you magician, I’ll see if I can get this working :) Translating this into .NET is proving to be a not-so-fun experience

WinAPI just throws me for a loop every time

pordeciralgo commented 7 years ago

Same issue with Spanish, here.

I know it's not an ideal solution, but here's a possible workaround. While the process of compacting/checking finishes correctly, the GUI remains stuck at 0% because of the locale. How about adding a button that lets the user manually confirm that the process has ended correctly?

Thanks for this amazing work, @ImminentFate

Iridium-IO commented 7 years ago

@vient I'm completely stumped, This code technically should work but I don't know why it doesn't. It works completely fine if you replace FORMAT_MESSAGE_FROM_HMODULE with FORMAT_MESSAGE_FROM_SYSTEM in the Main() function, but it doesn't seem to like using a custom handle.

I might give up for now :/

```vb
Public Class Form1

    Enum LoadLibraryFlags As UInteger
        DONT_RESOLVE_DLL_REFERENCES = &H1
        LOAD_IGNORE_CODE_AUTHZ_LEVEL = &H10
        LOAD_LIBRARY_AS_DATAFILE = &H2
        LOAD_LIBRARY_AS_DATAFILE_EXCLUSIVE = &H40
        LOAD_LIBRARY_AS_IMAGE_RESOURCE = &H20
        LOAD_WITH_ALTERED_SEARCH_PATH = &H8
    End Enum

    Private Const FORMAT_MESSAGE_FROM_HMODULE As Long = &H800
    Private Const FORMAT_MESSAGE_FROM_SYSTEM As Long = &H1000
    Private Const FORMAT_MESSAGE_IGNORE_INSERTS As Long = &H200
    Private Const FORMAT_MESSAGE_MAX_WIDTH_MASK As Long = &HFF
    Private Const FORMAT_MESSAGE_ARGUMENT_ARRAY As Long = &H2000

    <System.Runtime.InteropServices.DllImport("kernel32.dll")>
    Public Shared Function FormatMessage(
        ByVal dwFlags As Integer,
        ByRef lpSource As IntPtr,
        ByVal dwMessageId As Integer,
        ByVal dwLanguageId As Integer,
        ByRef lpBuffer As String,
        ByVal nSize As Integer,
        ByRef Arguments As IntPtr) As Integer
    End Function

    <System.Runtime.InteropServices.DllImport("kernel32.dll")>
    Private Shared Function LoadLibraryEx(
        lpFileName As String,
        hReservedNull As IntPtr,
        dwFlags As LoadLibraryFlags) As IntPtr
    End Function

    Public Function Main(
        ByRef strModuleName As String,
        ByVal msgID As Long) As String

        Dim rt As Long
        Dim sCodes As String
        Dim bufferStr As String
        Dim hModule As Long

        hModule = LoadLibraryEx("kernel32.dll", IntPtr.Zero, LoadLibraryFlags.LOAD_LIBRARY_AS_DATAFILE)
        Console.WriteLine(hModule)

        If hModule <> 0 Then
            bufferStr = Space(4096)
            rt = FormatMessage(
                FORMAT_MESSAGE_FROM_HMODULE Or &H100,
                hModule,
                msgID,
                0&,
                bufferStr,
                Len(bufferStr),
                0&)
            Console.WriteLine(rt)

            If rt Then
                bufferStr = Microsoft.VisualBasic.Left$(bufferStr, rt)
                sCodes = "Dec: " & msgID & vbTab & "Hex: " & Hex(msgID)
                GetMessageFromModule = bufferStr & vbCrLf & sCodes
                Console.WriteLine(bufferStr & vbCrLf & sCodes)
            End If
        End If
    End Function

End Class
```
pordeciralgo commented 7 years ago

How about checking the ErrorLevel environment variable? http://environmentvariables.org/ErrorLevel

In a batch file it would be something like:

@echo off
compact /S /Q /EXE
if %errorlevel% equ 0 (
   echo Done!! ErrorLevel = %errorlevel%
)
theChaosCoder commented 7 years ago

I just tested compact.exe with some valid and invalid parameters in C#, and yes, exit codes would indeed be the easiest solution. See also https://msdn.microsoft.com/en-us/library/system.diagnostics.process.exitcode(v=vs.110).aspx

vient commented 7 years ago

Jesus, I spent several hours on understanding how the VB works and here is the final fix:

- ByRef lpSource As IntPtr,
+ ByVal lpSource As Integer,

Your version fails because it passes handle by reference, which is really a pointer to handle, so FormatMessage interprets pointer to handle as handle. Then it raises error 1812 ERROR_RESOURCE_DATA_NOT_FOUND because it can't find the module with such handle (so it also can't find resource in it).

FYI, I used API Monitor to compare how our programs differ in calling FormatMessage.

Siegfriedmk commented 7 years ago

@ImminentFate On the description page, you said the language bug doesn't affect the actual compression, but that isn't the case for me. On reddit, a user stated he was able to compress Final Fantasy X HD Remastered from 20 GB to 3 GB. I tried this yesterday, and the result was the same size as the original folder.

Iridium-IO commented 7 years ago

@vient Are you kidding me... I tried going back and forth between IntPtr and Integer but not once did I consider changing ByRef to ByVal.

5 hours of work foiled by one word 😢

![d](https://user-images.githubusercontent.com/1491536/31802183-0a434e4e-b590-11e7-8de5-0679e18c8c3f.png) ![d](https://user-images.githubusercontent.com/1491536/31802271-961ff3b8-b590-11e7-90a5-6662886df652.png)
Iridium-IO commented 7 years ago

@Siegfriedmk would you be able to post a screenshot of the top and bottom of the console output when you try? Expand the window so you can see the whole row before you send it

Iridium-IO commented 7 years ago

I might be way out of my depth here sadly. I'm not familiar with C++ at all, and using WinAPI is a pain.

I can get the messages to show from the language table if I manually enter the requested message ID, but I haven't the faintest idea how to grab the message ID in real time as a process is running, which means I also have no idea how to process the %1, %2 etc. output variables that come with using FormatMessage.

Iridium-IO commented 7 years ago

I think I'll just download a bunch of language packs on my computer, and see if their language strings are organised in the same way, i.e. "textstringnumberone %1 textstringnumber2 %2" That way I can check for strings by doing:

    If e.Data.StartsWith(GetMessageString("&H8").Split("%1")(0)) Then
        byteComparisonRaw = e.Data
    End If

This example will check that the output text matches the message at ID 0x08. In English it would be "Of", in German it would be "Von" etc.
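
A small sketch of that prefix check, in C++ instead of VB. The helper names `literalPrefix` and `startsWithTemplatePrefix` are made up for illustration, and only single-digit placeholders (`%1` to `%9`) are recognised.

```cpp
#include <cassert>
#include <string>

// Returns the literal text that precedes the first "%N" placeholder in a
// FormatMessage-style template (the whole template if it has none).
inline std::string literalPrefix(const std::string& messageTemplate) {
    for (std::size_t i = 0; i + 1 < messageTemplate.size(); ++i) {
        const char next = messageTemplate[i + 1];
        if (messageTemplate[i] == '%' && next >= '1' && next <= '9') {
            return messageTemplate.substr(0, i);
        }
    }
    return messageTemplate;
}

// True when an output line begins with the template's literal prefix.
// An empty prefix is treated as "cannot decide" and returns false.
inline bool startsWithTemplatePrefix(const std::string& line,
                                     const std::string& messageTemplate) {
    const std::string prefix = literalPrefix(messageTemplate);
    return !prefix.empty() && line.compare(0, prefix.size(), prefix) == 0;
}
```

One limitation of a prefix-only check: a template that begins with a placeholder has an empty prefix, so a line produced from such a template cannot be recognised this way.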

vient commented 7 years ago

> I haven't the faintest idea how to grab the messageID in realtime as a process is running

Do you really need to do this? I thought about it in the way that you can replace message texts in your code with message ids, then resolve all possible messages at once at startup.

> and see if their language strings are organised in the same way

That should be the case, what’s the point in doing such language system if messages differ in structure.

Iridium-IO commented 7 years ago

> Do you really need to do this? I thought about it in the way that you can replace message texts in your code with message ids, then resolve all possible messages at once at startup.

I thought it would be neat if I could listen for the FormatMessage() call that compact.exe itself makes, then it would be much easier to just parse in the %1, %2, %3 etc. values directly.

Hopes were too high :) I'm doing it the sane way now

Iridium-IO commented 7 years ago

@vient just when I finished adding the new logic, I decided to try downloading the Russian language pack to see if it works. But Russian uses a slightly different format for some messages, which ruins it.

English:

Of %1 files within %2 directories
%3 are compressed and %4 are not compressed.
%5 total bytes of data are stored in %6 bytes.
The compression ratio is %7 to 1.

Russian:

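Всего файлов: %1, каталогов: %2.
Из них сжато: %3, не сжато: %4.
Данные объемом %5 байт сохранены в %6 байт.
Степень сжатия %7 к 1.

(Roughly: "Total files: %1, directories: %2." / "Of these, compressed: %3, not compressed: %4." / "Data of %5 bytes is stored in %6 bytes." / "Compression ratio %7 to 1.")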

I might have to do a triple condition check for each line just to make sure it always works, i.e.


If e.Data.StartsWith("value to the left of %1") AndAlso _
   e.Data.Contains("value between %1 and %2") AndAlso _
   e.Data.EndsWith("value after %2") Then

    FilesWithinDirectories = e.Data

End If
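
The same three-part check can be generalised to any number of placeholders. This is only a sketch in C++ with hypothetical helper names (`literalSegments`, `matchesTemplate`); it assumes single-digit placeholders and compares bytes, so the line and the template must use the same encoding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a FormatMessage-style template into the literal pieces around "%N".
// Only single-digit placeholders (%1..%9) are handled in this sketch.
inline std::vector<std::string> literalSegments(const std::string& tmpl) {
    std::vector<std::string> segments(1);
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        const bool placeholder = tmpl[i] == '%' && i + 1 < tmpl.size() &&
                                 tmpl[i + 1] >= '1' && tmpl[i + 1] <= '9';
        if (placeholder) {
            segments.emplace_back();  // start a new literal piece
            ++i;                      // skip the placeholder digit
        } else {
            segments.back() += tmpl[i];
        }
    }
    return segments;
}

// True when `line` starts with the first literal piece, ends with the last,
// and contains the remaining pieces in order in between.
inline bool matchesTemplate(const std::string& line, const std::string& tmpl) {
    const std::vector<std::string> seg = literalSegments(tmpl);
    const std::string& first = seg.front();
    const std::string& last = seg.back();
    if (seg.size() == 1) return line == first;
    if (line.size() < first.size() + last.size()) return false;
    if (line.compare(0, first.size(), first) != 0) return false;
    const std::size_t tailStart = line.size() - last.size();
    if (line.compare(tailStart, last.size(), last) != 0) return false;
    std::size_t pos = first.size();
    for (std::size_t k = 1; k + 1 < seg.size(); ++k) {
        const std::size_t found = line.find(seg[k], pos);
        if (found == std::string::npos || found + seg[k].size() > tailStart) {
            return false;
        }
        pos = found + seg[k].size();
    }
    return true;
}
```

Real console lines may carry trailing whitespace or a carriage return, so trimming the line before matching is probably needed in practice.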
vient commented 7 years ago

Maybe you can use Levenshtein distance to recognise message types?
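
For reference, a minimal C++ sketch of Levenshtein distance and of using it to pick the closest message template. `closestTemplate` is a hypothetical helper, not something from CompactGUI, and the comparison is byte-based, so non-ASCII text would need wide strings to be measured per character.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Two-row dynamic-programming Levenshtein distance: insertions, deletions
// and substitutions each cost 1. Operates on bytes, not characters.
inline std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Returns the index of the template with the smallest distance to `line`,
// or templates.size() when the list is empty.
inline std::size_t closestTemplate(const std::string& line,
                                   const std::vector<std::string>& templates) {
    std::size_t best = templates.size();
    std::size_t bestDistance = 0;
    for (std::size_t i = 0; i < templates.size(); ++i) {
        const std::size_t d = levenshtein(line, templates[i]);
        if (best == templates.size() || d < bestDistance) {
            best = i;
            bestDistance = d;
        }
    }
    return best;
}
```

As a sanity check, `assert(levenshtein("kitten", "sitting") == 3);` holds. Long numbers in the output inflate the distance, so whether this separates all message types reliably is untested here.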

Iridium-IO commented 7 years ago

@vient So that's what it's called. I knew it existed but couldn't look it up, so I started trying to hack together my own version.

untitled

Mine has an issue: I'm currently using whitespace as the split point, but some people use whitespace as the thousands separator, which makes this a bit annoying. So, for an output of:

200 000 total bytes of data are stored in 120 000 bytes.

You'd get:

FMT CON
%5 200
total 000
bytes total
of bytes
data of
are data
stored are
in stored
%6 in
bytes. 120
Iridium-IO commented 7 years ago

I should probably go to sleep... 5 minutes after posting that I remembered that Regex exists to easily solve the problem of characters between numbers.

Siegfriedmk commented 7 years ago

@ImminentFate I'm trying with 1.3.5.1 and it has been stuck at 7% for an hour, with FFX HD Remastered

vient commented 7 years ago

> but some people use whitespace as the number separator for thousands

I wonder how it works then in swscanf, how can it read a number properly if space can be used in it.

> I remembered that Regex exists

Yeah, really, that sounds like the easiest approach.

Iridium-IO commented 7 years ago

> I wonder how it works then in swscanf, how can it read a number properly if space can be used in it.

Probably like this, parse the string to remove number spaces before further processing:

Dim input As String = "This number (123 324,234'123) has characters between the digits"
Dim rgx As New Regex("(?<=\d)[\s,'](?=\d)")
Dim cleaned As String = rgx.Replace(input, "")

Which would output: This number (123324234123) has characters between the digits
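
A rough C++ equivalent of this clean-up step, for comparison. `stripDigitSeparators` is a hypothetical name, and the set of separators (whitespace, comma, apostrophe) is an assumption based on the example string. `std::regex` has no lookbehind, so the leading digit is captured and written back instead.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Removes a single separator character (whitespace, comma or apostrophe)
// that sits directly between two digits. The digit before the separator is
// captured as group 1 and re-emitted; the digit after it is only looked at.
inline std::string stripDigitSeparators(const std::string& text) {
    static const std::regex separator(R"((\d)[\s,'](?=\d))");
    return std::regex_replace(text, separator, "$1");
}
```

Two caveats: two distinct numbers separated by a single space would be merged into one, and a decimal comma (as in "1,0") would be removed as well, so this should only be applied to lines where neither can occur.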

vient commented 7 years ago

OK, but how do you then pass two numbers? My guess is that spaces in numbers are not valid for scanf (too lazy to check).

Siegfriedmk commented 7 years ago

@ImminentFate I tried now with 1.4.0 rc0, and it's the same thing: still stuck at 7%. Reading the log, it seems to have completed, but when I check the size of the folders it is still the same. Here is the complete log: http://www.heypasteit.com/clip/0IIWAJ

theChaosCoder commented 7 years ago

As you can see, there is a giant 20 GB file which is not compressible: FFX_Data.vbf 20701606838 : 20701606838 = 1,0 a 1 [OK]

And "stuck" here means "I'm working on a giant file".

Have you tried forcing the compression?

vient commented 7 years ago

@Siegfriedmk I think you need to make separate issue for your problem because it is not connected with current one. Maybe the title is misleading, Doesn't work properly on Non-English systems means GUI doesn't work properly on Non-English systems, compression part is not affected.

Siegfriedmk commented 7 years ago

@vient yeah, sorry. I'll try later to wait a little longer. I thought it was stuck because cpu and ram usage was very low 5% and 15%. Thanks guys.

Iridium-IO commented 7 years ago

@Siegfriedmk don't judge by the CPU usage, check the disk usage. And if the detailed output doesn't show the compression results at the end, it means it hasn't finished working, but it's not frozen.

Iridium-IO commented 7 years ago

@vient would you be able to try this out for me before I migrate the logic into the main program?

TestWinAPI.zip

  1. enter a directory if you wish (or not, it will default to analysing the folder it's in)
  2. click analyse (first without the chcp option ticked)
  3. click print and fill tables
  4. see if the results match up to the console window (a screenshot would be handy if you could post it :) )
  5. click EngTest, then hit print and fill tables again (only the first line in the print output should change, as it's hardcoded, but the fill tables should now be in English).
  6. run 1-5 as above but with the Force CHCP 437 box ticked.

@Siegfriedmk @Cheet4h if you guys could test these on your systems (Italian and German I believe) that would be incredibly useful too :)

vient commented 7 years ago

Sure thing.

Default: ![image](https://user-images.githubusercontent.com/7602242/31849104-ad68562c-b645-11e7-8645-5bc92b9b3404.png)

(Maybe I did something wrong? `Analyze - Print - Fill Tables` and it crashed):

```
System.ArgumentNullException: Значение не может быть неопределенным.
```

(The Russian message reads "Value cannot be null.")

 

EngTest: ![image](https://user-images.githubusercontent.com/7602242/31849141-4ee56800-b646-11e7-9b83-9df8b14144e3.png)

 

With CHCP 437: ![image](https://user-images.githubusercontent.com/7602242/31849226-d8b97bd8-b647-11e7-9fd5-24c3ef72ecf0.png)