RamonUnch / GreenPad

Fork from roytam1 fork from original GreenPad
22 stars 0 forks source link

Sometimes crash when opening binary file #15

Open roytam1 opened 2 years ago

roytam1 commented 2 years ago

There is another bug that I can't debug for a long time. Sometimes it crashes when opening a binary file (normally you shouldn't use GreenPad to open binary files, but sometimes you don't know the file is binary or not), open it next time may not crash, this gives me some headache.

RamonUnch commented 2 years ago

I had this problem a few times (even with an aschii file) I fixed one out of bound array access https://github.com/RamonUnch/GreenPad/commit/e624a8c8b34ec95a0a2d8700faec55af26aeece1. There are probably others, the trick is to retry to get a crash with the debug version many times. To get a reproducible crash, I had to use drag&drop in the GreenPad window, running from the command line would not do it.

There is also another problem with syntactic coloration. Normally you should have no coloration between single or double quotes and you still do. Also sometime a Japanese kanji gets truncated before an ascii character/end of line (probably) I do not know any ideograph but for sure you get the same problem for other languages.

roytam1 commented 1 year ago

found and fixed part of this issue: the crash can be inside chardet. https://github.com/roytam1/rtoss/commit/71d2a9ebb201aeb8d29620b495f3d0b624788d77

RamonUnch commented 1 year ago

Nice finding.

I also realized yesterday that the definition of DLGPROC changed from BOOL CALLBACK to INT_PTR CALLBACK in the windows headers. In 32 bit mode it does not change anything, however in 64 bit mode it is not equivalent. To fix x64 builds I guess K. Inaba added an ugly CAST to remove the error, but what should be done is to replace BOOL CALLBACK DlgImpl::MainProc() by INT_PTR CALLBACK DlgImpl::MainProc()

So if you replace the definition you will still need a cast, but for the old headers this time and all builds should be correct. However Microsoft is aware of this problem and has workarounds to avoid those kind of problems.

RamonUnch commented 1 year ago

I do have sometime GreenPad stuck in an infinite loop, eating 100% of a core (I think it happens when exiting), It is hard to debug, it happens rarely but I cannot reproduce the bug with the debug build :( I mostly uses the GCC build (the fastest). I will try with the same a GCC build without the USE_ORIGINAL_MEMMAN flag to see if it is the culprit. Drmemory reports uninitialized reads in the memory manager code, but I cannot figure out why. Maybe it is false positive but i get no warnings without USE_ORIGINAL_MEMMAN

RamonUnch commented 1 year ago

Progress in memory Manager that would access outside of pools_ mages https://github.com/RamonUnch/GreenPad/pull/74/commits/78440cb222ba03ec0e9b438e9d91802d2cd210a6

roytam1 commented 1 year ago

And found another oddities. If I open a big binary file (PHOTOSHOP7.0.ISO in 170,233,856 bytes) from command-line (or DnD to GreenPad.exe) it will cause Out-of-Memory error, but if I opened GreenPad and DnD file to main window, it opens without OOM error (using ~587MB memory)

EDIT: test again with x64 binary, DnD to EXE vs DnD to main window causing huge memory usage difference (~2021MB vs ~666MB)

RamonUnch commented 1 year ago

In theory the minimum needed RAM in bytes is: number of characters * (2 + 1) + raw size of file + overhead due to many small line buffers and other stuff.

So the worst case scenario for an ANSI file should b 4 * sizeof file + overhead. But I cannot see a difference on my fork between drag and drop on the main window or command line. Maybe there is a difference on this part between our forks.

roytam1 commented 1 year ago

But I cannot see a difference on my fork between drag and drop on the main window or command line. Maybe there is a difference on this part between our forks.

because original implementation doesn't showup main window(right after Create() in GreenPadWnd::StartUp()) if opening file from command-line. and this made a difference.

after showing up main window, I can open PHOTOSHOP7.0.ISO in my GreenPad with ~587MB memory usage. testing with your 1.17beta2 it takes ~253MB but it is still stalling while I'm editing this comment.

EDIT: it opens now in UTF32, I reopened it with MSDOS(us) and now it takes ~610MB. and mine is ~619MB after reopening. so maybe it should be fine now.

The question remaining is: why memory usage difference is that huge when main window is not showing up early?

RamonUnch commented 1 year ago

The question remaining is: why memory usage difference is that huge when main window is not showing up early?

This is really disturbing indeed. Maybe we you try to create +show a dummy window just before loading the file to see it it is an internal window thing.

RamonUnch commented 1 year ago

testing with your 1.17beta2 it takes ~253MB but it is still stalling while I'm editing this comment.

Indeed I think it is because I treat all characters <31 like spacing characters, so parsing is probably very slow. Also the line length is so huge, we run into problems.

roytam1 commented 1 year ago

Maybe we you try to create +show a dummy window just before loading the file to see it it is an internal window thing.

too lazy doing this, I'm taking your approach: https://github.com/roytam1/rtoss/commit/a647ea829af9ede0fc1dce25289c05f66496f53d

RamonUnch commented 1 year ago

I got a crash when using JIS (cs = -933) ie Japanese(ISO-2022-JP) when opening some binary file. It happens with the micross.ttf Microsoft Sans Serif font file (that I go on my install).

https://github.com/RamonUnch/GreenPad/blob/13f4fdcfb58e2c8161a79cbb7b163eb05e1284bf/kilib/textfile.cpp#L1109-L1112

The debugger says that the value of GR is 0x00000002, when it should be a pointer to G[1], whatever happens, (when you read the source).

So I replaced *GR by G[1] and it works (no more crash), but it does not make any sense! Then when I go to delete the GR variable from the declaration of the rIso2022 class I again get another crash and this time because it is dereferencing GL that was set at a value of 0x00000002. GL can either point to G[0] or G[1]. So it mean there is some write that occurs beyond the proper bounds but I cannot figure out what.

I will read again the source to see if I can make sense out of that and then I will try to understand why I also get a crash when trying to open using FSS-UTF(19920902) encoding.

roytam1 commented 1 year ago

The debugger says that the value of GR is 0x00000002, when it should be a pointer to G[1], whatever happens, (when you read the source).

but there are only some places accessing GR:

https://github.com/RamonUnch/GreenPad/blob/13f4fdcfb58e2c8161a79cbb7b163eb05e1284bf/kilib/textfile.cpp#L1065

https://github.com/RamonUnch/GreenPad/blob/13f4fdcfb58e2c8161a79cbb7b163eb05e1284bf/kilib/textfile.cpp#L1109-L1112

https://github.com/RamonUnch/GreenPad/blob/13f4fdcfb58e2c8161a79cbb7b163eb05e1284bf/kilib/textfile.cpp#L1167

and they look sane to me, so why it is no longer be a pointer?

roytam1 commented 1 year ago

I will read again the source to see if I can make sense out of that and then I will try to understand why I also get a crash when trying to open using FSS-UTF(19920902) encoding.

and for FSS-UTF, the original spec I found still exists on the web: (search "/usr/ken/utf/xutf from dump of Sep 2 1992") http://doc.cat-v.org/bell_labs/utf-8_history

RamonUnch commented 1 year ago

It seems indeed that is is impossible but if some indirect access to a close-by variable was badly performed it can change the value. for example:

#include <stdio.h>

int main(void)
{
    int x = 42;
    int y[4];
    y[4] = 33;
    printf("x = %d\n", x);
}

You will see x printed to 33 and not 42, because y[4] is no longer in the array and instead y+4 points to x. Most compilers will order like this.

roytam1 commented 1 year ago
  int y[4];
  y[4] = 33;

thats out-of-bound access, valid access range of int y[4] is y[0] to y[3]. I wonder if compiler will spit warning message or not.

RamonUnch commented 1 year ago

So I think I found the bug:

G[ (p[0]-0x28)%4 ] = ASCII;         // 1B [28-2B] 4A

is not valid because even if p[0] is unsigned 0x28 is not, thus the difference is signed and modulo on signed int can be negative. So better use &3 in such a case.

RamonUnch commented 1 year ago

thats out-of-bound access, valid access range of int y[4] is y[0] to y[3]. I wonder if compiler will spit warning message or not.

In a simple case like this compilers will emit a warning but when you do funny pointer arithmetic like above it will not be detected, (I also build the project with the latest gcc with -Wall -Wextra and it was not able to see the above). Modern compilers have great warnings though.

I can confirm that the above seems to fix the issue.

RamonUnch commented 1 year ago

https://github.com/RamonUnch/GreenPad/pull/91

Also a cool side-effect is better performances and smaller exe (in my case).

roytam1 commented 1 year ago

So better use &3 in such a case.

did this in my side as well.

RamonUnch commented 1 year ago

I fixed the crash in rUtfOFSS reader by implementing Eof() with a greater or equal rather than an equal. I applied the modification to al readers because it should not change performances and could avoid more crash in the future.

92

RamonUnch commented 1 year ago

Potential out of bound acess in rightOf(): https://github.com/RamonUnch/GreenPad/commit/f55656731ce35a8da3eb4d659ab5b3ccd6e0b417 I think it never caused problems but better be safe and a boundary check should always come before de-referencing.