bitwiseworks / mozilla-os2

Mozilla for OS/2 and OS/2-based systems

Make IPC work on OS/2 #9

Closed dmik closed 11 years ago

dmik commented 11 years ago

Newer releases of Firefox bring a new and very important feature: the ability to run browser tabs and plugins in separate processes. This makes the browser much more stable: if JavaScript code on a web page or a plugin crashes, it won't take other browser tabs or windows down; only the crashed one stops functioning.

While this feature is not yet fully implemented in Mozilla itself (the current status is here), more and more components are starting to use the multi-process mechanism for various things.

Unfortunately, the implementation of the IPC (inter-process communication) mechanism used to provide this feature is not complete on OS/2. So far, we have had to disable all of its uses with the MOZ_IPC guarding define throughout the Firefox source code. However, as this mechanism is used more and more, it becomes increasingly difficult to maintain the MOZ_IPC workaround and, more importantly, the workaround negatively affects overall browser performance and stability.

For this reason, the IPC mechanism should be implemented as soon as possible.
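For illustration only, the MOZ_IPC workaround boils down to preprocessor guards of the following shape. The function name is hypothetical and does not come from the Mozilla tree; the sketch just shows how a build without MOZ_IPC defined falls back to the in-process path.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical function, not actual Mozilla code: it only shows the shape
   of a MOZ_IPC guard, where the IPC path is compiled out unless MOZ_IPC
   is defined. */
const char *plugin_hosting_mode(void)
{
#ifdef MOZ_IPC
    /* IPC build: the plugin would be hosted in a separate child process. */
    return "out-of-process";
#else
    /* OS/2 workaround build: the IPC code is compiled out, plugins stay
       inside the browser process. */
    return "in-process";
#endif
}
```

Every such guard has to be re-applied by hand whenever upstream touches the guarded code, which is why the workaround gets harder to maintain as IPC use spreads.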

dmik commented 11 years ago

Actually, I can't find any evidence that this feature is already completely implemented in any version of Firefox. What I found is this Mozilla project: https://wiki.mozilla.org/Content_Processes. But it only mentions that this feature is going to be implemented in 2013.

Dave, could you clarify this a bit? What exactly has changed in release 17 since release 10 that forces us to disable the code now guarded with MOZ_IPC?

dmik commented 11 years ago

Just for reference, here are the commits that bring MOZ_IPC guarding:

There will be some more. Here they are:

These commits should be reverted (as well as other possible commits introducing the MOZ_IPC define in some way) once we have implemented the IPC mechanism on OS/2.

dryeo commented 11 years ago

In the long run-up to FF 4, Mozilla started implementing IPC so that Flash could run in its own process, since it crashed so often and took the browser down with it. FF4 had a configure option to disable IPC, which we used. After FF4 was released, the disable-IPC option was removed; Walter reversed those patches, reinstated the option and maintained it for quite a while, which allowed the release of FF6 and up. As time went on, more and more of Mozilla came to depend on the various classes introduced with IPC, and as rendering, DOM and other code grew dependent on the IPC code, the patches grew and grew. Although IPC doesn't seem to have been fully implemented yet (I haven't been keeping close track), it is now scattered so widely through the code base that keeping up with the patches went beyond my ability. Walter had the same problem, as well as personal problems taking his time. (Walter's skills were closer to mine than to Rich's or any other real programmer's.) Rich considered it necessary to implement IPC as soon as possible; the bug is https://bugzilla.mozilla.org/show_bug.cgi?id=536262.

When I looked at IPC, I got stymied by all the IPC code using wide chars and by kLIBC missing at least vswprintf(), and after following a different project's effort to implement vsprintf() for MSVC, I realized I'm not qualified to do much with it. I believe ODIN has an implementation of vswprintf(). Rich also suggested simply stubbing out all the functions as an alternative to the IPC patch, and I did do that to some extent between 10 and 17. I ran into other problems with my build of FF11: xpcshell.exe started crashing on large objects and extensions broke. Trying to bisect had me chasing various branches, and I got very frustrated by not being knowledgeable enough, by not having a debug build at the time, and by the sheer work of continually rebasing the IPC patches.

Here's an interesting report on Firefox maintainability (http://almossawi.com/firefox/prose/), which points out that, on average, changing any file directly impacts 8 other files and indirectly impacts 1,400 files.

dmik commented 11 years ago

Okay, this is more or less what I guessed. Your comment confirms that I understand the problem correctly. I have corrected the issue description.

dmik commented 11 years ago

I will roll back the patches listed in https://github.com/bitwiseworks/mozilla-os2/issues/9#issuecomment-17963480 and try to build it. To satisfy the linker, I will try to provide a quick dummy implementation of IPC just to make it build.

dmik commented 11 years ago

Reverted all patches from comment above. Note that building telemetry requires the simplejson python module (downloadable from https://pypi.python.org/pypi/simplejson/). The build breaks now in the IPC subdirectory (as expected). Fixing that.

dryeo commented 11 years ago

Simplejson was installed automatically for me, probably because I have setuptools (https://pypi.python.org/pypi/setuptools) installed, or by pip from virtualenv. After 17, having a working virtualenv becomes a requirement.

dmik commented 11 years ago

Okay.

It seems that the IPC implementation they use is different from what they had in Mozilla back then (and what I significantly improved for VirtualBox). That one was mainly an engine for inter-process XPCOM classes (an equivalent of Microsoft DCOM). This one is for exchanging events between processes (AFAICS). However, I haven't gotten to XPCOM yet, so I'm not sure what they use there (we will see later).

The interesting thing is that they use parts of the Chromium code to implement IPC events :-) (see https://github.com/bitwiseworks/mozilla-os2/tree/master/ipc/chromium). So I suppose most of the work within this ticket will go there. There is the Chromium code itself and a third-party library, libevent, which I have already ported to OS/2 (which was quite trivial, though). The Chromium bits should not be difficult either. Working on it now.
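Exchanging events over a byte-stream channel needs some framing so that the receiver knows where one message ends. A minimal sketch of length-prefixed framing follows; the header layout (`msg_header` with a 32-bit size and a 32-bit type) is an assumption for illustration and is NOT the actual Chromium `IPC::Message` format. Both ends are assumed to run on the same machine, so host byte order is used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout for illustration; NOT Chromium's format. */
typedef struct {
    uint32_t payload_size; /* number of payload bytes that follow */
    uint32_t type;         /* application-defined message type */
} msg_header;

/* Writes header + payload into buf; returns total bytes, or 0 if buf is too small. */
size_t msg_encode(uint8_t *buf, size_t cap, uint32_t type,
                  const void *payload, uint32_t len)
{
    msg_header h;

    if (cap < sizeof h + len)
        return 0;
    h.payload_size = len;
    h.type = type;
    memcpy(buf, &h, sizeof h);
    memcpy(buf + sizeof h, payload, len);
    return sizeof h + len;
}

/* Parses one message from buf; returns bytes consumed, or 0 if the
   message is not complete yet (the caller should read more bytes). */
size_t msg_decode(const uint8_t *buf, size_t avail, uint32_t *type,
                  const uint8_t **payload, uint32_t *len)
{
    msg_header h;

    if (avail < sizeof h)
        return 0;
    memcpy(&h, buf, sizeof h);
    if (avail - sizeof h < h.payload_size)
        return 0;
    *type = h.type;
    *payload = buf + sizeof h;
    *len = h.payload_size;
    return sizeof h + h.payload_size;
}
```

A return value of 0 from `msg_decode` means "not enough bytes yet", which is how a reader on a stream socket would accumulate partial reads before dispatching an event.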

dmik commented 11 years ago

The next Chromium task is the shared memory classes. They use mmap() on Linux and mapped files on Windows. But AFAICS this is only used for sharing data structures between processes, not for permanently storing them in the file system. This means that simple shared memory blocks should work well on OS/2; there is no need for full mmap emulation.
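To illustrate the distinction between sharing memory and file-backed mapping: an anonymous `MAP_SHARED` mapping has no file behind it and is simply a block of memory visible to both parent and child after `fork()`. The sketch below uses POSIX calls so it can be compiled anywhere; on OS/2 the comparable primitive would presumably be a shared memory object from `DosAllocSharedMem()`, but this is not the code committed to the port, and `shared_state`/`shared_block_demo` are hypothetical names.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical data structure shared between two processes. */
typedef struct {
    int counter;
    char note[32];
} shared_state;

/* Returns 0 and formats what the child wrote into out, or -1 on error. */
int shared_block_demo(char *out, size_t outlen)
{
    shared_state *s;
    pid_t pid;
    int status = 0;

    /* No backing file: a plain shared memory block. */
    s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;
    memset(s, 0, sizeof *s);

    pid = fork();
    if (pid < 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    if (pid == 0) {
        /* Child: fill in the shared block, then exit without cleanup. */
        s->counter = 42;
        strcpy(s->note, "from child");
        _exit(0);
    }

    /* Parent: once the child has exited, its writes are visible here. */
    if (waitpid(pid, &status, 0) != pid) {
        munmap(s, sizeof *s);
        return -1;
    }
    snprintf(out, outlen, "%d %s", s->counter, s->note);
    munmap(s, sizeof *s);
    return 0;
}
```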

P.S. This comment was written many days ago... This is really a GitHub problem. I write a comment, then switch to preview and forget to press the green Comment button. It has happened many times already; definitely a usability defect.

dmik commented 11 years ago

Anyway, I have provided OS/2 implementations of path, threads, shared memory, transport DIB and debug utils. To estimate future progress, here's what is still missing in chromium (OS/2 part, judging by file names only; depending on a particular class, there may be more or less work, or no work at all if the POSIX code is acceptable):

Quite a few, actually. It may take a week to check and port them all.
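As an example of the kind of small, platform-specific logic the path class needs, here is a sketch of separator and drive-letter handling. It assumes that both `\` and `/` are accepted as separators (kLIBC generally accepts either) and ignores UNC paths; the function names are hypothetical and are not Chromium's `FilePath` API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumption: both backslash and forward slash count as separators. */
int os2_is_separator(char c)
{
    return c == '\\' || c == '/';
}

/* Returns nonzero for drive-absolute paths like "C:\OS2" or "d:/mozilla". */
int os2_is_absolute(const char *path)
{
    return isalpha((unsigned char)path[0]) && path[1] == ':' &&
           os2_is_separator(path[2]);
}

/* Returns a pointer to the last path component, skipping any separator
   and a leading drive prefix such as "C:". */
const char *os2_base_name(const char *path)
{
    const char *base = path;
    const char *p;

    for (p = path; *p != '\0'; ++p)
        if (os2_is_separator(*p) || (*p == ':' && p == path + 1))
            base = p + 1;
    return base;
}
```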

(Offtopic: It's really sad that every dev wants to reinvent the wheel and I have to port the same stuff over and over again. There really should be a single unified cross-platform library with all this common machinery. Okay, two. Not more. No, no democracy here please.)

P.P.S. It's not only the preview thing. Sometimes pressing Comment doesn't actually submit the comment but resizes the comment box. At least, this is what must have happened to me last time.

dmik commented 11 years ago

To speed things up, I will try to provide a minimalistic OS/2 implementation of the above; we can work on selected classes later if really needed.

dmik commented 11 years ago

Note that for most classes I also try to keep the POSIX implementation to speed up the process. For example, the IPC channel class in chromium uses the POSIX implementation based on socketpair(). While that should work as is, we may later want to switch it to native named pipes for performance reasons.

The same applies to other classes. Given that, the chromium excerpt is almost ready; only the string conversion functions remain.

Meanwhile I also started exploring and porting other parts in the ipc subdirectory.
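The socketpair()-based channel mentioned above can be sketched as follows: `socketpair()` returns two connected descriptors, and unlike `pipe()` each end can both send and receive. In a real channel one end would be handed to a child process after `fork()`; here both ends stay in one process so the round trip is easy to check. `channel_roundtrip` is a hypothetical helper, and it relies on small messages arriving in a single `read()`, which a real implementation would not assume (it would frame its messages instead).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg from end A to end B, then B replies "ack" on the same pair.
   Fills at_b/at_a with what each end received; returns 0 or -1. */
int channel_roundtrip(const char *msg, char *at_b, char *at_a, size_t cap)
{
    int fds[2];
    int rc = -1;
    ssize_t n;
    size_t len = strlen(msg);

    if (cap == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* A -> B */
    if (write(fds[0], msg, len) != (ssize_t)len)
        goto out;
    n = read(fds[1], at_b, cap - 1);
    if (n < 0)
        goto out;
    at_b[n] = '\0';

    /* B -> A: a socketpair is bidirectional, unlike pipe(). */
    if (write(fds[1], "ack", 3) != 3)
        goto out;
    n = read(fds[0], at_a, cap - 1);
    if (n < 0)
        goto out;
    at_a[n] = '\0';
    rc = 0;
out:
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```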

dmik commented 11 years ago

Actually, there is some related code in other directories as well, for example dom/plugins/ipc. I have to port it too.

dmik commented 11 years ago

Fixed some bits in dom/plugins/ipc. The ipc directory now builds completely, as does hal. The build went much further (and is still going, in fact). There are more fixes to be committed once I make sure they build.

dmik commented 11 years ago

Apparently, there is a big pile of platform specific code in dom/plugins/ipc (among them are PluginInstanceChild and PluginInstanceParent). The code is related to passing data and control between windows of different processes and heavily involves the native windowing system (PM in case of OS/2). I'm currently working on that but it requires more work to complete this part.

dmik commented 11 years ago

BTW, Mozilla has some support for using Qt as the widget toolkit platform. This is partly involved in IPC as well. We should seriously consider this option, as it may dramatically decrease the amount of support we need to provide to keep Firefox current on OS/2. I will keep this in mind while doing my work. There are just so many new things to do to get native support that it may take quite a while before we get a working version...

dryeo commented 11 years ago

Last time I looked, the Qt code was quite intertwined with the X code. At one time there were Qt OS/2 builds that used X11 for display.

dmik commented 11 years ago

As far as I can see, Qt is actively used in FFox on mobile platforms at the moment (Android, Maemo?), but FFox itself is limited there, so I'm not sure it will give us all we need at this stage. A desktop build of FFox on Linux with Qt enabled must be checked first to see how far they are. A bit of X11 dependency in the Qt code path is not a big problem; these things can easily be adapted to OS/2.

dmik commented 11 years ago

BTW, the part I'm now working on (the PluginInstance classes) is actually a mediator between the browser and the plugins. It is responsible for running each plugin instance in a separate process. This mediator is architecturally very similar to what we do in Flash and in the Java plugin, but in our case it (basically) provides parameter/environment conversion between the OS/2 browser interface and the Win32 plugin interface. We may later use that for Flash and other Win32 plugins by adding a new mediator directly to FFox that inherits PluginInstance and combines it with our Odin-based one. This way, we will get generic support for Win32 plugins in the OS/2 version of Firefox.

dmik commented 11 years ago

Committed a big pile of code for the PluginInstance protocol and friends. The code is not tested yet and probably contains some errors; this is for later, once we get it running. Doing a full build to see what's next.

dmik commented 11 years ago

As expected, there are some missing calls in the chromium library (which then gets into xul.dll). Working on that now. Besides that, there should be no more code to port. The small things are done; the biggest remaining issue is a wide-char version of printf() (Dave mentioned that already), for which I'm going to check Odin, which contains a working implementation.
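Until a proper implementation is borrowed from Odin, a crude stopgap for the missing wide-char printf could route through the narrow `vsnprintf()`. The sketch below is an assumption-laden illustration, not a conforming `vswprintf()`: it only works when the format string and the output are representable in the current locale (plain ASCII in the default "C" locale), and it uses fixed-size scratch buffers. The `fallback_` names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stopgap, NOT a conforming vswprintf(): narrows the format,
   formats with vsnprintf(), then widens the result. */
int fallback_vswprintf(wchar_t *buf, size_t n, const wchar_t *fmt, va_list ap)
{
    char nfmt[512];
    char nbuf[1024];
    size_t len;
    int r;

    if (n == 0)
        return -1;

    /* Wide format -> narrow format; fails on unconvertible characters. */
    len = wcstombs(nfmt, fmt, sizeof nfmt);
    if (len == (size_t)-1 || len >= sizeof nfmt)
        return -1;

    /* In both printf families %s takes char* and %ls takes wchar_t*,
       so the argument list can be passed through unchanged. */
    r = vsnprintf(nbuf, sizeof nbuf, nfmt, ap);
    if (r < 0 || (size_t)r >= sizeof nbuf || (size_t)r >= n)
        return -1; /* like vswprintf, fail when the output does not fit */

    /* Narrow result -> wide output; r + 1 <= n, so the terminator fits. */
    len = mbstowcs(buf, nbuf, n);
    if (len == (size_t)-1)
        return -1;
    return (int)len;
}

int fallback_swprintf(wchar_t *buf, size_t n, const wchar_t *fmt, ...)
{
    va_list ap;
    int r;

    va_start(ap, fmt);
    r = fallback_vswprintf(buf, n, fmt, ap);
    va_end(ap);
    return r;
}
```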

dmik commented 11 years ago

Some progress. XUL.DLL is now built (which is a big beast involving, in particular, ipc/chromium and many other things discussed above). Started a full rebuild again; let's see what's left.

dryeo commented 11 years ago

Which rc are you using? rc.exe will fail if the DLL object is too big, whereas wrc will add the resources. Also, which gcc? lxlite -c:exemap -vf- xul.dll will show the size of the object (you might need Steven's latest lxlite).

dmik commented 11 years ago

I use the latest gcc from RPM (4.4.6 with my latest optlink regression fix) + gcc-wlink and gcc-wrc, so I don't have any problems with big resources.

dmik commented 11 years ago

The tree is completely built, finally! It doesn't run though, just exits. Investigating.

dmik commented 11 years ago

It starts now. It loads XUL.DLL, XPCOM.DLL and friends and then starts spitting out this:

[warn] select: Invalid argument

I bet it tries to select() on a file handle or something like that (this is not supported by kLIBC; only TCP/IP sockets can be passed to select()).

dmik commented 11 years ago

The above problem has been fixed by using socketpair instead of pipe. Now the event loop just terminates and then firefox crashes during cleanup. Both behaviors are wrong, investigating further.
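The socketpair-for-pipe swap can be illustrated with a minimal wakeup channel of the kind an event loop waits on: because both descriptors come from `socketpair()`, they are sockets and can be passed to `select()`, which kLIBC only supports for sockets. The `wakeup_*` names are hypothetical; this is not the actual libevent or Chromium code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Wakeup channel built with socketpair() instead of pipe(). */
typedef struct {
    int rd; /* end watched by the event loop */
    int wr; /* end written by whoever wants to wake the loop */
} wakeup_chan;

int wakeup_open(wakeup_chan *w)
{
    int fds[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;
    w->rd = fds[0];
    w->wr = fds[1];
    return 0;
}

int wakeup_signal(const wakeup_chan *w)
{
    return write(w->wr, "x", 1) == 1 ? 0 : -1;
}

/* Returns 1 if woken within timeout_ms, 0 on timeout, -1 on error. */
int wakeup_wait(const wakeup_chan *w, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    char c;
    int r;

    FD_ZERO(&rfds);
    FD_SET(w->rd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    r = select(w->rd + 1, &rfds, NULL, NULL, &tv);
    if (r <= 0)
        return r < 0 ? -1 : 0;
    /* Consume the wakeup byte so each signal is reported only once. */
    if (read(w->rd, &c, 1) != 1)
        return -1;
    return 1;
}

void wakeup_close(wakeup_chan *w)
{
    close(w->rd);
    close(w->wr);
}
```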

dmik commented 11 years ago

I've just committed a lot of kmk code to build the stuff. This includes generation of interface headers from .idl files and protocol headers and sources from .ipdl files. It also includes a lot of nice kmk enhancements that make the build system porting process really easy.

The result looks really good so far. The build process is very straightforward now. Subsequent runs will only redo the files that have actually changed (this applies to ALL generation and install tasks!), which greatly saves time during incremental development, especially if you often rebuild from the root directory. IMHO, the regular makefiles in kmk flavor are much cleaner: they have more structure, they are more compact and more readable, and they don't require regenerating themselves when changed. Another important benefit of kmk is that, thanks to the better structure, parallel compilation works better. Almost everything runs in parallel with no extra work from the makefile maintainer, which significantly speeds up the build process.

At the current stage it builds up to (and including) xpcom. You are welcome to try it.

Most kmk constructs are here so getting a full build is just a matter of some more monkey work (there are many small modules that need a .kmk file) which I expect to complete in a few days.

dmik commented 11 years ago

I also forgot to mention another cool advantage of kmk — the build output is much less verbose by default, one line per recipe. It doesn't contain unnecessary details (unless there is a build error) so it's easier to spot compiler warnings and other messages that indicate various problems.

dmik commented 11 years ago

Back at this ticket. I managed to build the debug build of Mozilla using configure but in fact the most needed DLL - XUL, fails to link with -g here, as was predicted. Yes, mozjs is in a separate DLL.

I think eventually we will have to break it down to even smaller pieces. As I see from make files, many modules can be built as DLLs. I will investigate that.

dryeo commented 11 years ago

On 09/16/13 03:00 pm, Dmitriy Kuminov wrote:

Back at this ticket. I managed to build the debug build of Mozilla using configure but in fact the most needed DLL - XUL, fails to link with -g here, as was predicted. Yes, mozjs is in a separate DLL.

I think eventually we will have to bring it down to even smaller pieces. As I see from make files, many modules can be built as DLLs. I will investigate that.

Thebes.dll would be the next largest, then xpcomcor.dll. From Seamonkey 2.1a2:

7-09-10 10:02p  170572  54  freebl3.dll
7-10-10 12:06a   10136  54  gfxutils.dll
7-10-10 12:06a   66506  54  gkgfx.dll
7-10-10 12:12a  113816  54  ldap60.dll
7-10-10 12:11a    8076  54  ldif60.dll
7-09-10  7:57p    6151  54  mozalloc.dll
7-10-10 12:04a  940181  54  mozjs.dll
7-09-10  8:28p  352565  54  mozsqlt3.dll
7-09-10  8:13p   50968  54  mozz.dll
7-09-10  7:58p  122730  54  nspr4.dll
7-09-10 10:06p  469443  54  nss3.dll
7-09-10 10:08p  192511  54  nssckbi.dll
7-09-10 10:02p   70222  54  nssdbm3.dll
7-09-10 10:01p   47047  54  nssutil3.dll
7-09-10  7:59p   10492  54  plc4.dll
7-09-10  7:59p    8598  54  plds4.dll
7-10-10 12:12a   10871  54  prldap60.dll
7-09-10 10:07p   68404  54  smime3.dll
7-09-10 10:02p  103388  54  softokn3.dll
7-09-10 10:06p   88995  54  ssl3.dll
7-10-10 12:06a  992109  54  thebes.dll
7-09-10  8:12p    8897  54  xpcom.dll
7-09-10  8:12p  407591  54  xpcomcor.dll
7-10-10 12:10a   95082  54  xul.dll
7-09-10  8:38p    6441  54  ycbcr.dll

Dave

dmik commented 11 years ago

Hmm? According to your output, they are all small (less than 1MB). From what I see here, the biggest one is the gklayout library (its .o files are several dozen MB in total; not all of it ends up in XUL.DLL, but still), the next is mozipdlgen (24MB or so), then comes chromium (5MB or so). I'm trying to make these DLLs.

dryeo commented 11 years ago

On 09/16/13 05:52 pm, Dmitriy Kuminov wrote:

Hmm? According to your output, they are all small (less than 1MB). From what I see here, the biggest one is the gklayout library (its .o files are several dozen MB in total, not all ends up in XUL.DLL but still), then ext is mozipdlgen (24MB or so), then comes chromium (5MB or so). I'm trying to make these DLLs.

I forgot about the 56 DLLs under components. gklayout.dll is almost 5 MB; the only other one over 1 MB is mail.dll. The total is 13.5 MB of DLLs in components.

Dave

dmik commented 11 years ago

I wonder how they build them in Seamonkey — Mozilla makefiles don't seem to support compilation of components as DLLs. You can do it manually of course (this is what I'm trying now) but there are too many dependencies you have to resolve...

dmik commented 11 years ago

Okay, I'm giving up on the idea of putting gklayout into a separate DLL for now. There are so many cross-dependencies that I think the only way to build it as a DLL is to also build all the other component libraries as DLLs, but this requires carefully resolving all these dependencies (possibly with preliminary creation of import libraries). Too much work. I'd rather do that for kmk later.

So I will have to debug it further w/o the debug symbols — using printf(), map files and raw assembler.

dmik commented 11 years ago

I have found so far that it crashes when trying to initialize some (default?) plugins using the new IPC-based plugin infrastructure. Digging through it now. BTW, I have to use the release version of XUL.DLL because the debug version is probably corrupt (even with debug symbols removed): it's 20M compared to 36M for the release version, which can't be right given that it contains more stuff (debug methods, assertion code and so on). According to the crash log, it crashes much earlier than the release build, when calling an exported entry (XRE_StartupTimelineRecord): right at its entry point, because the entry point contains garbage. The stack is busted at that point. One possible reason is that fixups are not set up correctly due to some defect in the DLL itself.

This all additionally slows me down.

dryeo commented 11 years ago

On 09/17/13 10:49 am, Dmitriy Kuminov wrote:

I wonder how they build them in Seamonkey — Mozilla makefiles don't seem to support compilation of components as DLLs. You can do it manually of course (this is what I'm trying now) but there are too many dependencies you have to resolve...

It's a build from back before they went to the humongous xul.dll. There used to be an option to build a static build, which still produced a few DLLs such as nsprpub.

dryeo commented 11 years ago

On 09/17/13 07:53 pm, Dmitriy Kuminov wrote:

I found so far that it crashes when trying to initialize some (default?) plugins using the new plugin infrastructure (IPC-based). Digging through it now. And BTW I have to use the release version of XUL.DLL because the debug version is perhaps corrupt (even with debug symbols removed): it's 20M compared to 36M of the release version which can't be true given that it contains more stuff (debug methods, assertion code and so on). It crashes much earlier than the release build, according to the crash log — when calling an exported entry (XRE_StartupTimelineRecord). Right at its entry point because it contains garbage. The stack is busted at that point. One of the possible reasons is that fixups are not correctly set up due to some defect in the DLL itself.

You should start out running with -safe-mode to disable plugins and extensions. FF11 here only ran in -safe-mode due to some breakage in the new JavaScript engine, TraceMonkey. xpcshell (built from JavaScript sub-tree) was also broken, crashing during make package.

Are you building a full debug build? If so IIRC it stopped working even before the change to xul. Can you just build with -g and strip disabled?

My guess is that some of the 16-bit structures in the OMF object are overflowing; Steven wondered whether it could be fixed in the linker.
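Dave's hypothesis is easy to picture: some OMF fields are only 16 bits wide (the record length field, for example), so a value above 0xFFFF stored there silently wraps instead of failing. The two helpers below are purely illustrative and are not taken from any linker.

```c
#include <assert.h>
#include <stdint.h>

/* Storing a value in a 16-bit field keeps only value mod 65536. */
uint16_t store_in_16_bits(uint32_t value)
{
    return (uint16_t)value;
}

/* The check a tool would need before emitting such a field. */
int fits_in_16_bits(uint32_t value)
{
    return value <= UINT16_MAX;
}
```

For example, 70000 stored this way comes back as 4464 (70000 - 65536), with no error reported.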

dmik commented 11 years ago

Good advice about -safe-mode, but it doesn't help here. The crash happens somewhere earlier.

Yes, the strange crash in XUL.DLL happens in the full debug build.

I've also tried just removing -s from the link options in the release build (-g is the default there), and I get the very same problem as with -g in the debug build: the DLL fails to link. wlink just keeps telling me this:

Error! E3009: dynamic memory exhausted

So, back to -s, release and printf.

dryeo commented 11 years ago

Strange that a debug build exhausts dynamic memory; I thought the fake libs fixed that. Anyway, use the wl.exe that I put in with the patches. Rich patched it to use high memory more aggressively. It was the only way to build before fake libs. The dynamic memory is where wlink keeps all the object names and paths, and unlike other memory it isn't swapped by wlink. Rich changed DosAllocMem() (with objany) from only being used for 64kb chunks to being used for 4kb chunks.

dryeo commented 11 years ago

BTW, the configure build automatically built with debug symbols, and we disabled that in .mozconfig with ac_add_options --disable-debug-symbols. It did the right thing in disabling stripping. The debug-symbol build of FF10 that I compiled crashed when started in the debugger due to lack of memory (it probably needs better error checking), but it ran fine on its own and the debugger could be attached to it. This could be a problem if it is crashing that early. The debug build is on Netlabs.

dmik commented 11 years ago

Well I don't see how fake libs could solve the problem — they are all linked together in the end any way.

But I must say that wl.exe from one of your ZIPs did indeed fix the builds with debug symbols included. I can now successfully build both the release and debug versions of XUL.DLL (which is around 300MB now). The debug version still doesn't work (the same problem with entry points). The release version does, and although I had already found the crash point with printf, this will really help me in my further work, thanks (I wish I had found it earlier).

BTW, it's a pity this patched version of WLINK is not in our gcc RPM. We should collaborate better. I've created an RPM ticket for this issue: http://svn.netlabs.org/rpm/ticket/64.

abwillis commented 11 years ago

Do we have the diffs for wlink? I would like to build off of the current watcom tree.

dmik commented 11 years ago

Andy, if you do this, please do it over github (or netlabs but I'd prefer the former for a number of reasons).

dryeo commented 11 years ago

I thought the real libs used absolute path names for the object files while fake libs used relative paths, which would use less memory. IIRC, I stopped needing the patched wl after the switch to fake libs. Knut's diff is in ftp://ftp.netlabs.org/pub/gcc/wl-hll-r1.zip. I think the call to DosAllocMem at line 2376 in the diff (about line 197 in the patched source) needs adjusting to use high memory more aggressively: 4KB blocks instead of 64 KB blocks, IIRC.

StevenLevine commented 11 years ago

In bitwiseworks/mozilla-os2/issues/9/24790381@github.com, on 09/19/13 at 10:10 PM, dryeo notifications@github.com said:

call to DosAllocMem at line 2376 in the diff (about line 197 in the patched source) needs adjusting to more aggressively use high memory, 4KB blocks instead 64 KB blocks IIRC.

This will not work. DosAllocMem always allocates a minimum of 64K of address space.

IMO, the best long-term fix is to implement upper memory heap support the same way that libc does. I have a rough design for the mods needed to support this. What's not done are the wrappers for the Dos... APIs that do not support data buffers in the upper arena, and I've not fully analyzed all the serialization issues. However, my current analysis says the existing serialization will be sufficient.
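Steven's objection in numbers: taking his statement that DosAllocMem() always reserves at least 64K of address space, and assuming reservations are made in 64K units, every request is effectively rounded up to a multiple of 64K. The helper below is plain arithmetic under that assumption, not OS/2 code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed reservation granularity: 64 KB. */
#define ALLOC_GRANULE ((size_t)64 * 1024)

/* Address space consumed by one allocation request of the given size. */
size_t reserved_address_space(size_t request)
{
    return (request + ALLOC_GRANULE - 1) / ALLOC_GRANULE * ALLOC_GRANULE;
}
```

So a 4KB request reserves the same 64KB as a 64KB request, which is why changing the block size alone would not help; Rich's actual patch, as quoted from his email, instead lowered the size threshold at which wl allocates from high memory.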

dryeo commented 11 years ago

I misremembered, to quote the email from Rich Walsh,

Good News: I've posted my patched version of wl.exe: _URLredacted It occurred to me that if wl used high memory, it wouldn't run out of address space. I looked at the binary and found that it already did. I then looked at Knut's patch and found that it only used hi-mem for allocations GTE 256k. I patched that so it would use hi-mem for 16k or bigger. It worked (linking the debug xul required 720mb total!!).

StevenLevine commented 11 years ago

It occurs to me that one solution for the oversize XUL.DLL problem would be to generate a retail build with debug symbols.

dmik commented 11 years ago

Yes, as I say in the comment above, this is what I do and it works so far.