NeilWaldhauer commented 6 years ago

Using Firefox 45.9.0, with or without the new XUL.DLL from x3.7z, CPU usage on the default Firefox web page is low, in the few percent. When minimized, this usage jumps to 50-100% on the dual processor desktop computer, but to 99% on a slower single processor computer.

lerdmann commented 6 years ago

I can confirm that problem. What I can also observe: 1) if you minimize Firefox and then use TOP you will notice that thread 1 (the main GUI thread) and another thread are always in the "runnable" state 2) if you have Firefox running with a visible window, TOP still shows thread 1 to always be in the "runnable" state

That is really odd. If I start Thunderbird and have a look with TOP, all threads of Thunderbird will be in the "blocked" state. It looks like there is something in Firefoxes main thread that is constantly executing without ever deliberately giving up its time slice (of course the OS will reschedule after some maximum timespan). If Firefox is minimized this behaviour extends to yet another Firefox thread.

lerdmann commented 6 years ago

Even more strange: with the Firefox main window visible, after some time, all Firefox threads will eventually enter the "blocked" state. However with Firefox window minimized, the aforementioned behaviour persists with 2 threads being in the "runnable" state.

SilvanScherrer commented 6 years ago

hmm does that happen always? or only if certain websites are open? I just tried it with 5 open tabs, while one of those is slack. And I did not see this behavior.

dspiatkowski commented 6 years ago

On Tue, 17 Apr 2018 12:33:06 +0000 (UTC) Silvan Scherrer wrote:

hmm does that happen always? or only if certain websites are open? I just tried it with 5 open tabs, while one of those is slack. And I did not see this behavior.

Curious indeed, but it does seem to partially match what I have been seeing here as well, I just never took this a step further like Lars did to try to understand what was going on with the threads.

Specifically, what I noticed is the following: when FF starts eating up the CPU cycles and it happens to be the active window, minimizing it sometimes will cause the CPU cycles to go away. I attributed that to there existings browser logic which "shuts off" whatever page refresh/processing may be happening while the window has the focus.

Now I did say "sometimes" because that does not always work. Oftentimes I will simply open an entirely blank NEW tab and then minimize the browser, which will usually cause the CPU spikes to drop.

So, at best, pretty inconsistent.

What is the expected behaviour when the following happens:

1) window is pushed to background 2) window is minimized

-Dariusz

StevenLevine commented 6 years ago

This behavior appears to be extension dependent. I have a number of test profiles and some exhibit the 100% utilization issue when minimized. I've not yet had time to track down which extension(s) trigger the behavior.

lerdmann commented 6 years ago

Then I can add that I only have 1 extension: "uBlock Origin".

dryeo commented 6 years ago

On 04/17/18 09:13 AM, lerdmann wrote:

uBlock Origin

I think that is one that triggers high CPU usage. Dave

lerdmann commented 6 years ago

Disabling "uBlock Origin" does not make a bit of a difference.

lerdmann commented 6 years ago

I cannot see any dependency on open websites either. Even with no website opened at all (only an empty page) I can see this odd behaviour of threads that continue to be "runnable" when Firefox is minimized.

lerdmann commented 6 years ago

Update: with only an empty tab open, only thread 1 of Firefox will be in "runnable" state on minimizing Firefox. Therefore I suspect that the second thread that used to be "runnable" on minimize deals with serving webpage content.

NeilWaldhauer commented 6 years ago

On another computer, I tried, again with the Mozilla Firefox Start Page. The CPU monitor on xCenter didn't show 100%, but TOP does show 100 % CPU for firefox, with only one thread at 100% and all others blocked. All extensions, Chatzilla, FireFTP, Password Exporter, UAControl are disabled.

lerdmann commented 6 years ago

If if minimize Firefox I can see in XCenter the 100 % CPU use "wandering" between the 8 cores that I have active. If I open TOP is can see at least thread 1 being in the "running" state.

dspiatkowski commented 6 years ago

On Sat, 21 Apr 2018 07:41:34 -0700 lerdmann wrote:

If if minimize Firefox I can see in XCenter the 100 % CPU use "wandering"
between the 8 cores that I have active. If I open TOP is can see at least thread 1 being in the "running" state.

For what it's worth, this behaviour seems to have gotten worse with the
latest GA1.1 release.

Not sure if there is any co-relation between the two, but my overall CPU
usage is now noticably higher. Basically, on a 5 core machine I am seeing a
consistent 15-20% average (over the past 1 min - as reported by the 'CPU
Monitor' utility) usage. When I view this in the 'MPCPU Mon' OS/2 utility,
with the refresh of 150 ms I see an on-going CPU spike taking place across
all cores.

Having said that, the GA1.1 drop overall, while feeling a tad slower, does
seem to be more stable. I noticed memory consumption seems to have slowed
down, and while the overall steady CPU utilization is higher it no longer
seems to increase as total runtime of the process increases.

In the past the CPU utilization used to slowly climb over the hours FF was
up and running, until after about 10-15 hrs (depending of course on the
amount of usage) the CPU spikes became so consistently large that I needed
to kill the process and re-start. Now, as far as the past 3 days of usage
indicate, that is no longer so pronounced.

StevenLevine commented 6 years ago

I captured a couple of process dumps with firefox GA 1.1 minimized and got some initial clues. PMDF tells us that thread 1 is in al call gate

Analyzing Call Gate ... is in a 32 bit call gate: 0148:00004ea8 OS2KRNL DOSSETFGND Called by: %1fbf5a14 PMMERGE CheckLoseBoost + 50 Past end of segment: 00bf:0000f8d4

This is not good because PMMerge should not be doing this kind of thing while minimize.

A very rough dump of the stack, using a somewhat out of sync .sym file claims

%findsym %0013f01c

Finding from %0013f01c to %0013f11c by dword 13f02c: 0ddb112f = xpc::JSReporter::CollectReports + 22A17AF 13f038: 0de83390 = xpc::JSReporter::CollectReports + 2373A10 13f03c: 0ded758b = xpc::JSReporter::CollectReports + 23C7C0B 13f048: 0ded7544 = xpc::JSReporter::CollectReports + 23C7BC4 13f05c: 0de8352b = xpc::JSReporter::CollectReports + 2373BAB 13f068: 0dd0af00 = xpc::JSReporter::CollectReports + 21FB580 13f06c: 0de833b0 = xpc::JSReporter::CollectReports + 2373A30

This might imply that the JS engine is not aware that firefox is running minimized. I will be able to provide a more useful stack trace once I have a copy of xul.map.

Note that the best way to debug this is probably going to be with kernel debugger. Minimize firefox. Break in. Select firefox thread one and step out of the kernel code and figure out what the ring3 code is doing. Once you have a rough idea of where this is, you probably can continue with the VAC debugger.

dmik commented 6 years ago

Yes it is very possible that we don't notify JS of various PM events properly (there have also been reports that some drag and drop doesn't work, see e.g. #267). I'm not sure this was ever done.

A copy of xul.map for GA1.1 is lost in subsequent test builds I'm afraid. You only have xul.dbg (in a respective .7z archive). Isn't it good enough?

StevenLevine commented 6 years ago

It may not be JS, but I'm still not done building up a usable .map file. Take a look at the attached PDUMPFF4.000.txt. I still need to build up the .map file a bit more, but the output is starting to make sense. The labels are mapsym compatible, so nsBaseDragService_QueryInterface_1 is really the 2nd nsBaseDragService::QueryInterface() method. I don't think I'm going to need to use .sym files for this, so I'll change the map back to reported demangled names.

I'm not quite sure what the stack content is telling us yet. However, it does give you a few functions and methods to compare the ff38, which does not have this issue.

I rebuilt the .map file with a combination of lxlite2map.cmd and khll.exe. This a actually somewhat better than just the map file because this picks up the non-global functions and methods too. You might want to consider packing up the .map files in the debuginfo rpms. The map files are required to work with pmdf or the kernel debugger.

StevenLevine commented 6 years ago

2018-04-27 SHL Slack profile 100% minimized

IBM OS/2 Dump Formatter for a retail or an hstrict UNI kernel. Formatter is --> Internal revision 14.106_W4 Dump file is --> Internal revision 14.106_W4 (process dump)

Symbol (d:\devtools\pmdf\14_106_w4_slamain\os2krnlr.sym) linked

File: memory.c - Line: 255 - Rc: 5 (GetLinearAddress failed in FormatMTE loading ulMTEAddr)

-> not loading 100% correctly - might be symbols

Current slot number: 0067

Slot Pid Ppid Csid Ord Sta Pri pTSD pPTDA pTCB Disp SG Name *0067# 1480 001b 1459 0001 blk 0200 f98f9000 f9d64c00 f9ca5a64 0ed0 1e

eax=00000000 ebx=0000efc0 ecx=00000007 edx=00000020 esi=0013f8e4 edi=21920000 eip=0000c511 esp=0017efb0 ebp=0013f8d4 iopl=2 -- -- -- nv up ei ng nz na po nc cs=ffef ss=00bf ds=0053 es=0053 fs=150b gs=0000 cr2=00000000 cr3=00000000 ffef:0000c511 66ea6477fc1f5b00 jmp 005b:1ffc7764

.pu

Slot Pid Ord pPTDA Name pstkframe CS:EIP SS:ESP cbargs *0067# 1480 0001 f9d64c00 %f98f9f4c ffef:0000c511 00bf:0017efb0 0004

Analyzing Call Gate ... is in a 32 bit call gate: 0148:00004ea8 OS2KRNL DOSSETFGND Called by: %1fbf5a14 PMMERGE CheckLoseBoost + 50 Past end of segment: 00bf:0000f8d4

%thunktib

%0017efc8 TS_Signature ThunkTIB %0017efd0 savedESP 0013f01c %0017efd4 haveSaved 01 %0017efd5 usingSysESP 01 %0017efd6 fill 0000 %0017efd8 savedPStack 00040000 %0017efdc savedPStackLimit 00140000 %0017efe0 TS_NextStack; 00000000 %0017efe4 TS_ThreadId 00000001 %0017efe8 TS_ThunkDepth 00000001 %0017efec TS_RogueDepth 00000000 %0017eff0 TS_SystemESP 0017efc8 %0017eff4 TS_UserESP 0013f8b0 %0017eff8 TS_UserStack 00040000 %0017effc TS_UserLimit 00140000

%findsym %0013f01c l1000

Finding from %0013f01c to %0014001c by dword 13f04c: 0dcfe7bc = js_NewObjectOperation + 922C 13f0bc: 0b03ec1a = NS_IsMainThread + 10 13f0cc: 0d1981fd = nsBaseDragService_QueryInterface_1 + 1C0 13f0dc: 0d1a6b80 = nsIdleServiceDaily_Observe + 37C 13f0ec: 0b06022b = nsCOMPtr_base_assign_from_qi + 27 13f13c: 0c70f6d4 = nsISupports_COMTypeInfo_kIID + 846C 13f16c: 0b03ec1a = NS_IsMainThread + 10 13f2d0: 1ffc83eb = _UnpackFEA + ff 13f30c: 1d35db4b = libc_back_fsPathCalcInodeAndDev + 5f 13f33c: 1d35d539 = libc_back_fsUnixAttribsGet + 12d 13f360: 1d35b8e0 = libc_gFsUnixAttribsGEA2List + 10 13f3dc: 1d2efa74 = _std_strpbrk + 34 13f45c: 1ffc3fd0 = _FFNAlign + 15c 13f4b0: 1ffc3db8 = DOS32FINDFIRST + 11c 13f4dc: 1d29ae76 = _um_lump_alloc + 31e 13f52c: 1d29ab36 = _um_alloc_no_lock + 96 13f55c: 1d2f8bb4 = _um_realloc + 1c0 13f58c: 1d2f8a98 = _um_realloc + a4 13f5cc: 1d2e86c4 = _std_realloc + 34 13f5ec: 0da4708a = nsBaseDragService + 89F6AA 13f60c: 0af8ebed = text + EBED 13f61c: 1d29ab0a = _um_alloc_no_lock + 6a 13f63c: 0af9d305 = text + 1D305 13f65c: 0b063055 = PLDHashTable_Add + 1791 13f67c: 1d2f8342 = _um_free_maybe_lock + c2 13f6bc: 0af9d305 = text + 1D305 13f6cc: 0af8fe3b = text + FE3B 13f6dc: 0be6f560 = mozilla_layers_AsyncPanZoomController_NotifyMozMouseScrollEvent + 30E194 13f70c: 0bd4d3e2 = mozilla_layers_AsyncPanZoomController_NotifyMozMouseScrollEvent + 1EC016 13f74c: 0d39f1e5 = nsBaseDragService + 1F7805 13f77c: 0d9bc932 = nsBaseDragService + 814F52 13f78c: 0d9baa7a = nsBaseDragService + 81309A 13f7ac: 0d9bc347 = nsBaseDragService + 814967 13f7cc: 0d37145d = nsBaseDragService + 1C9A7D 13f80c: 1d29a9ba = _umalloc + c2 13f81c: 1d2f811d = _um_crumb_free_maybe_lock + e5 13f84c: 1d2f83c6 = _um_free_maybe_lock + 146 13f87c: 1d2e0881 = _std_free + 21 13f89c: 1fbfd3f7 = ReadMessage + 87 13f8a8: 1fbfd4f2 = ReadMessage + 182 13f8ac: 1ffc76a4 = UT32_ExceptionSet + 5 13f8d8: 1fbf5a64 = CheckLoseBoost + 50 13f8e8: 1fbf3b78 = WIN32PEEKMSG + b94 13f934: 0d373027 = nsBaseDragService + 1CB647 13f96c: 0d1b0144 = nsBaseDragService + 8764 13f9bc: 0d19bf8b = nsBaseDragService_QueryInterface_1 + 3F4E 13f9dc: 0b95e700 = PLDHashTable_Add + 8FCE3C 13f9fc: 0b043e4d = nsThreadPool_Dispatch + 3E97 13fa30: 0affe1c8 = __text + 7E1C8 13fa5c: 0b06e2c4 = PLDHashTable_Add + CA00 13fa6c: 0b0612c9 = nsCOMPtr_base_assign_from_helper + 23 13fa7c: 0b04a133 = nsThreadPool_Dispatch + A17D 13fa9c: 0b3d2770 = PLDHashTable_Add + 370EAC 13faa8: 0affe1c8 = text + 7E1C8 13faac: 0b00c51c = text + 8C51C 13fab0: 0b06b158 = PLDHashTable_Add + 9894 13fadc: 0b37fd27 = PLDHashTable_Add + 31E463 13faec: 0b06e466 = PLDHashTable_Add + CBA2 13fb18: 0b002696 = text + 82696 13fb1c: 0d1983d7 = nsBaseDragService_QueryInterface_1 + 39A 13fb2c: 0da1ce9b = nsBaseDragService + 8754BB 13fb3c: 0d9b1732 = nsBaseDragService + 809D52 13fb44: 0d898b51 = nsBaseDragService + 6F1171 13fb5c: 0da1a4db = nsBaseDragService + 872AFB 13fb68: 0da19fcf = nsBaseDragService + 8725EF 13fbe0: 0da194ac = nsBaseDragService + 871ACC 13fc0c: 1d2a0704 = _std_malloc + 34 13fc2c: 0da15651 = nsBaseDragService + 86DC71 13fc4c: 0da1ac89 = nsBaseDragService + 8732A9 13fc9c: 0da1aeaa = nsBaseDragService + 8734CA 13fcd4: 0da241e8 = nsBaseDragService + 87C808 13fcd8: 0da2420c = nsBaseDragService + 87C82C 13fd2c: 0b010eac = text + 90EAC 13fd88: 0da559a0 = nsBaseDragService + 8ADFC0 13fd9c: 0afd8200 = text + 58200 13fddc: 0b010e2a = text + 90E2A 13fe9c: 1d2dbfbf = libc_atexit_new + 3f 13ff08: 1853931f = _get_proc_desc_ex + 27f 13ff40: 1853947d = _get_proc_desc_ex + 3dd 13ff68: 18539180 = _get_proc_desc_ex + e0 13ff70: 15571270 = MYHANDLER 13ff74: 1d2a1304 = init + d4 13ff80: 185394f1 = _get_proc_desc_ex + 451 13ff98: 1d2d6e50 = _libc_Back_exceptionHandler + 10 13ffd4: 1d2a1292 = init + 62 13ffe4: 1d2b89ab = _sys_init_ret + f Scan stopped at 14009c stopaddr 14001c

dmik commented 6 years ago

Re map files, yes, good idea but as you say generating maps out of HLL results in something better, we should automate this process during debuginfo rpm creation I suppose. Or better provide an RPM with a tool that will create a .map out of .dbg. This will be the best solution.

Thanks for the trace, it's really interesting that there are D&D traces here too.

StevenLevine commented 6 years ago

Automating map file generation from the hll data might be useful. If we are going to do this we can start with Knut's khll.exe (Odin's tools\debuginfo). It already has most of the required parsing logic. Keep mind that some of the symbols listed are stale. They are not on the call path. We will not know the actual call path until I finish tweaking the .map file and locate the call path on the stack. I need to work with a tweaked map file because the full map file has something like 250K symbols, as you know. This is too much for my current pmdf REXX code to handle efficiently. It was never designed for map files of this size. I have some ideas that should speed up the search logic assume REXX does not run out of memory. I do think that investigating nsIdleServiceDaily::Observe + 37C is doing might be useful.

StevenLevine commented 6 years ago

After some pruning this is what's really on the stack when firefox is miniimized. We are in a call gate

Analyzing Call Gate ... is in a 32 bit call gate: 0148:00004ea8 OS2KRNL DOSSETFGND Called by: %1fbf5a14 PMMERGE CheckLoseBoost + 50

and the call stack is

13f8d8: 1fbf5a64 = CheckLoseBoost + 50 -> BoostPmq 13f8e8: 1fbf3b78 = WIN32PEEKMSG + b94 13f96c: 0d1b0144 = nsAppShell::ProcessNextNativeEvent + 40 -> call Win32PeekMsg 13f9bc: 0d19bf8b = nsBaseAppShell::OnProcessNextEvent + 161 -> call dword ptr [eax+44] nsAppShell::ProcessNextNativeEvent 13f9fc: 0b043e4d = nsThread::ProcessNextEvent + 423 -> call eax nsBaseAppShell::OnProcessNextEvent 13fa5c: 0b06e2c4 = NS_ProcessNextEvent + 27 -> call dword ptr [edx+24] nsThread::ProcessNextEvent 13fa9c: 0b3d2770 = mozilla::ipc::MessagePump::Run + 68 -> call NS_ProcessNextEvent 13fadc: 0b37fd27 = MessageLoop::Run + 3F -> call dword ptr [ecx] mozilla::ipc::MessagePump::Run 13fb1c: 0d1983d7 = nsBaseAppShell::Run + 27 -> call MessageLoop::Run 13fb3c: 0d9b1732 = nsAppStartup::Run + 46 -> call dword ptr [edx+0c] nsBaseAppShell::Run 13fb5c: 0da1a4db = XREMain::XRE_mainRun + 4D3 -> call dword ptr [edx+14] nsAppStartup::Run 13fc4c: 0da1ac89 = XREMain::XRE_main + 169 -> call XREMain::XRE_mainRun 13fc9c: 0da1aeaa = _XRE_main + F6 -> call XREMain::XRE_main 00012639 = firefox.exe 1:2639 -> call _XRE_main

When firefox is visible on the desktop and we have normal utilization, we are in a better call gate and we have

Analyzing Call Gate ... is in a call to: %1ffc0998 DOSCALL1 DOS32PMWAITEVENTSEM

The approximate call stack si:

13f930: 1fc42521 = WIN32WAITMSG + 71 13f96c: 0973022d = nsAppShell::ProcessNextNativeEvent + 129 13f9bc: 0971bf0a = nsBaseAppShell::OnProcessNextEvent + E0 13f9fc: 075c3e4d = nsThread::ProcessNextEvent + 423 13fa5c: 075ee2c4 = NS_ProcessNextEvent + 27

So for some reason the code path when minimized differs and we end up peeking for messages rather than sleeping.

dryeo commented 6 years ago

This implies the problem is somewhere in browser/, possibly browser/app/nsBrowserApp.cpp. Neither Thunderbird or SeaMonkey (SeaMonkey really needs testing with a new profile as I have a hundred or tabs open) show this behaviour with Thunderbird here showing less then 1% CPU usage on one core when minimized with the odd peak every minute or more. nsBrowserApp.cpp has had more code added over the years then nsMailApp.cpp (about1.2Kb) though they started out very similar and even now I can easily apply Dmiks patches to both nsMailApp.cpp and nsSuiteApp.cpp with just some simple renaming of variables.

dryeo commented 6 years ago

Here's a diff after removing most of the Windows and MacOS additions. Seems the culprit might be

+  mozilla::TimeStamp start = mozilla::TimeStamp::Now();

--- C:\work\cc45esr\mail\app\nsMailApp.cpp  2017-10-14 20:19:48.000000000 
-0700
+++ C:\work\cc45esr\mozilla\browser\app\nsBrowserApp.cpp    2017-11-15 
23:32:56.000000000 -0800

@@ -80,7 +90,7 @@
        hmq = WinCreateMsgQueue(hab, 0);
      }
      if (hmq != NULLHANDLE) {
-      WinMessageBox(HWND_DESKTOP, 0, msg, "Thunderbird", 0,
+      WinMessageBox(HWND_DESKTOP, 0, msg, "Firefox", 0,
                      MB_OK | MB_ERROR | MB_MOVEABLE);
      }
    }

@@ -130,7 +140,7 @@
      return !strcasecmp(arg, s);
    }

-#if defined(XP_WIN)
+#if defined(XP_WIN) || defined(XP_OS2)
    if (*arg == '/')
      return !strcasecmp(++arg, s);
  #endif
@@ -142,21 +152,23 @@
  XRE_CreateAppDataType XRE_CreateAppData;
  XRE_FreeAppDataType XRE_FreeAppData;
  XRE_TelemetryAccumulateType XRE_TelemetryAccumulate;
+XRE_StartupTimelineRecordType XRE_StartupTimelineRecord;
  XRE_mainType XRE_main;
+XRE_StopLateWriteChecksType XRE_StopLateWriteChecks;

  static const nsDynamicFunctionLoad kXULFuncs[] = {
      { "XRE_GetFileFromPath", (NSFuncPtr*) &XRE_GetFileFromPath },
      { "XRE_CreateAppData", (NSFuncPtr*) &XRE_CreateAppData },
      { "XRE_FreeAppData", (NSFuncPtr*) &XRE_FreeAppData },
      { "XRE_TelemetryAccumulate", (NSFuncPtr*) &XRE_TelemetryAccumulate },
+    { "XRE_StartupTimelineRecord", (NSFuncPtr*) 
&XRE_StartupTimelineRecord },
      { "XRE_main", (NSFuncPtr*) &XRE_main },
+    { "XRE_StopLateWriteChecks", (NSFuncPtr*) &XRE_StopLateWriteChecks },
      { nullptr, nullptr }
  };

  static int do_main(int argc, char* argv[], nsIFile *xreDirectory, bool 
greIsXre)
  {
-  NS_LogInit();
-
    nsCOMPtr<nsIFile> appini;
    nsresult rv;
    uint32_t mainFlags = 0;
@@ -194,7 +206,6 @@
      argc -= 2;
    }

-  int result;
    if (appini) {
      nsXREAppData *appData;
      rv = XRE_CreateAppData(appini, &appData);
@@ -204,18 +215,20 @@
      }
      // xreDirectory already has a refcount from NS_NewLocalFile
      appData->xreDirectory = xreDirectory;
-    result = XRE_main(argc, argv, appData, 0);
+    int result = XRE_main(argc, argv, appData, mainFlags);
      XRE_FreeAppData(appData);
-  } else {
-    ScopedAppData appData(&sAppData);
-    nsCOMPtr<nsIFile> exeFile;
-    rv = mozilla::BinaryPath::GetFile(argv[0], getter_AddRefs(exeFile));
-    if (NS_FAILED(rv)) {
-      Output("Couldn't find the application directory.\n");
-      return 255;
-    }
+    return result;
+  }

-    nsCOMPtr<nsIFile> greDir;
+  ScopedAppData appData(&sAppData);
+  nsCOMPtr<nsIFile> exeFile;
+  rv = mozilla::BinaryPath::GetFile(argv[0], getter_AddRefs(exeFile));
+  if (NS_FAILED(rv)) {
+    Output("Couldn't find the application directory.\n");
+    return 255;
+  }
+
+  nsCOMPtr<nsIFile> greDir;
    if (greIsXre) {
      greDir = xreDirectory;
    } else {
@@ -250,6 +265,7 @@
  InitXPCOMGlue(const char *argv0, nsIFile **xreDirectory, bool *greIsXre)
  {
    char exePath[MAXPATHLEN];
+  char errBuf[MAXPATHLEN * 2];

    *greIsXre = false;

@@ -260,7 +276,8 @@
    }

    char *lastSlash = strrchr(exePath, XPCOM_FILE_PATH_SEPARATOR[0]);
-  if (!lastSlash || (size_t(lastSlash - exePath) > MAXPATHLEN - 
sizeof(XPCOM_DLL)))
+  if (!lastSlash || (size_t(lastSlash - exePath) > MAXPATHLEN -
+sizeof(XPCOM_DLL)))
      return NS_ERROR_FAILURE;

    strcpy(lastSlash + 1, XPCOM_DLL);
@@ -269,16 +286,16 @@
  #ifdef XP_OS2
      // If no runtime exists in the launcher's directory, we check if it is
      // usr/bin (e.g. an RPM installation) and search for the runtime in
-    // usr/lib/Thunderbird-XYZ.
+    // usr/lib/firefox-XYZ.
      bool ok = false;
      const char UsrBin[] = "\\usr\\bin";
-    const char ThunderbirdXYZ[] = "lib\\Thunderbird-" MOZ_APP_UA_VERSION;
+    const char FirefoxXYZ[] = "lib\\firefox-" MOZILLA_VERSION;
      size_t len = lastSlash - exePath;
-    if (MAXPATHLEN - len - sizeof(XPCOM_DLL) >= sizeof(ThunderbirdXYZ) 
- 4 /* lib\\ */) {
+    if (MAXPATHLEN - len - sizeof(XPCOM_DLL) >= sizeof(FirefoxXYZ) - 4 
/* lib\\ */) {
        if (len > sizeof(UsrBin) - 1 /* \0 */ &&
            !strnicmp(lastSlash - sizeof(UsrBin) + 1, UsrBin, 
sizeof(UsrBin) - 1)) {
-        memcpy(lastSlash - 3 /* bin */, ThunderbirdXYZ, 
sizeof(ThunderbirdXYZ) - 1);
-        lastSlash += -3 + sizeof(ThunderbirdXYZ) - 1;
+        memcpy(lastSlash - 3 /* bin */, FirefoxXYZ, sizeof(FirefoxXYZ) 
- 1);
+        lastSlash += -3 + sizeof(FirefoxXYZ) - 1;
          *lastSlash = XPCOM_FILE_PATH_SEPARATOR[0];
          strcpy(lastSlash + 1, XPCOM_DLL);
          ok = FileExists(exePath);
@@ -320,9 +337,12 @@
    // We do this because of data in bug 771745
    XPCOMGlueEnablePreload();

-  rv = XPCOMGlueStartup(exePath);
+  rv = XPCOMGlueStartup2(exePath, errBuf, sizeof(errBuf));
    if (NS_FAILED(rv)) {
-    Output("Couldn't load XPCOM.\n");
+    if (errBuf[0])
+      Output("Couldn't load %s (%s).\n", XPCOM_DLL, errBuf);
+    else
+      Output("Couldn't load %s.\n", XPCOM_DLL);
      return rv;
    }

@@ -332,6 +352,7 @@
      return rv;
    }

+  // This will set this thread as the main thread.
    NS_LogInit();

    // chop XPCOM_DLL off exePath
@@ -353,6 +374,8 @@

  int main(int argc, char* argv[])
  {
+  mozilla::TimeStamp start = mozilla::TimeStamp::Now();
+
  #ifdef XP_MACOSX
    TriggerQuirks();
  #endif
@@ -362,16 +385,13 @@
    struct rusage initialRUsage;
    gotCounters = !getrusage(RUSAGE_SELF, &initialRUsage);
  #elif defined(XP_WIN)
-  // GetProcessIoCounters().ReadOperationCount seems to have little to
-  // do with actual read operations. It reports 0 or 1 at this stage
-  // in the program. Luckily 1 coincides with when prefetch is
-  // enabled. If Windows prefetch didn't happen we can do our own
-  // faster dll preloading.
    IO_COUNTERS ioCounters;
    gotCounters = GetProcessIoCounters(GetCurrentProcess(), &ioCounters);
  #elif defined(XP_OS2)
    // no counters at the moment
    gotCounters = 0;
+#else
+  #error "Unknown platform"  // having this here keeps cppcheck happy
  #endif

    nsIFile *xreDirectory;
@@ -394,6 +414,8 @@
      return 255;
    }

+  XRE_StartupTimelineRecord(mozilla::StartupTimeline::START, start);
+
    if (gotCounters) {
  #if defined(XP_WIN)

XRE_TelemetryAccumulate(mozilla::Telemetry::EARLY_GLUESTARTUP_READ_OPS,
@@ -415,6 +437,10 @@
        XRE_TelemetryAccumulate(mozilla::Telemetry::GLUESTARTUP_HARD_FAULTS,
                                int(newRUsage.ru_majflt - 
initialRUsage.ru_majflt));
      }
+#elif defined(XP_OS2)
+    // nothing to do at the moment
+#else
+  #error "Unknown platform"  // having this here keeps cppcheck happy
  #endif
    }

dryeo commented 6 years ago

I'll apply the TimeStamp stuff if I can and rebuild Thunderbird.

dryeo commented 6 years ago

Removing the timer stuff from nsBrowserApp.cpp and rebuilding Firefox didn't help. Also tested SM with a new profile, much the same results. Thunderbird, with the thunderbrowse extension and a couple of tabs displaying web pages still showed little activity when minimized.

lerdmann commented 6 years ago

I wonder if this problem (WinPeekMsg being called instead of WinWaitMsg being called) is also the cause of videos not being played until the end. Because the effect is the same: the video starts playing, all of a sudden it will stop somewhere in the middle and when I then look at TOP, I can see thread 1 being in the "running" state. Alternatively, the video will play all the way through but if I restart it, it will not start to play and TOP shows the very same effect.

lerdmann commented 6 years ago

Maybe the source code needs to be searched for "WinPeekMsg" and then replace the tight polling with "WinPeekMsg" (because that is what it looks like) with the combination of "WinWaitMsg" and "WinPeekMsg".

lerdmann commented 6 years ago

Proposition: change widget/os2/nsAppShell.cpp, "nsAppShell::ProcessNextNativeEvent(bool mayWait)" to:

bool nsAppShell::ProcessNextNativeEvent(bool mayWait) { bool gotMessage = false; HAB hab = ::WinQueryAnchorBlock(HWND_DESKTOP); <----- think it would be a good idea ...

do { QMSG qmsg; // Give priority to system messages (in particular keyboard, mouse, timer, // and paint messages). if (WinPeekMsg(hab, &qmsg, NULLHANDLE, WM_CHAR, WM_VIOCHAR, PM_REMOVE) || WinPeekMsg(hab, &qmsg, NULLHANDLE, WM_MOUSEFIRST, WM_MOUSELAST, PM_REMOVE) || WinPeekMsg(hab, &qmsg, NULLHANDLE, 0, WM_USER-1, PM_REMOVE) || WinPeekMsg(hab, &qmsg, NULLHANDLE, 0, 0, PM_REMOVE)) { gotMessage = true; ::WinDispatchMsg(hab, &qmsg); } else if (mayWait) { // Block and wait for any posted application message ::WinWaitMsg(hab, 0, 0); } else { ::DosSleep(1); <------------------------ give away timeslice for threads of any prio (higher or lower) } } while (!gotMessage && mayWait);

dmik commented 6 years ago

@lerdmann why don't you like WinPeekMsg? Changing this piece of code will not change anything in terms of how "tightly" messages are handled — if (and as long as) there are messages in the queue, WinWaitMsg will return immediately, just like WinPeekMsg. If WinPeekMsg returns FALSE (i.e. no matching messages), WinWaitMsg will be called if the calling code is ready to block (i.e. mayWait = TRUE). WinPeekMsg is used there for one purpose: guarantee some specific priority of message processing according to the application logic. In other words, the code fragment your are referring to is specifically designed this way.

I also doubt that DosSleep(1) is needed where WinPeekMsg and WinWaitMsg are used as I'm pretty sure some sort of time slice yielding is done internally in these functions anyway. And the usefulness of giving up additional time slice is questionable per se. It's not good to make the main event thread starve if there are lots of messages to process — it will only make things worse.

Regarding WinInitialize(0) vs WinQueryAnchorBlock(HWND_DESKTOP) — this makes no difference on OS/2 — both achieve the same result. More over, you may freely pass NULLHANDLE wherever you need a hab. PM has only one desktop and one anchor block. IIRC, this API was designed with some multi-desktops in mind (AS400 and friends?) and is completely irrelevant to OS/2.

lerdmann commented 6 years ago

My suggested implementation WILL make a difference if nsAppShell::ProcessNextNativeEvent(bool mayWait) itself is called in a tight loop or called very often. The obvious problem is that the main thread is completely occupied running this function if FF is minimized. That indicates that WinPeekMsg is called over and over again. The other solution is to call nsAppShell::ProcessNextNativeEvent(bool mayWait) always with mayWait = TRUE but I guess that is not desirable for some reason.

dmik commented 6 years ago

If it's like that, then something is wrong in design on the other side. We should fix that other side (i.e. not let the code call ProcessNextNativeEvent that often with mayWait=FALSE) rather than work around this problem on our side in a not clean way.

E.g. regarding the minimized state, if it's like you say, then the calling code should detect this state and stop issuing so many calls in this state (like stop issuing full state paint requests that just endlessly pollute the PM message queue, for instance). I'm pretty sure other platforms already solve it in a similar way. Still, it's the other side we should fix, not ProcessNextNativeEvent.

lerdmann commented 6 years ago

Ok, did not look any further. I we look up the stack chain then maybe "NS_ProcessNextEvent" is the right candidate. Or "nsThread::ProcessNextEvent". Or "nsBaseAppShell::OnProcessNextEvent".

dryeo commented 6 years ago

Looking further at SeaMonkey, with a new profile and in safe mode, CPU usage is low when minimized. Probably the problem with SM is Chatzilla, which seems to cause high CPU usage lately for some reason. Using the same XPI with FF is fine. Considering most of the code is the same, it is weird that FF only shows the high CPU usage when minimized.

StevenLevine commented 6 years ago

I agree with @dmik. WinPeekMsg vs. WinWaitMsg is a symptom of the problem, not the problem itself. Attempting implement a point fix to code that used to work is most likely a waste of time.

I need to verify this, but I suspect that this code has been around for a while and since the pristine Mozilla code base has no OS/2 code, this code has probably been part of our port for a while and used to work. The question should be what changed between 38 and 45 that caused the WinPeekMsg path to be taken. It could be something as simple as the semantics of mayWait changing, but we will not know without some more analysis.

I have not yet captured a process dump of a minimized 38 to understand which code path it is using. I should have time to do this later today.

StevenLevine commented 6 years ago

We now know to FF38 is in WinWaitMsg when minimized, as it should be.

As I read the nsAppShell::ProcessNextNativeEvent() code the likely reason that FF45 is not in WinWaitMsg is because mayWait is false. This would allow some code outside nsAppShell::ProcessNextNativeEvent to loop and repeatedly call nsAppShell::ProcessNextNativeEvent and result in the behavior we see.

To validate this idea, someone with a debug setup needs check the value of mayWait when the browser is minimized.

dspiatkowski commented 6 years ago

@StevenLevine What is needed to have this debug setup? I have the official GA1.1 and dimik's T2 debug installs here (matching *.dbg files), but nothing beyond this...is this sufficient? If that is OK what do I need to do to trap the value of mayWait?

dmik commented 6 years ago

@StevenLevine I have the same guess. I will surely look at lt all — it's in my to do list also because it's another way to investigate the unresponsive JS problem. I'll just finish with #264 first as my debug XUL is in an unbuildable state at the moment.

dryeo commented 6 years ago

On 04/30/18 03:08 PM, StevenLevine wrote:

To validate this idea, someone with a debug setup needs check the value of mayWait when the browser is minimized.

Not sure I'm doing it right but I get

mayWait Expression evaluation failed.

with the only variable being evaluated out of 10 being

gotMessage 0 / false /

If I step into the function, I end up in pmmerge assembly.

Most of the time if I halt program execution while minimized, I'm in some timer stuff. This is with idbug

StevenLevine commented 6 years ago

@dspiatkowski, to debug mozilla code you need a working VAC 3.65 debugger setup. You probably need to use remote debugging. There probably just to much GUI activity going on for local debugging, even when running idebug in sync mode.

I thought @dmik did a write up on debugger setup for firefox debugging, but maybe I just imagined this.

@dryeo, idebug might not be up to the job to displaying this kind of value. mayWait is passed on the stack and stashed in esi while the code is running.

Since we have:

13f96c: 0d1b0144 = nsAppShell::ProcessNextNativeEvent + 40

disassembling shows us:

u %0d1b0144-40

%0d1b0104 56 push esi %0d1b0105 53 push ebx %0d1b0106 83ec44 sub esp,+44 ;'D' %0d1b0109 0fb6442454 movzx eax,byte ptr [esp+54] %0d1b010e 89c6 mov esi,eax

We can do a bit of math and look at the stack on entry:

dd %13f96c+4+8+44

%0013f9bc 0d19bf8b 20243840 00000000 00000000

Annotating gives us:

                eip           pObj         mayWait

%0013f9bc 0d19bf8b 20243840 00000000

which tells us that mayWait is passed as 0. We need to look at the call path and figure out why. I recommend not stepping into pmmerge code. You are likely to lock up the system.

One way (of many) to see the value of mayWait in the debugger, is to break at the function entry, switch to assembly mode and step until eax is loaded.

What's next is for everyone to look at the call path and figure out why mayWait is being passed as zero. Comparing the FF38 code to the FF45 code is probably going to provide the explanation.

dryeo commented 6 years ago

@StevenLevine idbug doesn't seem capable of loading the call stack, complaining about a corrupted stack pointer or such.

lerdmann commented 6 years ago

I am not surprised. Unless you translate with no optimization at all, gcc is going to avoid to set up a proper stack frame. For performance reasons, of course.

StevenLevine commented 6 years ago

@dryeo, idbug is based on the VAC 3.08 debugger, so it does not do some things as well as idebug, which is from the much newer VAC 3.65. Idbug's JIT debugging is handy when it works.

@lerdmann, the inability to dump the stack from has little to do with the optimization settings. The OS/2 debuggers are something like 20 years and and simply missing the required code. Everything the debugger needs is in the debug data. Other debuggers, such as wdw, do a better job of this. Unfortunately, wdw does not yet support HLL debug data, but it will happen once I carve out the time to focus.

That said, the generated function prologues are very consistent. Some pushes and and a sub esp, so it's trivial to calculate the location of the the return address and the parameters on the stack.

lerdmann commented 6 years ago

Then I am interested in learning how to follow the call stack if you don't have a standard stack frame. How is the debugger supposed to know how to follow the call chain if the invocation follows no known and agreed and standard way ? The debug symbols will only help in showing correct routine names and variable names. Anyway, I'd be more interested in a result rather than a "I am right and you are wrong" discussion.

dryeo commented 6 years ago

@lerdman, I thought about the optimization thing after going to bed and then today when I tried to build a non-optimized version, I of course ran into the lack of atomics on the i386, so I'll rebuild again targeting -pentium-m -Og.

@StevenLevine is it worth my installing the VAC 3.65 debugger? Does it understand SSE?

StevenLevine commented 6 years ago

@dryeo, the 3.65 docs say nothing specific about SSE. The major benefit of idebug over icsdebug or idbug is better stability when debugging GUI apps and a different set of defects.

dryeo commented 6 years ago

OK, after fixing the operator error, I've compiled Firefox with -Og (optimized for debugging) and stopped execution at the if (WinPeekMsg() line in nsAppShell::ScheduleNativeEventCallBack() and looked at the local variables. It shows two instances of mayWait, first is "Expression evaluation failed." and the second is 0 / false /. Still no call stack due to bad call stack pointer.

Couldn't find VACPP 3.65, did find 4.02 but I take it that the debugger in that isn't worth using.

lerdmann commented 6 years ago

Develper Connection Volume 12: http://83.133.184.251/winworldpc.com/download/3D14E128-18DA-11E4-99E5-7054D21A8599.html

I am not sure if it contains a demo version of Visual Age C++ 3.6.5 for OS/2. I would have to check.

dspiatkowski commented 6 years ago

@dryeo Take a look at the attached ZIP of what's found on DevCon12, I see a bunch of IBMCPP 3.0 references... DevCon12_contents.zip

dryeo commented 6 years ago

On 05/02/18 11:48 PM, lerdmann wrote:

Develper Connection Volume 12: http://83.133.184.251/winworldpc.com/download/3D14E128-18DA-11E4-99E5-7054D21A8599.html

I tried that, it wouldn't unzip (un7z). Perhaps a bad download.

StevenLevine commented 6 years ago

@dryeo. I recomment not wasting a lot of time with the suggested VAC 3.0 download. It's circa 1996 and given the age of DevCon V12, obviously described as:

'VisualAge C++ for OS/2, Version 3.0 (IBM)' is located on disc 4.

FWIW, I would not spend a lot more time trying to use the IBM debugger to check the value of mayWait at the source level, for a couple of reasons. We already know that when that the 100% utilization indicates it has the value 0. The debugger testing done so far. pretty much proves that the IBM debuggers do not support the stack layout we use when building Firefox. Obviously, this would not be the case if we were building for Linux, on Linux and using Linux debugging tools, but we are not.

The workaround I recommend is learning enough about the stack layout so that you can use the IBM debugger in assembly mode to view the parameter values, when needed. https://github.com/bitwiseworks/mozilla-os2/issues/265#issuecomment-385594042 demonstrates the basic procdeure. To expand the explanation for @lerdmann. we have

dd %13f96c+4+8+44

where %13f96c is the current stack pointer. 4 is the size of the return address on the stack. 8 is the size of the two doublewords and 44 is the number of bytes (in hex) used by the automatic variables. The resulting value points to the 1st argument that pushed onto the stack. Since this is C++ and the method is non-static, the first argument is the so called "this" pointer which points to the object instance for the method. So, mayWait is the second parameter passed to the method.

There is nothing random or undefined about this stack prologue. It will be understood by every debugger that understands the gcc code generation options.

Another FWIW, if need to do this kind a stack analysis a couple more times, I will probably implement some PMDF REXX code to automate the process. It will require a dump file that includes the shared code because the script will need to count the pushs and determine the value subtracted from the stack pointer. Of course, this is no different from what I need when I do the math by hand.

A final FWIW. The focus at this time should be on understanding what changed in FF45 that caused nsAppShell::ProcessNextNativeEvent to be called with mayWait = 0. I was hoping that someone else would put the time in figure this out by now. However, the ticket has seemed to wander off on tangents. This changed that caused the behavior is probably in the caller to OnProcessNextEvent().

dmik commented 6 years ago

I did some debugging and can confirm that mayWait is never FALSE in minimized state. However, it's not always like that. E.g. if you get a session restore screen and minimize before restoring the session, messages are processed normally — it waits sometimes and no 100% CPU load. At least we may be sure that the CPU overload is related to continuous message polling.

I may also notice that polling (i.e. mayWait = FALSE) is always the case when there is high CPU load (which is kinda expected). Need to further trace its logic.

bitwiseworks / mozilla-os2

CPU usage is higher when Firefox is minimized #265

%findsym %0013f01c

.pu

%thunktib

%findsym %0013f01c l1000

u %0d1b0144-40

dd %13f96c+4+8+44