Justintime50 / harvey

The lightweight Docker Compose deployment runner.
https://github.com/Justintime50/harvey-ui
MIT License

Deployments Are Getting Stuck Without Timing Out - SegFault #72

Closed Justintime50 closed 1 year ago

Justintime50 commented 1 year ago

There appears to be a race condition where deployments get stuck after pulling in changes but before deploying. Subprocess operations have a timeout set on them, but either the timeout isn't taking effect or the process isn't exiting correctly after it fires. I have multiple deployments that started but never finished or errored out; they are just stuck "in-progress".
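For reference, here is a minimal sketch of how a subprocess timeout normally behaves in Python. This is not Harvey's actual code; `run_with_timeout` and its return values are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and report whether it succeeded, failed, or timed out.

    Hypothetical helper for illustration; not taken from Harvey.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # subprocess.run kills the child when this elapses
            check=True,  # a non-zero exit code raises CalledProcessError
        )
        return "success", completed.stdout
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except subprocess.CalledProcessError as error:
        return "failed", error.stderr


# Example: a child that sleeps longer than the limit is reported as a timeout.
status, _ = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
print(status)
```

Note that this handling runs inside the calling process, so it can only mark a deployment as failed while that process is still alive.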

Justintime50 commented 1 year ago

I thought I had fixed this via https://github.com/Justintime50/harvey/commit/53c097c326b4a308c423755263bb36f9f7633d5a#diff-56e74ba8155db5498d52b15ac4600497a1f11c57a2e91c9af11f6ed5b7137b08R111, but it appears that builds can still get stuck at random. I've traced the code and am unsure why this is happening, because the builds should continue to run just fine.

Justintime50 commented 1 year ago

Aha, I FINALLY found where the problem occurs, though not necessarily what the problem is:

Pulling Harvey config for justintime50/os-scripting from webhook...
!!! uWSGI process 76997 got Segmentation Fault !!!
*** backtrace of 76997 ***
0   uwsgi                               0x0000000102a2a09c uwsgi_backtrace + 52
1   uwsgi                               0x0000000102a2a5b0 uwsgi_segfault + 56
2   libsystem_platform.dylib            0x00000001843802a4 _sigtramp + 56
3   libdispatch.dylib                   0x00000001841df900 _dispatch_apply_with_attr_f + 1096
4   libdispatch.dylib                   0x00000001841dfb48 dispatch_apply + 108
5   CoreFoundation                      0x0000000184557eb4 __103-[CFPrefsSearchListSource synchronouslySendSystemMessage:andUserMessage:andDirectMessage:replyHandler:]_block_invoke.52 + 132
6   CoreFoundation                      0x00000001843e7a40 CFPREFERENCES_IS_WAITING_FOR_SYSTEM_AND_USER_CFPREFSDS + 100
7   CoreFoundation                      0x00000001845570e4 -[CFPrefsSearchListSource synchronouslySendSystemMessage:andUserMessage:andDirectMessage:replyHandler:] + 232
8   CoreFoundation                      0x00000001843e6160 -[CFPrefsSearchListSource alreadylocked_generationCountFromListOfSources:count:] + 232
9   CoreFoundation                      0x00000001843e5e6c -[CFPrefsSearchListSource alreadylocked_getDictionary:] + 468
10  CoreFoundation                      0x00000001843e59f0 -[CFPrefsSearchListSource alreadylocked_copyValueForKey:] + 172
11  CoreFoundation                      0x00000001843e5924 -[CFPrefsSource copyValueForKey:] + 52
12  CoreFoundation                      0x00000001843e58d8 __76-[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:]_block_invoke + 32
13  CoreFoundation                      0x00000001843ddf8c __108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke + 376
14  CoreFoundation                      0x0000000184558764 -[_CFXPreferences withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 384
15  CoreFoundation                      0x00000001843dd860 -[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:] + 168
16  CoreFoundation                      0x00000001843dd77c _CFPreferencesCopyAppValueWithContainerAndConfiguration + 112
17  SystemConfiguration                 0x0000000184fab8ec SCDynamicStoreCopyProxiesWithOptions + 180
18  _scproxy.cpython-310-darwin.so      0x0000000103703aa0 get_proxies + 28
19  Python                              0x000000010301cba8 cfunction_vectorcall_NOARGS + 96
20  Python                              0x00000001030c4cf8 call_function + 128
21  Python                              0x00000001030c2538 _PyEval_EvalFrameDefault + 43144
22  Python                              0x00000001030b6a5c _PyEval_Vector + 376
23  Python                              0x00000001030c4cf8 call_function + 128
24  Python                              0x00000001030c2538 _PyEval_EvalFrameDefault + 43144
25  Python                              0x00000001030b6a5c _PyEval_Vector + 376
26  Python                              0x00000001030c4cf8 call_function + 128
27  Python                              0x00000001030c2538 _PyEval_EvalFrameDefault + 43144
28  Python                              0x00000001030b6a5c _PyEval_Vector + 376
29  Python                              0x0000000102fcaeac _PyObject_FastCallDictTstate + 96
30  Python                              0x0000000103040abc slot_tp_init + 196
31  Python                              0x0000000103038a8c type_call + 288
32  Python                              0x0000000102fcac44 _PyObject_MakeTpCall + 136
33  Python                              0x00000001030c4d88 call_function + 272
34  Python                              0x00000001030c2538 _PyEval_EvalFrameDefault + 43144
35  Python                              0x00000001030b6a5c _PyEval_Vector + 376
36  Python                              0x00000001030c4cf8 call_function + 128
37  Python                              0x00000001030c2538 _PyEval_EvalFrameDefault + 43144
38  Python                              0x00000001030b6a5c _PyEval_Vector + 376
39  Python                              0x00000001030c4cf8 call_function + 128
40  Python                              0x00000001030c25c0 _PyEval_EvalFrameDefault + 43280
41  Python                              0x00000001030b6a5c _PyEval_Vector + 376
42  Python                              0x0000000102fcdeb0 method_vectorcall + 124
43  Python                              0x00000001030c4cf8 call_function + 128
44  Python                              0x00000001030c25c0 _PyEval_EvalFrameDefault + 43280
45  Python                              0x00000001030b6a5c _PyEval_Vector + 376
46  Python                              0x0000000102fcdeb0 method_vectorcall + 124
47  Python                              0x00000001030c4cf8 call_function + 128
48  Python                              0x00000001030c25c0 _PyEval_EvalFrameDefault + 43280
49  Python                              0x00000001030b6a5c _PyEval_Vector + 376
50  Python                              0x0000000102fcdeb0 method_vectorcall + 124
51  Python                              0x00000001030c4cf8 call_function + 128
52  Python                              0x00000001030c25c0 _PyEval_EvalFrameDefault + 43280
53  Python                              0x00000001030b6a5c _PyEval_Vector + 376
54  Python                              0x0000000102fcdeb0 method_vectorcall + 124
55  Python                              0x00000001030c4cf8 call_function + 128
56  Python                              0x00000001030c25c0 _PyEval_EvalFrameDefault + 43280
57  Python                              0x00000001030b6a5c _PyEval_Vector + 376
58  Python                              0x0000000102fcdeb0 method_vectorcall + 124
59  Python                              0x00000001030c4cf8 call_function + 128
60  Python                              0x00000001030c25c0 _PyEval_EvalFrameDefault + 43280
61  Python                              0x00000001030b6a5c _PyEval_Vector + 376
62  Python                              0x00000001030c4cf8 call_function + 128
63  Python                              0x00000001030c2510 _PyEval_EvalFrameDefault + 43104
*** end of backtrace ***

At first glance, a pattern seemed to emerge: the RSS memory right before each segfault was 22 MB. This led me to believe that the process can't grow beyond this limit for some reason. For now, I've added an extra process so there are always at least 3 in the mix, and set reload-on-rss to 22 to see if this helps. This means processes get restarted very frequently, which isn't ideal; however, it may be just the workaround we need for now, and we can productionize it further down the road. Time will tell if this actually did the trick.

Justintime50 commented 1 year ago

I was able to find the solution to the problem; it appears to be specific to macOS, which is where I'm running uWSGI. Per https://bugs.python.org/issue30385 and https://github.com/unbit/uwsgi/issues/1722, the suggestion is to add os.environ["no_proxy"] = "*" to the app. This removes the reliance on the macOS CFPreferences proxy lookup (the get_proxies call in the backtrace above), which is ultimately what was causing the segfault. This may not be a perfect solution; however, it works for my use case, since I have no need for a proxy. The app has now been running uninterrupted for 5 days, when previously it couldn't make it 24 hours without segfaulting, which shows promise.
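A minimal sketch of the workaround, assuming it runs at the top of the app's entry module before any worker makes an HTTP request. The helper name `disable_system_proxy_lookup` is hypothetical, and where exactly Harvey sets the variable is not shown here.

```python
import os
import urllib.request


def disable_system_proxy_lookup() -> None:
    """Tell Python's urllib-based proxy detection to bypass proxies for every host.

    Hypothetical helper name. With a proxy-related environment variable set,
    urllib uses the environment and does not fall back to the macOS system
    configuration lookup.
    """
    os.environ["no_proxy"] = "*"


disable_system_proxy_lookup()

# The environment now supplies the proxy settings ("*" means bypass everything).
env_proxies = urllib.request.getproxies_environment()
bypassed = bool(urllib.request.proxy_bypass("example.com"))
print(env_proxies.get("no"), bypassed)
```

The trade-off is that the process will ignore any system-configured proxy, so this only fits deployments that, like this one, don't need a proxy.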