chaquo / chaquopy

Chaquopy: the Python SDK for Android
https://chaquo.com/chaquopy/
MIT License
753 stars 129 forks source link

Android application crash after a random process time #287

Closed Clement-W closed 4 years ago

Clement-W commented 4 years ago

Hi ! I'm using Chaquopy to execute python script on Android applications. The goal here is to detect objects on saved images and compute the distance between the detected objects and the camera. To do it i'm using YOLO (TFlite will be used in the further work) and an encoder-decoder neural network that predict depth into images. The process is :

This works really well on a computer with python3 but unfortunately I'm encountering an issue by using this process with Chaquopy on an Android application. The application is randomly crashing, sometimes after 10s or after 1min and sometimes everything goes well without crashing.

Here is what i got in the Logcat of AndroidStudio:

05-18 11:30:18.627 25622-26057/com.example.chaquo I/python.stdout: -> Predicting on 21 test images05-18 11:30:18.627 25622-26057/com.example.chaquo I/python.stdout: 
05-18 11:30:28.132 799-921/? W/InputDispatcher: channel '35e8205a com.example.chaquo/com.example.chaquo.MainActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9
05-18 11:30:28.132 799-921/? E/InputDispatcher: channel '35e8205a com.example.chaquo/com.example.chaquo.MainActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
05-18 11:30:28.148 799-1512/? D/ActivityManager: Kill com.example.chaquo quietly

Thanks for you help.

mhsmith commented 4 years ago

What devices does the problem happen on, and which versions of Android? Are there any differences between devices, or any devices where the problem never happens?

Are there any other nearby messages in the Logcat? Make sure the box on the right is set to "No Filters", there's no text in the search box, and the level is set to "Debug" or "Verbose".

What framework are you using to run YOLO: Tensorflow, PyTorch or something else?

Clement-W commented 4 years ago

The problem happens on a Huawei P smart 2019 running Android 10.0 and on a Dashcam T98 running Android 5.1. The huawei is much more powerful than the dashcam but the same error occurs. The error occurs faster on the dashcam than on the huawei.

I'm using the pretrained weights of YOLO thanks to OpenCV for the detection, and PyTorch (torch and torchvision) for the encoder-decoder.

Here is other messages given in the Verbose Logcat without any filters :

05-18 14:16:32.921 862-1779/? D/ActivityManager: SVC-mBroadcastQueues: com.android.server.am.BroadcastQueue@2bb062f1
05-18 14:16:32.921 862-1779/? D/ActivityManager: SVC-mBroadcastQueues: com.android.server.am.BroadcastQueue@fa5bed6
05-18 14:16:32.993 862-1224/? E/AudioService: Media server died.
05-18 14:16:33.138 1318-1318/? I/ServiceManager: Waiting for service media.audio_policy...
05-18 14:16:33.525 862-1224/? E/AudioService: Media server died.
05-18 14:16:33.734 7423-7423/? D/BufferBean: bufferSize:777600
05-18 14:16:33.903 7423-7463/? W/IMediaDeathNotifier: Media player service not published, waiting...
05-18 14:16:34.050 862-1224/? E/AudioService: Media server died.
05-18 14:16:34.164 1318-1318/? I/ServiceManager: Waiting for service media.audio_policy...
05-18 14:16:34.516 7423-7463/? I/ServiceManager: Waiting for service media.player...
05-18 14:16:34.662 862-1224/? E/AudioService: Media server died.
05-18 14:16:34.809 862-877/? D/ActivityManager: Death received in com.android.server.am.ActivityManagerService$AppDeathRecipient@20131eea for thread android.os.BinderProxy@e8ba5db
05-18 14:16:34.845 862-1171/? W/InputDispatcher: channel '30176b54 com.example.chaquo/com.example.chaquo.MainActivity (server)' ~ Consumer closed input channel or an error occurred.  events=0x9
05-18 14:16:34.845 862-1171/? E/InputDispatcher: channel '30176b54 com.example.chaquo/com.example.chaquo.MainActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
05-18 14:16:34.877 862-1706/? D/DisplayManagerService: Display listener for pid 7172 died.
05-18 14:16:34.888 307-361/? D/GuiExt: [GuiExtS] binder of dump tunnel(RB-0x7f6657b4e0) died
05-18 14:16:34.908 862-934/? D/BluetoothManagerService: Message: 20
05-18 14:16:34.930 862-934/? D/BluetoothManagerService: Added callback: android.bluetooth.IBluetoothManagerCallback$Stub$Proxy@377edb51:true
05-18 14:16:35.003 862-1613/? I/WindowState: WIN DEATH: Window{30176b54 u0 com.example.chaquo/com.example.chaquo.MainActivity}
05-18 14:16:35.015 862-1613/? W/InputDispatcher: Attempted to unregister already unregistered input channel '30176b54 com.example.chaquo/com.example.chaquo.MainActivity (server)'
05-18 14:16:35.123 862-877/? D/ActivityManager: Kill com.example.chaquo quietly
05-18 14:16:35.129 862-877/? I/ActivityManager: Process com.example.chaquo (pid 7172) has died
mhsmith commented 4 years ago

Are you sure you're not just reaching the 5-minute time limit of the unlicensed copy of Chaquopy? When that happens, there should be a log message and an on-screen notification 10 seconds before the process dies, saying "Unlicensed copy of Chaquopy: app will shut down in 10 seconds."

If you're sure that's not it, would you be able to email me a copy of your source code, or at least a copy of your APK so I can try it for myself? Of course, I'll treat anything you send me as confidential.

Clement-W commented 4 years ago

Thank you very much for the help, but finally I've deducted that the dashcam were not powerful enough to run my program (only 1GB of RAM). I'll try another way to run it. I'm disappointed because chaquopy were exactly what I needed for my project. I may come back to it in the future.

mhsmith commented 4 years ago

Thanks for letting me know. How did you deduce that, and did you work out why it was also crashing on the more powerful device?

Clement-W commented 4 years ago

By eliminating the other possibilities, it's not an ANR and there is no error about the memory. So I'm not 100% sure about it but I can't find out another possible issue. Concerning the more powerful device, since I've optimized the code (Now a Service launch an AsyncTask that launch my python script), I don't have any problem. But I need to run the code on the dashcam so my issue is not really resolved.

mhsmith commented 4 years ago

If you think it might be a shortage of memory, then some ideas are:

Clement-W commented 4 years ago

Thanks for your help, I'll try to take a look at those ideas.

Clement-W commented 4 years ago

I'm finally still trying to know the source of my issue, but I can't find out from where it comes. The crash occurs during the processing of the first image and do not finish it. So it's not about keeping too much references of images. I'm using images with a 1280x320 resolution. Would you accept to take a look at my code ? I can send you the Android project or the apk as you want. Thank you so much for your help.

mhsmith commented 4 years ago

OK, the easiest way would be to upload the source code to GitHub: you can make it a private repository and share it with me if you want. Or you can email me a ZIP if you prefer.

Is it now only crashing on the dashcam? If so, it would be useful if you could also include the version which crashed on both devices, perhaps as a separate Git commit. That way I'll be more likely to reproduce it.

Clement-W commented 4 years ago

I'll make a private repository on GitHub and share it with you soon. It will contains the two versions. Thanks for your help.

Clement-W commented 4 years ago

I've invited you to the repository. You'll find some useful information in the README.

mhsmith commented 4 years ago

Thanks, I'll take a look at it.

mhsmith commented 4 years ago

It does look like memory usage could be the problem. When I tested androidV1 on an x86 emulator with API level 29, it failed with this log:

2020-05-24 22:25:36.468 7013-7074/com.example.chaquo I/python.stdout: [INFO] loading YOLO from disk...
2020-05-24 22:25:36.468 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:37.993 7013-7074/com.example.chaquo I/python.stdout: -> Loading model from 
2020-05-24 22:25:37.993 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:37.993 7013-7074/com.example.chaquo I/python.stdout: /data/user/0/com.example.chaquo/files/chaquopy/AssetFinder/app/models/mono_1024x320
2020-05-24 22:25:37.993 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:37.993 7013-7074/com.example.chaquo I/python.stdout:    Loading pretrained encoder
2020-05-24 22:25:37.994 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:41.164 7013-7074/com.example.chaquo I/python.stdout:    Loading pretrained decoder
2020-05-24 22:25:41.164 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:41.447 7013-7074/com.example.chaquo I/python.stdout: -> Predicting on 21 test images
2020-05-24 22:25:41.447 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:49.771 7013-7074/com.example.chaquo I/python.stdout: [+] exec time for YOLO : 
2020-05-24 22:25:49.771 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:49.772 7013-7074/com.example.chaquo I/python.stdout: 8.324004888534546
2020-05-24 22:25:49.772 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:49.772 7013-7074/com.example.chaquo I/python.stdout: s
2020-05-24 22:25:49.772 7013-7074/com.example.chaquo I/python.stdout:  
2020-05-24 22:25:55.245 7013-7074/com.example.chaquo W/libc: memalign(64, 283115520) failed: returning null pointer
2020-05-24 22:25:55.818 7013-7074/com.example.chaquo E/AndroidRuntime: FATAL EXCEPTION: IntentService[ServiceTest]
    Process: com.example.chaquo, PID: 7013
    com.chaquo.python.PyException: RuntimeError: [enforce fail at CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 283115520 bytes. Buy new RAM!
    (no backtrace available)
        at <python>.torch.nn.modules.conv.conv2d_forward(conv.py:341)
        at <python>.torch.nn.modules.conv.forward(conv.py:345)
        at <python>.torch.nn.modules.module.__call__(module.py:532)
        at <python>.layers.forward(layers.py:135)
        at <python>.torch.nn.modules.module.__call__(module.py:532)
        at <python>.layers.forward(layers.py:116)
        at <python>.torch.nn.modules.module.__call__(module.py:532)
        at <python>.networks.depth_decoder.forward(depth_decoder.py:61)
        at <python>.torch.nn.modules.module.__call__(module.py:532)
        at <python>.test.test_simple(test.py:186)
        at <python>.test.start(test.py:399)
        at <python>.chaquopy_java.call(chaquopy_java.pyx:281)
        at <python>.chaquopy_java.Java_com_chaquo_python_PyObject_callAttrThrows(chaquopy_java.pyx:253)
        at com.chaquo.python.PyObject.callAttrThrows(Native Method)
        at com.chaquo.python.PyObject.callAttr(PyObject.java:209)
        at com.example.chaquo.ServiceTest.onHandleIntent(ServiceTest.java:23)
        at android.app.IntentService$ServiceHandler.handleMessage(IntentService.java:78)
        at android.os.Handler.dispatchMessage(Handler.java:107)
        at android.os.Looper.loop(Looper.java:214)
        at android.os.HandlerThread.run(HandlerThread.java:67)

Increasing the emulator memory to 4 GB didn't make any difference. However, switching to an x86_64 emulator fixed the problem, so that suggests that it did have enough memory, it just couldn't find 283 MB of contiguous address space.

It eventually stopped after 17 images because it reached the 5-minute unlicensed time limit. I suspect that's what happened in your sigsegvError.txt file, though the file doesn't quite go back far enough. Is there an "Unlicensed copy of Chaquopy" warning 10 seconds before the app died?

mhsmith commented 4 years ago

On a Nexus 5X with 2 GB of RAM, it doesn't log any stack trace, it just dies as you describe above:

2020-05-24 22:19:29.456 5105-5175/com.example.chaquo I/python.stdout: [INFO] loading YOLO from disk...
2020-05-24 22:19:29.457 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:31.436 5105-5175/com.example.chaquo I/python.stdout: -> Loading model from 
2020-05-24 22:19:31.437 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:31.437 5105-5175/com.example.chaquo I/python.stdout: /data/user/0/com.example.chaquo/files/chaquopy/AssetFinder/app/models/mono_1024x320
2020-05-24 22:19:31.437 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:31.437 5105-5175/com.example.chaquo I/python.stdout:    Loading pretrained encoder
2020-05-24 22:19:31.437 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:33.187 5105-5175/com.example.chaquo I/python.stdout:    Loading pretrained decoder
2020-05-24 22:19:33.187 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:33.489 5105-5175/com.example.chaquo I/python.stdout: -> Predicting on 21 test images
2020-05-24 22:19:33.489 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:45.170 5105-5175/com.example.chaquo I/python.stdout: [+] exec time for YOLO : 
2020-05-24 22:19:45.171 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:45.174 5105-5175/com.example.chaquo I/python.stdout: 11.634570360183716
2020-05-24 22:19:45.174 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:45.174 5105-5175/com.example.chaquo I/python.stdout: s
2020-05-24 22:19:45.175 5105-5175/com.example.chaquo I/python.stdout:  
2020-05-24 22:19:55.589 880-4010/? I/WindowManager: WIN DEATH: Window{6c1da15 u0 com.example.chaquo/com.example.chaquo.MainActivity}
2020-05-24 22:19:55.589 880-17372/? I/ActivityManager: Process com.example.chaquo (pid 5105) has died

In the 10 seconds between 22:19:45 and 22:19:55, there are many other "has died" log messages suggesting that the system is closing background apps to free memory.

Perhaps a smaller model will be necessary in order to run on a low-end device.

mhsmith commented 4 years ago

You'll currently get torchvision version 0.2.2 from about a year ago, which was the last version without native components. I'm planning to add a newer version soon: please subscribe to #233 for updates on this.

However, it looks like the only use of torchvision in your script is transforms.ToTensor, and when I added some logging around that line I found that it completed successfully on the Nexus 5X. So I don't think torchvision is the cause of the problem.

It might still be useful for you to add additional logging to identify exactly which line of the script is executing at the moment of the crash.

One final note: everything that's in opencv-python is also in opencv-contrib-python, so you shouldn't install them both.

Clement-W commented 4 years ago

About the sigsegv error, I forgot the possibility of "Unlicensed copy of Chaquopy" so that was maybe the source of this error.

According to your tests, it seems that my issue is about memory, and as you said the model is maybe too large to run on a low-end device like the dascham. Most of the time, the crash occurs when a model is running.

For the torchvision version, I'm already subscribed to #233.

Thank you so much for you helpful advices, I will try to optimize my models, and if the error persists, I'll think about a server-side integration.

mhsmith commented 4 years ago

Did you manage to find a solution?

Clement-W commented 4 years ago

I'm finally working on a server-side integration of my python script. Changing the architecture of my models would be a huge work, it's impossible for me to start this now. Anyway, according to your tests, I don't think that the dashcam would be powerful enough to run my script. A solution would be changing my device by a better one but it's not feasible now.

mhsmith commented 4 years ago

That makes sense: good luck.

Clement-W commented 4 years ago

Thanks!