apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License
11.2k stars 1.14k forks

Google Colab : One Shot Object Detection Runtime Crash #2917

Closed kidboy-man closed 4 years ago

kidboy-man commented 4 years ago

Hi all! Recently I tried to use Turi Create one-shot object detection (OSOD) to recognize a brand logo. Since my MacBook doesn't have a GPU, I tried to run it in a Google Colab notebook. Everything worked up to the image augmentation step, but when I ran model = tc.one_shot_object_detector.create(starter_images, 'label', backgrounds=my_backgrounds), the Colab runtime crashed. What causes this, and how can I fix it? Has anyone managed to run OSOD in Google Colab?

TobyRoseman commented 4 years ago

@kidboy-man - What error message do you get? Can you post the entire output of the call to tc.one_shot_object_detector.create?

kidboy-man commented 4 years ago

@TobyRoseman - There is no error message from the kernel, but the Colab notebook says that the runtime has crashed. (Screenshot attached.)

You can check my notebook here: https://colab.research.google.com/drive/1hRACSEgMB_ykS7IBxCYh7BI9uq0Q5KfR#scrollTo=ImyHz0wgcfZN

Thank you

navanchauhan commented 4 years ago

Hi, the link you have posted is dead now

If your issue has been resolved then please close the issue

Cheers!

kidboy-man commented 4 years ago

Hi @navanchauhan, I accidentally deleted the notebook. I tried to re-run OSOD, but it still crashes the Colab runtime. In this notebook I ran the example as written in the documentation. It always crashes at the tc.one_shot_object_detector.create call: https://colab.research.google.com/drive/1iSGetSXK6o24vk64-KrkLwNnna4ojfMv?authuser=1

navanchauhan commented 4 years ago

Yup, I am facing the same issue in Google Colab with your notebook. The runtime logs show a warning about traitlets having moved to a top-level package.

Importing that package also does not fix the issue, and the kernel keeps restarting.
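
When a kernel restarts without a Python traceback, as described here, one thing worth trying is Python's standard `faulthandler` module, which can dump the Python stack when the process receives a fatal signal such as SIGSEGV or SIGABRT. The following is only a minimal sketch, not something confirmed to diagnose this issue: `enable_crash_log` and the log file name are hypothetical, and it writes to a real file because a notebook's `sys.stderr` may not expose a file descriptor. It will not help if the process is killed with SIGKILL (for example by the out-of-memory killer), since that signal cannot be handled.

```python
import faulthandler


def enable_crash_log(path="crash_traceback.log"):
    """Dump Python tracebacks to `path` if the process hits a fatal signal.

    `path` is a hypothetical file name chosen for this sketch.
    """
    # faulthandler needs a file with a real file descriptor, and the file
    # must stay open for as long as the handler is enabled.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file  # keep a reference so the file is not closed


# Usage in a notebook cell, before the call that crashes:
#   crash_log = enable_crash_log()
#   ... run the crashing call ...
# After the kernel restarts, inspect crash_traceback.log.
```

If the log file stays empty after a restart, that would be consistent with (but does not prove) the process having been killed from outside rather than crashing in native code.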

TobyRoseman commented 4 years ago

@navanchauhan - Could you post the entire runtime logs here please?

I suspect what's happening here is that TuriCreate is running out of memory. I don't know how to check that with Google Colab. Does anyone know how much memory Google Colab gives you?
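
One way to check memory from inside the notebook, as a rough sketch: this assumes the Colab runtime is a Linux machine that exposes `/proc/meminfo`, and the helper names `parse_meminfo` and `memory_summary_gib` are made up for illustration. It uses only the standard library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer} dict.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts with no unit.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_summary_gib(path="/proc/meminfo"):
    """Return total and available RAM in GiB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    kib_per_gib = 1024 ** 2
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    available = info.get("MemAvailable", info["MemFree"])
    return {
        "total_gib": info["MemTotal"] / kib_per_gib,
        "available_gib": available / kib_per_gib,
    }
```

Calling `memory_summary_gib()` right before the `create` call shows how much headroom is left at that point. It cannot show the peak reached during the call itself.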

kidboy-man commented 4 years ago

My runtime has 12 GB of RAM and 16 GB of GPU memory. (Screenshot attached.)

kidboy-man commented 4 years ago

Hi @TobyRoseman and @navanchauhan , any update on this?

navanchauhan commented 4 years ago

Nah, I am currently in the middle of my final exams, so I don't have that much time

@TobyRoseman


Timestamp | Level | Message
-- | -- | --
Feb 19, 2020, 6:28:09 PM | WARNING | WARNING:root:kernel 2a721722-82c4-41f5-9f8c-d887b4fddf0f restarted
Feb 19, 2020, 6:28:09 PM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Feb 19, 2020, 6:24:06 PM | WARNING | warn("IPython.utils.traitlets has moved to a top-level traitlets package.")
Feb 19, 2020, 6:24:06 PM | WARNING | /usr/local/lib/python3.6/dist-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
Feb 19, 2020, 6:22:23 PM | INFO | Adapting to protocol v5.1 for kernel 2a721722-82c4-41f5-9f8c-d887b4fddf0f
Feb 19, 2020, 6:22:21 PM | INFO | Kernel started: 2a721722-82c4-41f5-9f8c-d887b4fddf0f
Feb 19, 2020, 6:11:30 PM | INFO | Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Feb 19, 2020, 6:11:30 PM | INFO | http://172.28.0.2:9000/
Feb 19, 2020, 6:11:30 PM | INFO | The Jupyter Notebook is running at:
Feb 19, 2020, 6:11:30 PM | INFO | 0 active kernels

shantanuchhabra commented 4 years ago

Hi there,

As a first step, let's try to figure out where the problem is coming from. Let's separate out the data augmentation and the model training and see which line crashes before we debug further. Could you try out the following snippet:

import turicreate as tc # Line 0
synthetic_data = tc.one_shot_object_detector.util.preview_synthetic_training_data(training_images, 'label') # Line 1
model = tc.object_detector.create(synthetic_data, batch_size=24, max_iterations=200) # Line 2

There are two possible scenarios after running this snippet:

navanchauhan commented 4 years ago

@shantanuchhabra It crashes at line 1

Is there any chance TC is trying to create a new window displaying the images?

I have a hypothesis that what is happening is that turicreate is trying to open a new window, but it is not supported on Colab so it crashes

Output before it crashes on Colab

Downloading https://docs-assets.developer.apple.com/turicreate/data/one_shot_backgrounds.sarray.tar
Download completed: /var/tmp/data_cache/one_shot_backgrounds.sarray.tar
Augmenting input images using 951 background images.
+------------------+--------------+------------------+
| Images Augmented | Elapsed Time | Percent Complete |
+------------------+--------------+------------------+
| 100              | 1m 15s       | 10.5%            |
| 200              | 1m 21s       | 21%              |
| 300              | 1m 32s       | 31.5%            |
| 400              | 1m 42s       | 42%              |
| 500              | 1m 50s       | 52.5%            |
| 600              | 1m 57s       | 63%              |
| 700              | 2m 5s        | 73.5%            |
| 800              | 2m 13s       | 84%              |
| 900              | 2m 22s       | 94.5%            |
+------------------+--------------+------------------+

tesths commented 4 years ago

I think I am hitting the same issue with Turi Create style transfer.

When I train a style transfer model in Colab with turicreate 6.1, Colab crashes when I run model = tc.style_transfer.create(style, content). (Screenshot attached.) I also tried turicreate 5.8 with mxnet, and that runs fine.

My colab notebook url is here.

TobyRoseman commented 4 years ago

@tesths - I do not think this issue is the same problem you are experiencing. I suggest you create a new issue.

@navanchauhan - I really don't think any part of that code should be trying to create a new window.

navanchauhan commented 4 years ago

Could someone try running OSOD using Jupyter-Labs on their systems? My Jupyter installation has gone bonkers and I am not able to test it out

ebolotin6 commented 4 years ago

Facing the same issue. Kernel keeps crashing as training commences. Log: warn("IPython.utils.traitlets has moved to a top-level traitlets package.")

ebolotin6 commented 4 years ago

> Facing the same issue. Kernel keeps crashing as training commences. Log: warn("IPython.utils.traitlets has moved to a top-level traitlets package.")

I discovered the reason for this. Certain PyTorch tensors in my code were, by accident, not sent to CUDA. The problem is that Colab didn't return the appropriate PyTorch traceback for this error; it just crashed.

To discover what the problem actually was, I ran my code in JupyterLab on FloydHub. There, Jupyter returned the appropriate traceback, which let me fix the bug and run the code successfully (both on FloydHub and Colab).
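
A similar effect can sometimes be had without leaving Colab by running the crashing step in a separate Python process, so that a crash does not take the notebook kernel down with it and the way the process ended can be inspected. This is a hedged sketch using only the standard library; `run_isolated` is a hypothetical helper, and the negative-return-code convention applies to POSIX systems.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run `code` in a fresh Python process and report how it ended.

    Returns (status, stdout, stderr).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was
        # terminated by that signal number.
        status = "killed by " + signal.Signals(-proc.returncode).name
    elif proc.returncode == 0:
        status = "ok"
    else:
        status = "exited with code %d" % proc.returncode
    return status, proc.stdout, proc.stderr


# Example: an ordinary Python error comes back as a traceback in stderr.
status, out, err = run_isolated("raise ValueError('boom')")
print(status)
```

A status of `killed by SIGKILL` is commonly what an out-of-memory kill looks like on Linux, while `killed by SIGSEGV` points toward a crash in native code. Neither is conclusive on its own.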

navanchauhan commented 4 years ago

@kidboy-man, can you try from a fresh notebook? In this one, OSOD is no longer crashing: https://colab.research.google.com/drive/14bFRa4ygtd1PDcBn4Zb8BH23MUEWMg2L?usp=sharing

TobyRoseman commented 4 years ago

Is this still an issue with the latest version of TuriCreate?

TobyRoseman commented 4 years ago

Closing due to inactivity. Please reopen if this is still an issue in the latest version of TuriCreate.