Appsilon / mbaza

Save 99% of Your Time Classifying Camera-Trap Footage. Completely Free.
https://appsilon.com/data-for-good/mbaza-ai/
GNU Affero General Public License v3.0

Gabon model can't load on Windows #112

Closed marekrogala closed 4 years ago

marekrogala commented 4 years ago

We get the following error on Windows, only for the Gabon model:

Found 628 images.
Loading model: ..\gabon\export.pkl.
Traceback (most recent call last):
  File "main.py", line 63, in <module>
    main()
  File "main.py", line 59, in main
    infer_to_csv(args)
  File "C:\Users\marro\Documents\GitHub\wildlife-explorer\models\runner\functions.py", line 198, in infer_to_csv
    preds, classes = get_predictions(args.model, images, args.pytorch_num_workers, args.batch_size)
  File "C:\Users\marro\Documents\GitHub\wildlife-explorer\models\runner\functions.py", line 67, in get_predictions
    learn = load_model(model, images, pytorch_num_workers, batch_size)
  File "C:\Users\marro\Documents\GitHub\wildlife-explorer\models\runner\functions.py", line 41, in load_model
    learn = load_learner(model_path.parent, model_path.name, test=images, num_workers=pytorch_num_workers)
  File "C:\Users\marro\miniconda3\envs\serengeti2\lib\site-packages\fastai\basic_train.py", line 621, in load_learner
    state = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
  File "C:\Users\marro\miniconda3\envs\serengeti2\lib\site-packages\torch\serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\marro\miniconda3\envs\serengeti2\lib\site-packages\torch\serialization.py", line 773, in _legacy_load
    result = unpickler.load()
  File "C:\Users\marro\miniconda3\envs\serengeti2\lib\pathlib.py", line 1004, in __new__
    % (cls.__name__,))
NotImplementedError: cannot instantiate 'PosixPath' on your system

PosixPath and WindowsPath are system-specific: https://docs.python.org/3/library/pathlib.html
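
To illustrate what "system-specific" means here, a minimal sketch using only the standard library (the file names are placeholders, not the real model paths): `Path()` picks a concrete class for the current OS, and pickle stores that class by name.

```python
import pickle
from pathlib import Path

# Path() picks the concrete class for the current OS:
# PosixPath on Linux/macOS, WindowsPath on Windows.
path = Path("models") / "export.pkl"
class_name = type(path).__name__

# Pickle records that concrete class by name, so the payload is tied
# to the OS family it was created on.
payload = pickle.dumps(path)
stores_class_name = class_name.encode() in payload

print(class_name, stores_class_name)
```

A payload created on Linux therefore asks the unpickler for `PosixPath`, which Windows refuses to instantiate.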

This error is caused by a PosixPath that got serialized into the exported model. Fastai recommends using save/load instead (see https://github.com/fastai/fastai/issues/1482). But somehow the Serengeti model works. I checked on Windows and Linux, and the code below gives PosixPath on Linux and WindowsPath on Windows. Why doesn't the Gabon model behave the same way? @swiezew, what were the differences? Were different versions of PyTorch and fastai used in training? Did you do anything specific to make the Serengeti model work on Windows?

learn = load_learner('.', 'trained_model.pkl')
print(learn.data.path)

I think we want to have a uniform version across models. Solutions I see:

  1. if we can find a way to get the Gabon model to behave like the Serengeti one, fix it this way
  2. otherwise we need to convert both models into the saved version (only weights) and change the runner script to expect pth instead of pkl - as described in https://github.com/fastai/fastai/issues/1482/#issuecomment-587070809
  3. hack the fastai code as described in https://forums.fast.ai/t/lesson-3-load-data-fails/43726/6
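
For option 3, one commonly cited workaround of this kind (an assumption on my part, not necessarily the exact change described in the linked forum thread) is to temporarily alias `pathlib.PosixPath` to `pathlib.WindowsPath` while unpickling on Windows. A minimal sketch, where the helper name `posix_path_as_windows_path` is hypothetical and a pickled path stands in for the model file:

```python
import contextlib
import pathlib
import pickle
import platform


@contextlib.contextmanager
def posix_path_as_windows_path():
    """On Windows, temporarily make pathlib.PosixPath resolve to WindowsPath."""
    original = pathlib.PosixPath
    if platform.system() == "Windows":
        pathlib.PosixPath = pathlib.WindowsPath
    try:
        yield
    finally:
        # Always restore the real class, even if loading fails.
        pathlib.PosixPath = original


# Stand-in for a model file exported on the current machine.
payload = pickle.dumps(pathlib.Path("gabon") / "export.pkl")

with posix_path_as_windows_path():
    restored = pickle.loads(payload)
```

In the runner, the `load_learner(...)` call in `functions.py` would go inside the `with` block. This only helps when the pickle refers to the class as `pathlib.PosixPath`, and it patches global state, so it is a stopgap rather than a fix.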

swiezew commented 4 years ago

Added a PR with a naive solution - @kamilzyla: please test if it helps (I cannot easily do it on Windows from the forest I'm in).

Basically, I took the two models in this repo, loaded each one, and exported it again using fastai==1.0.61, torch==1.4.0, and torchvision==0.5.0 on GCP Linux VMs.

If this doesn't work, we will need to use method 2 or 3. The problem with 2 is that in order to load a .pth file you need to have a learner already. While you can create an empty databunch (knowing the classes from an extra file), you also need to specify the base architecture (ResNet-50 in our case), which triggers a download by default, so the default won't work offline.

kamilzyla commented 4 years ago

I tested this on Windows: the Gabon model still fails with the same error, while the Serengeti model still works fine.

marekrogala commented 4 years ago

Fixed by overwriting model.model_dir with a string and re-exporting.
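
The idea behind this fix can be sketched with a plain dict standing in for the exported state (the dict and its keys are hypothetical and this is not fastai code): once the path is stored as a string, the pickle no longer references an OS-specific path class.

```python
import pickle
from pathlib import Path

# Hypothetical stand-in for the exported state; not the real fastai object.
state = {"path": "gabon", "model_dir": Path("models")}
before = pickle.dumps(state)

# Overwrite the Path value with a plain string, then export again.
state["model_dir"] = str(state["model_dir"])
after = pickle.dumps(state)

# Name of the OS-specific path class on the exporting machine.
path_class = type(Path()).__name__.encode()
print(path_class in before, path_class in after)
```

A plain string unpickles the same way on every OS, which is why the re-exported file can load on Windows.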