bernard24 / RIS

Implementation of the approach described in the paper "Recurrent Instance Segmentation" https://arxiv.org/abs/1511.08250.
MIT License

Plants Inference - Is infer_example.lua complete? #15

Closed: isn4 closed this issue 7 years ago

isn4 commented 7 years ago

Hi @bernard24

As mentioned in issue #5 , in order to run infer_example.lua I have to find a trained model to do the inference with. This raises several points of confusion:

  1. Which model do I use: 'plants_convlstm.model' or 'plants_pre_lstm.model'?
  2. When using torch.load() to load either of the above models, this error appears: "unknown Torch class". Adding require 'cunn' and require 'cudnn' moves me past this error (see the loading sketch after this list).
  3. Although the model is loaded and can be printed (i.e. the model is not nil), I cannot call model:forward(input) on it. This error appears: "attempt to call method 'forward' (a nil value)".
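For reference, this is roughly how I am loading the models at the moment (a minimal sketch of my own attempt; the file names come from the repo, everything else is my guess):

```lua
-- Minimal loading sketch (my attempt): the checkpoints appear to have been saved
-- from a GPU session, so the CUDA packages must be required before torch.load
-- can deserialize them.
require 'torch'
require 'nn'
require 'cunn'   -- registers the CUDA module classes
require 'cudnn'  -- needed if the checkpoints contain cudnn layers

local pre_lstm = torch.load('plants_pre_lstm.model')
local convlstm = torch.load('plants_convlstm.model')

print(pre_lstm)   -- both print fine, so neither is nil
print(convlstm)
```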

I'm not sure how this script should be altered, how it should be reading in the models, or even which models it expects.

Please let me know if you'd like more detailed error stack traces. Also, excuse me - I am very new to Torch & Lua. Thank you!

isn4 commented 7 years ago

Hi @bernard24 , I have worked through most of the code, making adjustments where I found it necessary. On line 34 (x = model:forward(input)), I chose 'plants_pre_lstm.model' as the model to run inference with, and 'plants_convlstm.model' as the model prototypes, protos, on line 51 (local lst = protos.rnn:forward{x, unpack(current_state)}).

This fixed the "attempt to call method 'forward' (a nil value)" error on the model, but now I get the error "cannot convert 'struct THCudaTensor *' to 'struct THFloatTensor *'" on both the loaded model and the loaded model prototypes, protos. I fixed this by adding :float() to the end of my loaded model, but the same :float() call returns an "attempt to call method 'float' (a nil value)" error on the protos object. Is there another way you intend for users to fit the model's prototypes to the script?
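In case it is useful, this is a sketch of what I am currently trying (the field-by-field conversion at the end is just my guess about how protos is structured, not something taken from your code):

```lua
-- Sketch of my current attempt. I treat plants_pre_lstm.model as the feature model
-- and plants_convlstm.model as the prototypes table (protos).
require 'torch'
require 'nn'
require 'cunn'
require 'cudnn'

-- :float() on the model fixed the THCudaTensor/THFloatTensor mismatch for me
local model  = torch.load('plants_pre_lstm.model'):float()
local protos = torch.load('plants_convlstm.model')

-- protos:float() fails with "attempt to call method 'float' (a nil value)", which
-- makes me think protos is a plain Lua table of modules rather than a single
-- nn module, so a conversion would have to go field by field (untested guess):
for name, value in pairs(protos) do
  if torch.isTypeOf(value, 'nn.Module') then
    protos[name] = value:float()
  end
end
```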

bernard24 commented 7 years ago

Hi @isn4 As you mentioned in your last message, both models are needed: the image is the input to plants_pre_lstm.model, and its output is the input to plants_convlstm.model. I have not been able to reproduce your error messages, but I will try on a different computer on Monday. In any case, for debugging purposes, it could be useful to train a model from scratch without preloading the models, let it run for a little while and stop it, and then train again while preloading the model you just trained. In that case, do you get the same error messages?
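In other words, the flow is roughly the following (just a sketch of the data flow; the recurrent-state set-up and the loop live in infer_example.lua and are not reproduced here):

```lua
-- Sketch of the two-stage flow. The image path is only an example, and
-- current_state stands in for the initial ConvLSTM state built in infer_example.lua.
require 'torch'
require 'nn'
require 'cunn'
require 'cudnn'
require 'image'

local pre_lstm = torch.load('plants_pre_lstm.model')
local protos   = torch.load('plants_convlstm.model')

local input = image.load('example_plant.png'):cuda()   -- example image
local x = pre_lstm:forward(input)                       -- features from the first net

-- placeholder: build the zero-initialised ConvLSTM states as in infer_example.lua
local current_state = {}

-- one step of the ConvLSTM on top of those features
local lst = protos.rnn:forward{x, unpack(current_state)}
```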

isn4 commented 7 years ago

Hi @bernard24 , thank you for getting back to me! I'm a little confused about your debugging suggestion, though. For clarification, are you suggesting I retrain the network (using experiment.lua) to generate a new model and a new protos file, but not to convergence (completion), then stop the training and restart it using the loaded model and protos files that were not trained to convergence? If so, would I then load the model and protos file into infer_example.lua to retry the inference process?

Additionally, is there any way for me to see what the protos file should look like? I have printed my loaded protos object to a .out file (which means it's not a null/nil object) but I'm not sure if it looks the way it should.

Lastly, here is the nil value error I've been getting:

/opt/packages/Torch/torch_5633c24e/bin/luajit: ...4s8lp/isn4/project/RIS/plants_learning/infer_example.lua:50: attempt to call method 'float' (a nil value)
stack traceback:
...4s8lp/isn4/project/RIS/plants_learning/infer_example.lua:50: in main chunk
[C]: in function 'dofile'
...rch/torch_5633c24e/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x004064f0

I get this error on any method I call on protos (i.e. :forward(), :float(), :cuda(), etc.). I assume at this point that the model prototypes are being generated/saved in a format that does not allow methods to be called on them.
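In case it helps, this is how I am inspecting the loaded protos object (just a quick sketch I am using for debugging):

```lua
-- Quick inspection sketch: print the top-level fields of the loaded protos object
-- and their Torch types. This is how I noticed that methods like :float() are
-- not available on the object itself.
require 'torch'
require 'nn'
require 'cunn'
require 'cudnn'

local protos = torch.load('plants_convlstm.model')
print(torch.type(protos))              -- what kind of object is it?
for name, value in pairs(protos) do    -- lists the fields if it is a plain Lua table
  print(name, torch.type(value))
end
```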

bernard24 commented 7 years ago

Hi @isn4 You were right, sorry for the confusion. I have modified infer_example.lua, so it should work now. Regarding your current code, probably the best solution is to set gpumode = 1, so that everything is performed on the GPU. Please let me know if you have any issues.
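For illustration, the idea behind gpumode = 1 is roughly the following (a sketch, not the exact code in infer_example.lua):

```lua
-- Sketch of the gpumode idea: with gpumode = 1 the loaded (CUDA) models and the
-- input tensor all stay on the GPU, so no :float()/:cuda() mixing is needed.
require 'torch'
require 'nn'
require 'image'

gpumode = 1

if gpumode == 1 then
  require 'cunn'
  require 'cudnn'
end

local model  = torch.load('plants_pre_lstm.model')
local protos = torch.load('plants_convlstm.model')

local input = image.load('example_plant.png')   -- example image, just for illustration
if gpumode == 1 then
  input = input:cuda()
end

local x = model:forward(input)
```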

isn4 commented 7 years ago

Hi @bernard24 , I just took a look at the changes made to the infer_example.lua script. Out of curiosity, was the major change requiring 'IoU4Criterion' and setting gpumode to 1? Comparing your updated script with the version I have, those are the only differences I see. I have tried requiring 'IoU4Criterion' and 'MatchCriterion', but I'm still running into the same problems. Could this be a version issue with Torch? I'm currently waiting to hear from the person who installed Torch on the system I'm using, but which "version" (I hear Torch doesn't exactly have versions) do you have, i.e. when did you install/update it? Thank you!

isn4 commented 7 years ago

UPDATE: I've got the code working now. I had to rework some of the tensor type casting, but your code changes worked! Thank you!

bernard24 commented 7 years ago

Great!