CRBS / cdeep3m

Please go to https://github.com/CRBS/cdeep3m2 for most recent version

Add validation data to training failed #58

Closed by cakuba 5 years ago

cakuba commented 5 years ago

Hi,

Sorry to bother you again. We are trying to add validation data to the training process as described here: https://github.com/CRBS/cdeep3m/wiki/Add-Validation-to-training. We simply moved some data from the training data directory to the validation data directory and then ran the command

$> PreprocessValidation.m ./neural/valid/images ./neural/valid/labels ./neural/valid/data

which runs fine and saves a single validation data file, "validation_stack_v1.h5", in the output directory ./neural/valid/data. We then trained the model as usual, except that we passed one new option:

./runtraining.sh --validation_dir ./neural/valid/data --numiterations 100 ./augmentedtraining/ ./neural_train_out

Unfortunately, the training process failed with the log messages below:

=========================================================
I1205 12:08:37.455427 1958 data_provider.cpp:28] Loading list of HDF5 filenames from: /usr/local/src/CDeep3M/cdeep3m-1.6.2/neural_train_out//valid_file.txt
I1205 12:08:37.455468 1958 data_provider.cpp:42] Number of HDF5 files: 1
I1205 12:08:37.455528 1958 sample_selector.cpp:58] read prob from file : /usr/local/src/CDeep3M/cdeep3m-1.6.2/neural_train_out//1fm/label_class_selection.prototxt
I1205 12:08:37.455615 1958 sample_selector.cpp:78] rest_of_labelmapping = 0 1
I1205 12:08:37.455641 1958 sample_selector.cpp:93] label map :0--->0
I1205 12:08:37.455648 1958 sample_selector.cpp:93] label map :1--->1
I1205 12:08:37.456550 1958 sample_selector.cpp:95] label_probmap size =2
I1205 12:08:37.456562 1958 sample_selector.cpp:116] scale_factor = 3.33333
I1205 12:08:37.456571 1958 sample_selector.cpp:117] bottom_prob = 0.3
I1205 12:08:37.456578 1958 sample_selector.cpp:118] label_prob_vec.size = 2
I1205 12:08:37.456583 1958 sample_selector.cpp:164] size of prob = 1
I1205 12:08:37.456589 1958 sample_selector.cpp:19] lable class [0] weight =0.25
I1205 12:08:37.456594 1958 sample_selector.cpp:19] lable class [1] weight =1
I1205 12:08:37.456658 1958 patch_sampler.cpp:57] runner setup done ... count =0
I1205 12:08:37.456672 1958 net.cpp:106] Creating Layer data
I1205 12:08:37.456689 1958 net.cpp:411] data -> data
I1205 12:08:37.456702 1958 net.cpp:411] data -> label
I1205 12:08:37.456740 1958 patch_sampler.cpp:121] loading batch patch_count = 0
I1205 12:08:37.467742 1965 patch_sampler.cpp:319] solver_count = 1 size of queue pairs = 1
I1205 12:08:37.467789 1965 patch_sampler.cpp:323] size of queue is now = 0
I1205 12:08:37.514232 1958 data_provider.cpp:108] d_size =3
I1205 12:08:37.514272 1958 data_provider.cpp:111] loaded data shape : 301
I1205 12:08:37.514279 1958 data_provider.cpp:111] loaded data shape : 301
I1205 12:08:37.514284 1958 data_provider.cpp:111] loaded data shape : 100
I1205 12:08:37.514289 1958 data_provider.cpp:113]
I1205 12:08:37.514295 1958 data_provider.cpp:115] loaded label shape : 301
I1205 12:08:37.514302 1958 data_provider.cpp:115] loaded label shape : 301
I1205 12:08:37.514308 1958 data_provider.cpp:115] loaded label shape : 100
I1205 12:08:37.514550 1958 data_provider.cpp:144] d_size =5
I1205 12:08:37.514565 1958 data_provider.cpp:147] data shape after prependig : 1
I1205 12:08:37.514572 1958 data_provider.cpp:147] data shape after prependig : 1
I1205 12:08:37.514578 1958 data_provider.cpp:147] data shape after prependig : 301
I1205 12:08:37.514585 1958 data_provider.cpp:147] data shape after prependig : 301
I1205 12:08:37.514590 1958 data_provider.cpp:147] data shape after prependig : 100
I1205 12:08:37.514596 1958 data_provider.cpp:149]
I1205 12:08:37.514602 1958 data_provider.cpp:151] label shape after prependig : 1
I1205 12:08:37.514609 1958 data_provider.cpp:151] label shape after prependig : 1
I1205 12:08:37.514616 1958 data_provider.cpp:151] label shape after prependig : 301
I1205 12:08:37.514622 1958 data_provider.cpp:151] label shape after prependig : 301
I1205 12:08:37.514634 1958 data_provider.cpp:151] label shape after prependig : 100
I1205 12:08:37.514650 1958 data_provider.cpp:175] loaded hdf5 file /usr/local/src/CDeep3M/cdeep3m-1.6.2/neural/valid/data//validation_stack_v1.h5
I1205 12:08:37.514675 1958 patch_sampler.cpp:398] label_shape0=320 inputshape[i] =301
F1205 12:08:37.514708 1958 patch_sampler.cpp:399] Check failed: diff > 0 (-19 vs. 0)
Check failure stack trace:
@ 0x7fa903df05cd google::LogMessage::Fail()
@ 0x7fa903df2433 google::LogMessage::SendToLog()
@ 0x7fa903df015b google::LogMessage::Flush()
@ 0x7fa903df2e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa90453ae0e caffe::PatchCoordFinder<>::GetRandomPatchCenterCoord()
@ 0x7fa90453b26f caffe::PatchSampler<>::ReadOnePatch()
@ 0x7fa90453bee2 caffe::PatchSampler<>::patch_data_shape()
@ 0x7fa9044bc351 caffe::PatchDataLayer<>::DataLayerSetUp()
@ 0x7fa904533bc3 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7fa904597b75 caffe::Net<>::Init()
@ 0x7fa904599581 caffe::Net<>::Net()
@ 0x7fa9045e7a82 caffe::Solver<>::InitTestNets()
@ 0x7fa9045e8415 caffe::Solver<>::Init()
@ 0x7fa9045e8729 caffe::Solver<>::Solver()
@ 0x7fa904548d53 caffe::Creator_SGDSolver<>()
@ 0x40a348 train()
@ 0x407258 main
@ 0x7fa90307e830 __libc_start_main
@ 0x407919 _start
@ (nil) (unknown)
Aborted (core dumped)

==================================================

Without the validation option "--validation_dir ./neural/valid/data", the training process ran fine and gave some nice results. Could you please give me some hints? Thanks for your help.

Brett

haberlmatt commented 5 years ago

Hi Brett,

It looks like your training/validation data is 301 pixels in x/y, which is smaller than the receptive field size. We built in a correction for the training data when this happens, but it looks like we forgot to include this step for the validation data. Thanks for pointing this out; we will include it in the next release. In the meantime, you could surround your image and label data with zeros to get an image size greater than 320 pixels in x and y. For example, if you open the images in ImageJ/FIJI, you can use Image->Adjust->Canvas Size and set 330 pixels. Just remember to do this for both the images and the labels.

Hope this helps, and we'll integrate the bugfix soon, Matt
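For anyone who prefers to script the padding rather than use ImageJ/FIJI, here is a minimal sketch of the idea in plain Python. It is not part of CDeep3M: the function names `centered_padding` and `pad_plane` are hypothetical, each image plane is assumed to be a list of pixel rows, and centering the original data on the larger canvas is an assumption (any placement should satisfy the size requirement). The 330-pixel target is the value suggested above.

```python
def centered_padding(size, target):
    """Return (before, after) padding counts that grow `size` to `target`."""
    if target < size:
        raise ValueError("target must be at least as large as the current size")
    total = target - size
    before = total // 2
    return before, total - before


def pad_plane(plane, target_height, target_width, fill=0):
    """Pad one 2D plane (a list of rows) with `fill` up to the target canvas."""
    height = len(plane)
    width = len(plane[0]) if plane else 0
    top, bottom = centered_padding(height, target_height)
    left, right = centered_padding(width, target_width)
    # Pad every existing row on the left and right.
    padded_rows = [[fill] * left + list(row) + [fill] * right for row in plane]
    # Add blank rows above and below.
    blank = [fill] * target_width
    return ([list(blank) for _ in range(top)]
            + padded_rows
            + [list(blank) for _ in range(bottom)])


# Stand-in 301 x 301 planes; real data would come from the image and label files.
image = [[1] * 301 for _ in range(301)]
label = [[1] * 301 for _ in range(301)]

# Apply the same padding to the image and to its label so they stay aligned.
padded_image = pad_plane(image, 330, 330)
padded_label = pad_plane(label, 330, 330)
print(len(padded_image), len(padded_image[0]))
```

Using one function for both stacks keeps each label pixel at the same offset as its image pixel, which is the point of padding both.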

haberlmatt commented 5 years ago

I fixed the respective file on the master branch, so alternatively you could already replace the following file on your AWS instance or local install: https://github.com/CRBS/cdeep3m/blob/master/scripts/functions/imageimporter.m. This is probably faster than manually padding your data, and it will tell us whether the fix solves the problem.

cakuba commented 5 years ago

Hi, thanks for the response. Yes, the image resolution in our current dataset is 301x301. I followed your suggestion and updated the file imageimporter.m, and I did notice that the validation images were padded to 320x320. But for some reason, training still failed:

...
I1206 10:36:33.898596 3682 data_provider.cpp:111] loaded data shape : 320
I1206 10:36:33.898602 3682 data_provider.cpp:111] loaded data shape : 320
I1206 10:36:33.898604 3682 data_provider.cpp:111] loaded data shape : 100
I1206 10:36:33.898608 3682 data_provider.cpp:113]
I1206 10:36:33.898612 3682 data_provider.cpp:115] loaded label shape : 320
I1206 10:36:33.898615 3682 data_provider.cpp:115] loaded label shape : 320
I1206 10:36:33.898618 3682 data_provider.cpp:115] loaded label shape : 100
I1206 10:36:33.898912 3682 data_provider.cpp:144] d_size =5
I1206 10:36:33.898921 3682 data_provider.cpp:147] data shape after prependig : 1
I1206 10:36:33.898926 3682 data_provider.cpp:147] data shape after prependig : 1
I1206 10:36:33.898928 3682 data_provider.cpp:147] data shape after prependig : 320
I1206 10:36:33.898931 3682 data_provider.cpp:147] data shape after prependig : 320
I1206 10:36:33.898936 3682 data_provider.cpp:147] data shape after prependig : 100
I1206 10:36:33.898938 3682 data_provider.cpp:149]
I1206 10:36:33.898941 3682 data_provider.cpp:151] label shape after prependig : 1
I1206 10:36:33.898946 3682 data_provider.cpp:151] label shape after prependig : 1
I1206 10:36:33.898948 3682 data_provider.cpp:151] label shape after prependig : 320
I1206 10:36:33.898952 3682 data_provider.cpp:151] label shape after prependig : 320
I1206 10:36:33.898955 3682 data_provider.cpp:151] label shape after prependig : 100
I1206 10:36:33.898958 3682 data_provider.cpp:175] loaded hdf5 file /usr/local/src/CDeep3M/cdeep3m-1.6.2/hust_neural/valid/data//validation_stack_v1.h5
20181206-103633.3682!F1206 10:36:33.898988 3682 patch_sampler.cpp:399] Check failed: diff > 0 (0 vs. 0)
Check failure stack trace:
@ 0x7fda67f555cd google::LogMessage::Fail()
@ 0x7fda67f57433 google::LogMessage::SendToLog()
@ 0x7fda67f5515b google::LogMessage::Flush()
@ 0x7fda67f57e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fda6869fe0e caffe::PatchCoordFinder<>::GetRandomPatchCenterCoord()
@ 0x7fda686a026f caffe::PatchSampler<>::ReadOnePatch()
@ 0x7fda686a0ee2 caffe::PatchSampler<>::patch_data_shape()
@ 0x7fda68621351 caffe::PatchDataLayer<>::DataLayerSetUp()
@ 0x7fda68698bc3 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
...

So I checked the log further and found that the training images were processed as 325x325 (!):

...
I1206 10:36:29.089555 3682 data_provider.cpp:111] loaded data shape : 325
I1206 10:36:29.089558 3682 data_provider.cpp:111] loaded data shape : 500
I1206 10:36:29.089561 3682 data_provider.cpp:113]
I1206 10:36:29.089565 3682 data_provider.cpp:115] loaded label shape : 325
I1206 10:36:29.089568 3682 data_provider.cpp:115] loaded label shape : 325
I1206 10:36:29.089571 3682 data_provider.cpp:115] loaded label shape : 500
I1206 10:36:29.089674 3682 data_provider.cpp:144] d_size =5
I1206 10:36:29.089682 3682 data_provider.cpp:147] data shape after prependig : 1
I1206 10:36:29.089685 3682 data_provider.cpp:147] data shape after prependig : 1
I1206 10:36:29.089689 3682 data_provider.cpp:147] data shape after prependig : 325
I1206 10:36:29.089691 3682 data_provider.cpp:147] data shape after prependig : 325
I1206 10:36:29.089695 3682 data_provider.cpp:147] data shape after prependig : 500
...
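Comparing the shapes reported for the training and validation stacks can be done by hand, as above, or with a short script. The following is only a sketch: the helper name `loaded_shapes` is hypothetical, and the regular expression assumes the "loaded data shape" and "loaded label shape" message format seen in the excerpts in this thread. The two sample strings are copied from the label entries of those excerpts.

```python
import re

# Matches entries such as "data_provider.cpp:115] loaded label shape : 325".
SHAPE_ENTRY = re.compile(r"loaded (data|label) shape : (\d+)")


def loaded_shapes(log_text):
    """Collect the reported dimensions per kind, in order of appearance."""
    shapes = {"data": [], "label": []}
    for kind, value in SHAPE_ENTRY.findall(log_text):
        shapes[kind].append(int(value))
    return shapes


# Label entries for the validation stack (works even if the log is one long line).
validation_excerpt = (
    "I1206 10:36:33.898612 3682 data_provider.cpp:115] loaded label shape : 320 "
    "I1206 10:36:33.898615 3682 data_provider.cpp:115] loaded label shape : 320 "
    "I1206 10:36:33.898618 3682 data_provider.cpp:115] loaded label shape : 100"
)

# Label entries for the training stack.
training_excerpt = (
    "I1206 10:36:29.089565 3682 data_provider.cpp:115] loaded label shape : 325 "
    "I1206 10:36:29.089568 3682 data_provider.cpp:115] loaded label shape : 325 "
    "I1206 10:36:29.089571 3682 data_provider.cpp:115] loaded label shape : 500"
)

print("validation:", loaded_shapes(validation_excerpt)["label"])
print("training:  ", loaded_shapes(training_excerpt)["label"])
```

Run against a full log, the same function would list the shapes for every stack in the order they were loaded, so the entries have to be grouped by stack before comparing.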

It turns out that if I change the number 320 to 325 in imageimporter.m, everything works fine now! I'm not sure why, and I guess this is not a solution for the more general case, is it? Thanks again for your help.

Brett
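A possible reading of the two failures, sketched in Python. This is an inference from the logged values, not the actual patch_sampler.cpp code: the first run reports label_shape0=320 with an input size of 301 and fails with "diff > 0 (-19 vs. 0)", and the second run, with a 320-pixel input, fails with "(0 vs. 0)". Both match diff = input size - 320 under a strict comparison, which would reject an input of exactly 320. The names `patch_fits` and `smallest_accepted_size` are hypothetical.

```python
PATCH_SIZE = 320  # value logged as label_shape0 in the failing run


def patch_fits(input_size, patch_size=PATCH_SIZE):
    """Mimic the logged check `diff > 0`, assuming diff = input_size - patch_size."""
    diff = input_size - patch_size
    return diff > 0


def smallest_accepted_size(patch_size=PATCH_SIZE):
    """Smallest size passing a strict `diff > 0` check under the same assumption."""
    return patch_size + 1


# Sizes that appear in this thread: original data, first fix, training data, manual padding.
for size in (301, 320, 325, 330):
    print(size, size - PATCH_SIZE, patch_fits(size))
```

Under this assumption, any padded size above 320 would pass the check, which fits both the 325 used for the training data and the 330 suggested for manual padding. Whether sizes between 321 and 324 work in practice is not tested in this thread.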