CRBS / cdeep3m

Please go to https://github.com/CRBS/cdeep3m2 for most recent version

prediction failed (octave error) #78

Open jawonGim opened 3 years ago

jawonGim commented 3 years ago

First of all, thanks for sharing the code.

I've been trying to test cdeep3m with Docker. Training (and retraining with a pre-trained model) on my own dataset worked fine, but I hit an Octave error when predicting the boundary map with the trained model. This is what the program printed:

[error msg]
$ docker-compose up
Creating network "cdeep3m-docker_default" with the default driver
Creating cdeep3m-docker_cdeep3m_1 ... done
Attaching to cdeep3m-docker_cdeep3m_1
cdeep3m_1 | octave: X11 DISPLAY environment variable not set
cdeep3m_1 | octave: disabling GUI features
cdeep3m_1 | Starting Image Augmentation
cdeep3m_1 | Check image size of:
cdeep3m_1 | /data/images/roi9
cdeep3m_1 | Reading file: /data/images/roi9/roi09_0001.png
cdeep3m_1 | z_blocks =
cdeep3m_1 |
cdeep3m_1 | 1 64
cdeep3m_1 |
cdeep3m_1 | panic: panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | Segmentation fault -- stopping myself...
cdeep3m_1 | attempting to save variables to 'octave-workspace'...
cdeep3m_1 | /home/cdeep3m/runprediction.sh: line 124: 13 Aborted (core dumped) DefDataPackages.m "$images" "$augimages"
cdeep3m_1 | ERROR, a non-zero exit code (134) was received from: DefDataPackages.m "/data/images/roi9" "/data/predictout/my_25k/roi9/augimages"
cdeep3m-docker_cdeep3m_1 exited with c

I googled it, and the error seems to be caused by Octave itself. As far as I can tell, DefDataPackages.m did its job properly, but Octave threw the error after DefDataPackages finished executing. Has anybody experienced the same error, and how can I solve it? Thanks.
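The exit code 134 in the log is consistent with the "Aborted (core dumped)" line: shells conventionally report a process terminated by signal N as status 128 + N. A minimal sketch for decoding such a status follows; `describe_exit` is a hypothetical helper written for illustration and is not part of cdeep3m.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cdeep3m): describe a shell exit status.
describe_exit() {
  local code=$1
  local name
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ]; then
    # Statuses above 128 conventionally mean "terminated by signal (code - 128)".
    # kill -l maps the status to a signal name; fall back if it cannot.
    name=$(kill -l "$code" 2>/dev/null) || name="unknown"
    echo "killed by signal $((code - 128)) (SIG$name)"
  else
    echo "exited with error code $code"
  fi
}

describe_exit 134
```

For 134 this reports signal 6, which is SIGABRT on Linux, so the non-zero exit code points at Octave aborting rather than at an ordinary script error.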

MatthewBM commented 3 years ago

Hi @jwgim461

I've had this same error with Octave in Docker as well. Fortunately, for me the error is only thrown some of the time and the run passes otherwise, so I amended runprediction.sh to do the following:

DefDataPackages.m "$images" "$augimages"

change to:

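while true; do
  # Parsed as (cmd || test $? -eq 1) && break:
  # stop on exit code 0 or 1, retry on anything else (e.g. 134).
  DefDataPackages.m "$images" "$augimages" || test $? -eq 1 && break
  sleep 1
done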

It's a quick fix for Docker images. Let me know if that works for you or if it still hangs.
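Because the `while true` loop retries without limit, it would spin forever if the crash happens on every run. A bounded variant might look like the sketch below. `retry_on_crash` and the limit of 5 attempts are assumptions for illustration, not part of runprediction.sh. Note also that this sketch retries only on signal-style statuses (above 128), whereas the loop above retries on any status other than 0 or 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a command while it dies from a signal
# (exit status > 128, e.g. 134 = SIGABRT), up to max_attempts times.
retry_on_crash() {
  local max_attempts=$1
  shift
  local attempt=1
  local status=0
  while [ "$attempt" -le "$max_attempts" ]; do
    "$@"
    status=$?
    # Success or an ordinary error: stop and pass that status through unchanged.
    if [ "$status" -le 128 ]; then
      return "$status"
    fi
    echo "attempt $attempt/$max_attempts crashed with status $status" >&2
    attempt=$((attempt + 1))
    sleep 1
  done
  # Every attempt crashed: report the last crash status to the caller.
  return "$status"
}

# Assumed usage inside runprediction.sh:
# retry_on_crash 5 DefDataPackages.m "$images" "$augimages"
```

If every attempt crashes, the function returns the last crash status, so the existing non-zero exit code check in the calling script can still fire instead of hanging.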

jawonGim commented 3 years ago

@MatthewBM thanks for the reply. Unfortunately, the error occurs every time in my case, so your solution doesn't work for me. Do you have any clue about this error? I don't have any; I have never used Octave before.

MatthewBM commented 3 years ago

Try updating Octave in the Dockerfile, as is done here: https://github.com/affinelayer/pix2pix-tensorflow/issues/95

jawonGim commented 3 years ago

I tried all the methods in the link you gave me, but none succeeded (either the Octave version was still 4.0.0 or the build failed). So I modified the Dockerfile myself.

RUN apt-get install -y software-properties-common
RUN add-apt-repository ppa:octave/stable
RUN apt-get install -y --no-install-recommends octave-image octave-pkg-dev
RUN apt-get update -y
RUN apt-get install -y --no-install-recommends octave=4.2.2-1~octave~xenial2
RUN rm -rf /var/lib/apt/lists/*

RUN mkdir -p /home/nd_sense/HDF5/HDF5-oct
RUN git clone https://github.com/stegro/hdf5oct/ /home/nd_sense/HDF5/HDF5-oct/.
WORKDIR /home/nd_sense/HDF5/HDF5-oct/
RUN git checkout 62098f7e82eab88059f7d104895f7bfb84d37850

The apt install order looks weird, but I couldn't install octave-pkg-dev after apt update because of a dependency problem. I also had to change the revision of the hdf5oct git checkout. I know I am far from your standard install guideline, but Octave no longer produces the error I reported.
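Since the earlier attempts left Octave at 4.0.0, it may help to confirm which version actually ended up in the image before rerunning prediction. The following is a minimal sketch to run inside the container. `version_ge` and `parse_octave_version` are hypothetical helpers. It assumes GNU `sort -V` is available and that the first line of `octave --version` has the form "GNU Octave, version X.Y.Z". The 4.2.2 threshold is simply the version installed in this thread, not a documented requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: extract "X.Y.Z" from a line such as
# "GNU Octave, version 4.2.2" read on standard input.
parse_octave_version() {
  sed -n 's/^GNU Octave, version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only run the check when octave is on the PATH (e.g. inside the container).
if command -v octave >/dev/null 2>&1; then
  installed=$(octave --version | parse_octave_version)
  if version_ge "$installed" "4.2.2"; then
    echo "Octave $installed is at least 4.2.2"
  else
    echo "Octave $installed is older than 4.2.2" >&2
  fi
fi
```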

Instead, a new error has occurred. The logs are here.

$ tail -f logs/*.log
==> logs/postprocess.log <==

Running Postprocess

Trained Model Dir: /data/trainout/roi8_2nd/
Image Dir: /data/images/roi9/
Models: 1fm,3fm,5fm
Speed: 1

For model 1fm postprocessing Pkg001_Z01 1 of 1
Waiting for /data/predictout/my_25k/roi9/1fm/Pkg001_Z01 to finish processing

==> logs/prediction.log <==
Running Prediction

Trained Model Dir: /data/trainout/roi8_2nd/
Image Dir: /data/images/roi9/
Models: 1fm,3fm,5fm
Speed: 1
GPU: all

For model 1fm preprocessing Pkg001_Z01 1 of 1
Running prediction on 1fm Pkg001_Z01

==> logs/preprocess.log <==
Running PreprocessPackage

Trained Model Dir: /data/trainout/roi8_2nd/
Image Dir: /data/images/roi9/
Models: 1fm,3fm,5fm
Speed: 1

Preprocessing Pkg001_Z01 in model 1fm
Waiting for prediction to catch up
Preprocessing Pkg001_Z01 in model 3fm

==> logs/prediction.log <==
Detected 4 GPU(s). Will run in parallel.
ERROR non-zero exit code (4) from running predict_seg_new.bin
Command exited with non-zero status 6
real 3.36
user 0.07
sys 2.74
ERROR, a non-zero exit code (6) was received from: caffepredict.sh

==> logs/postprocess.log <==
KILL.REQUEST file found. Exitin

Should I go back to the previous build and fix the Octave error in a different way?

haberlmatt commented 3 years ago

Hi @jwgim461, the easiest solution would be to use the newer version of CDeep3M, which no longer uses Octave. Code and distributions are here: https://github.com/CRBS/cdeep3m2

In principle, the performance and accuracy should be equal to or better than the previous version. If you have models that were trained specifically with CDeep3M version 1.x.x and you don't want to retrain them, you can make CDeep3M2 behave like version 1.x.x by turning off denoising, like this: runprediction.sh --denoise 0 trainoutdir imagesdir predictoutdir

You can find the docker container here: https://hub.docker.com/r/ncmir/cdeep3m

jawonGim commented 3 years ago

Wow, I didn't notice that CDeep3M2 was out. I will try the new version.
Thanks.