RLPR / LabelReviews

Reproducible Label Reviews
https://rlpr.github.io

S17 - Crack Detection as a Weakly-Supervised Problem: Towards Achieving Less Annotation-Intensive Crack Detectors #17

Open yinoue0426 opened 3 years ago

yinoue0426 commented 3 years ago

General info

Reviewer feedback

Details Results

akrah commented 3 years ago

Hi @aiueosamu,

Your README.md file is very complete and contains a lot of information. However, it is not clear which set of instructions must be executed in order to obtain the results. I think I understand that some of the proposed commands call others, but this is not clear without reading the content of the script files.

I suggest two options:

akrah commented 3 years ago

Moreover, could you detail the packages to install on a fresh Ubuntu 18.04 in order to use Python 3.6.9 and CUDA 9?

yinoue0426 commented 3 years ago

Thanks for the feedback. I added a shortened version of the README.

As for installing CUDA on a fresh Ubuntu, I based my Docker image on this repository. Since a Dockerfile is nothing but a sequence of installation commands, I think you can just follow the commands listed here to install CUDA and Python on Ubuntu 18.04.

Cyril-Meyer commented 3 years ago

Hi @aiueosamu,

I followed the instructions in the tldr.md file, but I got stuck while training DeepCrack. I tested the following commands on two different computers with the provided Docker images.

Here are the different steps that I followed:

Fork, clone and build docker

fork: https://github.com/hitachi-rd-cv/weakly-sup-crackdet
git clone https://github.com/Cyril-Meyer/weakly-sup-crackdet
git clone https://github.com/tobycheese/9.0-cudnn7-devel-ubuntu18.04
in the 9.0-cudnn7-devel-ubuntu18.04 folder: sudo docker build -t cuda9_ubuntu1804 .
in the weakly-sup-crackdet/docker folder: sudo docker build -t weakly-sup-crackdet .

I got errors while building the weakly-sup-crackdet Docker image, so I made the following changes:

RUN pip3 install scikit-image==0.15.0 pyyaml cython opencv-python==4.1.0.25 futures==3.2.0

ERROR: Could not find a version that satisfies the requirement futures==3.2.0.

->

RUN pip3 install scikit-image==0.15.0 pyyaml cython opencv-python==4.1.0.25 futures==3.1.1

RUN apt install python3-tk

->

RUN apt install -y python3-tk

RUN pip uninstall opencv-python opencv-python-headless opencv-contrib-python

->

RUN pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python

sudo docker run -it --gpus all --mount type=bind,source="$(pwd)",target=/working_dir weakly-sup-crackdet

Running the container produced the following error:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled

I found a solution here: https://github.com/NVIDIA/nvidia-docker/issues/1186

Preliminaries

Training DeepCrack

I got this error three times:

dataset [DeepCrackDataset] was created
The number of training images = 60
initialize network with xavier
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
  warnings.warn(warning.format(ret))
model [DeepCrackModel] was created
---------- Networks initialized -------------
[Network G] Total number of parameters : 14.720 M
-----------------------------------------------
create web directory ./checkpoints/aigle_deepcrack_dil1/web...
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "train.py", line 32, in <module>
    model.optimize_parameters(epoch)   # calculate loss functions, get gradients, update network weights
  File "/working_dir/weakly-sup-crackdet/models/deepcrack/models/deepcrack_model.py", line 111, in optimize_parameters
    self.forward()      # compute predictions.
  File "/working_dir/weakly-sup-crackdet/models/deepcrack/models/deepcrack_model.py", line 74, in forward
    self.outputs = self.netG(self.image)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/working_dir/weakly-sup-crackdet/models/deepcrack/models/deepcrack_networks.py", line 58, in forward
    conv1 = self.conv1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663
----------------- Options ---------------

Since the model does not train, the evaluation also crashes. This error seems to be related to the PyTorch installation, but since the version is pinned precisely in the Dockerfile, I find myself stuck. For Training DeepLab V3+, I have another problem: the "scripts/train.sh" file does not exist.

Do you have any idea how to fix this problem?

yinoue0426 commented 3 years ago

@Cyril-Meyer Thank you for the reply. I've updated the Dockerfile accordingly. Thanks also for the nvidia-docker info. It seems to be more of an nvidia-docker issue, but it helps anyway.

As for the /pytorch/aten/src/THC/THCGeneral.cpp:663 issue, it seems to be caused by the fact that GPUs with the Turing architecture are not compatible with CUDA 9.0 (ref), and both setups you mention use GPUs built on the Turing architecture. Sorry, I was not aware of this point; I have updated the README accordingly.

The CUDA 9 requirement comes from the fact that the DeepCrack repo requires PyTorch 0.4.1, which in turn requires CUDA 9, so I cannot do much about it. I think the DeepLab code ran with TensorFlow 1.13.1, which uses CUDA 10, so maybe you can test DeepLab with CUDA 10.
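To make the compatibility constraint concrete, here is a minimal Python sketch that checks a GPU's compute capability against the newest capability a CUDA toolkit can target. The table only covers the two toolkits mentioned in this thread: CUDA 9.0 targets up to compute capability 7.0 (Volta), while Turing GPUs report 7.5 and need CUDA 10.0 or later. The names MAX_COMPUTE_CAPABILITY and is_gpu_supported are illustrative and not part of the repository.

```python
# Newest compute capability each CUDA toolkit can target.
# Only the two toolkits discussed in this thread are listed (illustrative table).
MAX_COMPUTE_CAPABILITY = {
    "9.0": (7, 0),   # up to Volta
    "10.0": (7, 5),  # adds Turing
}


def is_gpu_supported(cuda_version, capability):
    """Return True if a GPU with `capability` (major, minor) can be
    targeted by the toolkit `cuda_version`."""
    return tuple(capability) <= MAX_COMPUTE_CAPABILITY[cuda_version]


if __name__ == "__main__":
    turing = (7, 5)  # compute capability reported by Turing GPUs
    for version in ("9.0", "10.0"):
        print(version, is_gpu_supported(version, turing))
```

On a machine with PyTorch installed, the capability tuple can typically be read with torch.cuda.get_device_capability(0) instead of being hard-coded.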

As for DeepLab, I forgot to add the flags:

./tools/setup_models.sh --deepcrack --deeplab

This should correctly populate the scripts stored in tools/model_supp and you should be able to find the scripts/train.sh file now.

Cyril-Meyer commented 3 years ago

Hi @aiueosamu, thank you for the update and the clarification.

I followed the new instructions in the tldr.md file, but there are still problems. I tested the following commands on a third computer with the provided Docker images.

Preliminaries

OK

Training DeepCrack

The evaluation process failed on the three different models.

FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/aigle_deepcrack_dil1/web/images'
FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/cfd_deepcrack_dil1/web/images'
FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/dc_deepcrack_dil1/test_latest/images'

Training DeepLab V3+

Concerning the missing flags ./tools/setup_models.sh --deepcrack --deeplab: this is not the source of the problem. The setup_models.sh script does not seem to use them, and it calls the Python script with both flags in any case.

python3 tools/setup_models.py --deepcrack --deeplab

The setup_models.py script copies the files from "tools/model_supp/deeplab" to "models/deeplab/research/deeplab/". A "scripts" folder is located in "models/deeplab/research/deeplab".

I tried two options:
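The copy step described above can be sketched as follows, which also shows why the scripts folder ends up under the destination tree rather than at the repository root. This is an illustration under the assumption that the script overlays one directory tree onto another; overlay_tree is a hypothetical name, and the real setup_models.py may work differently.

```python
import os
import shutil


def overlay_tree(src_root, dst_root):
    """Copy every file under src_root into dst_root, keeping relative paths.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(target_dir, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)
```

With src_root set to tools/model_supp/deeplab and dst_root set to models/deeplab/research/deeplab, a file such as scripts/train.sh in the source would land at models/deeplab/research/deeplab/scripts/train.sh.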

None of these options worked; here is the error:

ModuleNotFoundError: No module named 'deeplab'

yinoue0426 commented 3 years ago

@Cyril-Meyer Sorry for asking you to do multiple trials and for the late response, but I think we are close. I followed tldr.md myself and was able to reproduce your error; I apologize. I've updated the repo with a new commit, and hopefully the following will fix the problems.

First, as for DeepCrack, the error you mentioned (posted below for reference) is raised by cleanup code in scripts/output_format.py. Please comment out the last line (line 63) in scripts/output_format.py (in the new commit, this line is wrapped in a try/except block).

FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/dc_deepcrack_dil1/test_latest/images'
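For reference, a guarded cleanup of this kind could look like the sketch below. It assumes the cleanup step deletes a directory of intermediate images; the function name remove_intermediate_images is hypothetical, and the actual code in scripts/output_format.py may differ.

```python
import shutil


def remove_intermediate_images(path):
    """Delete the directory `path` if it exists.

    Returns True if something was removed, False if the directory was
    already missing (which is not treated as an error).
    """
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        # Nothing to clean up, e.g. the run never produced this folder.
        return False
    return True
```

Catching only FileNotFoundError keeps other problems, such as permission errors, visible instead of silently swallowing them.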

In addition, there was an error in models/deepcrack/scripts/test_eval.sh. Line 16 is

    scripts/test_deepcrack.sh 0 $MODEL ./datasets/deeprack_detailed ./checkpoints/

but it should really be

    scripts/test_deepcrack.sh 0 $MODEL ./datasets/deepcrack_detailed ./checkpoints/

After the above changes, I was able to train the DeepCrack model without any problems. However, I was not able to reproduce the results for the Aigle dataset. It turns out that there are some bugs in the DeepCrack repo which prevented it from producing the correct results for the Aigle dataset. From the new commit, please replace the following files under models/deepcrack/ with the files under tools/model_supp/deepcrack/:

With this fix, you should be able to run the script without any problems. If it runs correctly, you should be able to see the output images stored under checkpoints/*_deepcrack_*/sample_imgs/test_output.

As for DeepLab, there were two problems. I've updated the Training DeepLab V3+ section of the tldr.md file for reference.

First, the main directory of DeepLab files is actually under models/deeplab/research/deeplab, not models/deeplab, so run the training scripts from there.

Second, the Google repo requires the PYTHONPATH environment variable to be set correctly. This was causing the ModuleNotFoundError. Run the following lines to resolve the issue:

cd models/deeplab/research
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
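If the training scripts are launched from Python rather than from an interactive shell, the same search path can be built programmatically. This is only a sketch: deeplab_pythonpath is a hypothetical helper, and the directory layout is the one described above (models/deeplab/research and its slim subfolder).

```python
import os


def deeplab_pythonpath(research_dir, existing=""):
    """Build a PYTHONPATH value containing `research_dir` and its `slim`
    subfolder, appended to any `existing` value."""
    research_dir = os.path.abspath(research_dir)
    entries = [existing] if existing else []
    entries.append(research_dir)
    entries.append(os.path.join(research_dir, "slim"))
    return os.pathsep.join(entries)


# Example: prepare an environment for a child process (not executed here).
# env = dict(os.environ)
# env["PYTHONPATH"] = deeplab_pythonpath("models/deeplab/research",
#                                        os.environ.get("PYTHONPATH", ""))
```

Unlike the shell one-liner, the helper does not emit a leading separator when PYTHONPATH was previously empty, so the current directory is not added implicitly.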

I've also noticed that python eval/micro_eval.py does not run properly due to some images being in jpg format. Please update eval/utils.py and eval/feature_extractor.py.
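The kind of change this calls for can be sketched as follows: instead of assuming a fixed extension, look an image up by its base name and try several extensions. This is an assumption about the nature of the fix, not the actual contents of eval/utils.py or eval/feature_extractor.py; find_image and IMAGE_EXTENSIONS are hypothetical names.

```python
import os

# Extensions to try, in order of preference (illustrative choice).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def find_image(directory, stem):
    """Return the path of the first existing file `stem` + extension in
    `directory`, or None if no candidate exists."""
    for ext in IMAGE_EXTENSIONS:
        candidate = os.path.join(directory, stem + ext)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Returning None rather than raising lets the caller decide whether a missing image should be skipped or reported.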

With these changes, I think you should be able to run the code without problems.

Cyril-Meyer commented 3 years ago

@aiueosamu

I restarted everything from scratch, but the ./tools/download.sh script no longer works; it fails while processing the DCD dataset.

Here is the returned error:

Aigle
CFD
DCD
Traceback (most recent call last):
  File "tools/download.py", line 129, in <module>
    processDCD()
  File "tools/download.py", line 98, in processDCD
    f_img_dname, t_img_dname, cv2.imread, prefix=pre, extension='.jpg')
  File "tools/download.py", line 16, in populate
    for f_fname in os.listdir(from_dname):
FileNotFoundError: [Errno 2] No such file or directory: 'data/deepcrack_github/dataset/test_img'

This is probably due to recent modifications to the script (e.g. an unzip command is no longer executed).

yinoue0426 commented 3 years ago

@Cyril-Meyer I am really sorry about that, you are right. Please uncomment the unzip line from the tools/download.sh script (line 16, I believe).
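A small guard could make this kind of failure easier to diagnose in the future. The sketch below is an assumption, not the code of tools/download.py: require_dir is a hypothetical helper that checks that an extracted dataset folder exists before it is listed, and fails with a message pointing at the likely cause.

```python
import os


def require_dir(path, hint):
    """Return `path` if it is an existing directory; otherwise raise a
    FileNotFoundError whose message includes `hint`."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            "Expected directory '{}' is missing ({})".format(path, hint)
        )
    return path


# Example (not executed here):
# src = require_dir("data/deepcrack_github/dataset/test_img",
#                   "was the downloaded archive unzipped?")
```

Calling such a check before os.listdir turns a bare "No such file or directory" into a message that names the skipped extraction step.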