Fixed reported issues in my forked version.

Victor4869 commented 1 year ago

I have fixed the issues reported here. I have also included a tutorial on how to use the scripts, in case you are unfamiliar with MXNet; you can find it in the Wiki.

Link: https://github.com/Victor4869/open-alcnet

There are two branches available. The master branch is mostly the original code with bug fixes. The dev branch includes two datasets and has additional features; you can find more details in the description.

YangBo0411 commented 1 year ago

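Hello, I used your fixed version, but whichever values I choose for the two parameters below, I get an error. Have you run into a similar problem?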

Victor4869 commented 1 year ago

Try multiple and bottomuplocal; just change them directly in the default values.
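As a rough illustration of "changing the default", here is a minimal argparse sketch. It assumes the two parameters in question are the `--scale-mode` and `--pyramid-fuse` options named later in this thread; the `build_parser` helper, the help strings, and the exact spelling and case of the values are assumptions, and the real `train_alcnet.py` defines many more options.

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser for illustration only.
    parser = argparse.ArgumentParser(description="ALCNet options (sketch)")
    # The values below are the ones suggested in the reply above; their exact
    # spelling in the real script is an assumption.
    parser.add_argument("--scale-mode", type=str, default="multiple",
                        help="scale mode (assumed help text)")
    parser.add_argument("--pyramid-fuse", type=str, default="bottomuplocal",
                        help="pyramid fusion mode (assumed help text)")
    return parser


if __name__ == "__main__":
    # With no command-line flags, the edited defaults are used.
    args = build_parser().parse_args([])
    print(args.scale_mode, args.pyramid_fuse)
```

Editing the `default=` value has the same effect as passing the flag on the command line, so either approach should work.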

YangBo0411 commented 1 year ago

1

YangBo0411 commented 1 year ago

Thank you very much for your reply. After making the corresponding changes, the problem still occurs. It is the same error I get with other --scale-mode and --pyramid-fuse values: training seems fine, but validation fails. Have you run into anything similar?

0%| | 0/160 [00:00<?, ?it/s][09:43:37] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\cudnn\./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
Epoch 0, training loss 1.0000: 100%|██████████| 160/160 [00:15<00:00, 10.37it/s]
0%| | 0/64 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "E:\model contrast\open-alcnet-dev\open-alcnet-dev\train_alcnet.py", line 608, in <module>
    trainer.validation(epoch)
  File "E:\model contrast\open-alcnet-dev\open-alcnet-dev\train_alcnet.py", line 476, in validation
    pred = self.net(x)
  File "E:\software\anaconda\envs\cuda100\lib\site-packages\mxnet\gluon\block.py", line 548, in __call__
    out = self.forward(*args)
  File "E:\software\anaconda\envs\cuda100\lib\site-packages\mxnet\gluon\block.py", line 925, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "E:\model contrast\open-alcnet-dev\open-alcnet-dev\model\contrast.py", line 539, in hybrid_forward
    c3pcm = self.cal_mpcm(c3) # sub 8, 64
  File "E:\model contrast\open-alcnet-dev\open-alcnet-dev\model\contrast.py", line 681, in cal_mpcm
    pcm13 = cal_pcm(cen, shift=13)
  File "E:\model contrast\open-alcnet-dev\open-alcnet-dev\model\contrast.py", line 89, in cal_pcm
    B1, B2, B3, B4, B5, B6, B7, B8 = circ_shift(cen, shift=shift)
  File "E:\model contrast\open-alcnet-dev\open-alcnet-dev\model\contrast.py", line 17, in circ_shift
    B1_NW = cen[:, :, shift:, shift:] # B1_NW is cen's SE
  File "E:\software\anaconda\envs\cuda100\lib\site-packages\mxnet\ndarray\ndarray.py", line 511, in __getitem__
    return self._get_nd_basic_indexing(key)
  File "E:\software\anaconda\envs\cuda100\lib\site-packages\mxnet\ndarray\ndarray.py", line 823, in _get_nd_basic_indexing
    sliced_nd = op.slice(self, begin, end, step)
  File "<string>", line 86, in slice
  File "E:\software\anaconda\envs\cuda100\lib\site-packages\mxnet\_ctypes\ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "E:\software\anaconda\envs\cuda100\lib\site-packages\mxnet\base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [09:43:53] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\tensor\./matrix_op-inl.h:688: Check failed: b < len (13 vs. 4) : slicing with begin[2]=13 exceeds limit of input dimension[2]=4
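The failing check in the log (`b < len (13 vs. 4)`) says that a slice starting at index 13 was requested on an axis of length 4, i.e. the feature map reaching `circ_shift` is smaller than the shift. The condition can be reproduced without MXNet. The sketch below is only an illustration: `check_shift_fits` and `usable_shifts` are hypothetical helpers, not part of the repository, and the shape `(1, 64, 4, 4)` and the shift list are made-up values chosen to match the "13 vs. 4" in the message.

```python
def check_shift_fits(feature_shape, shift):
    """Raise ValueError if `shift` cannot start a slice on the H or W axis.

    Mirrors the condition in the log: the slice start must be smaller than
    the length of the axis being sliced.
    """
    _, _, height, width = feature_shape  # assumes an NCHW layout
    for axis, size in ((2, height), (3, width)):
        if shift >= size:
            raise ValueError(
                f"slicing with begin[{axis}]={shift} exceeds limit of "
                f"input dimension[{axis}]={size}"
            )


def usable_shifts(feature_shape, shifts):
    """Return the shifts that are smaller than both spatial dimensions."""
    _, _, height, width = feature_shape
    return [s for s in shifts if s < min(height, width)]


if __name__ == "__main__":
    shape = (1, 64, 4, 4)  # hypothetical feature-map shape with 4x4 spatial size
    print(usable_shifts(shape, [3, 13]))
    try:
        check_shift_fits(shape, 13)
    except ValueError as err:
        print(err)
```

A guard like this does not fix the underlying cause (why the validation feature map is only 4 pixels wide), but it can make the mismatch visible before the MXNet operator fails.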

Victor4869 commented 1 year ago

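Then it is probably not a parameter problem. I ran into this issue too, but I have forgotten how I solved it at the time.

Epoch 0, training loss 1.0000: 100%|

Seeing this makes me suspect that the environment is not set up properly. Are you running on Windows? I hit a lot of strange problems on Windows and ended up running on Linux.

Try running my dev branch in Colab; I wrote up the setup and run steps in the wiki.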

YangBo0411 commented 1 year ago

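Yes, I am running on Windows. I will try the method you suggested, and if it does not work I will switch to Linux as well. Thank you for your answer.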

YangBo0411 commented 1 year ago

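Hello, I followed your steps and ran it on Colab. Everything went fine at first, but an error occurred when training started. Have you encountered this?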

Victor4869 commented 1 year ago

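I have not run into this one. It looks like a file-not-found error. Did you upload the files to Google Drive before running? Uploading everything from the dev branch takes quite a while because there are many images, so you need to wait for it to finish. Also, the mount drive and colab-path file paths may both need to be changed to match your actual paths. One more thing: when I ran it before, CUDA was 11.6, and it has since been upgraded to 12. I just ran it again; it raises a warning but still runs normally. I am not sure whether this affects the training results.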
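One way to catch a file-not-found problem early is to verify the expected paths before starting training. The following is a minimal sketch under stated assumptions: `missing_paths` is a hypothetical helper, and the example Drive folder is a placeholder that has to be replaced with the actual upload location.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # In Colab, Drive would be mounted first, for example:
    #   from google.colab import drive
    #   drive.mount('/content/drive')
    # The folder below is a hypothetical location; adjust it to the real one.
    project_root = "/content/drive/MyDrive/open-alcnet"
    required = [
        project_root,
        os.path.join(project_root, "train_alcnet.py"),
    ]
    for path in missing_paths(required):
        print("Missing:", path)
```

If anything is reported as missing, the upload may still be in progress or the configured path may not match the Drive layout.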

YangBo0411 commented 1 year ago

Thank you for your reply. Following your tutorial, I finally got it running on Linux. Thank you very much!!!

YangBo0411 commented 1 year ago

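Hello, I am now hitting a bug when running the visualization program. Do you know how to fix it? Looking at your profile, you seem to have also fixed the code for Dr. Dai's other paper, ACM. Do you know how to visualize the results with that other program? Looking forward to your help!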

Victor4869 commented 1 year ago

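For resume, fill in the path to the trained model and you will not hit this problem. The visualization really just saves all the predicted images. I also took a rough look at the ACM paper and code a while ago, but I have not modified that one; quite a few of its files are the same as in ALC. My guess is that he wrote the ACM code first and then added the ALC part on top of it.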

YangBo0411 commented 1 year ago

Thank you very much!! Following your answer, the problem is solved.

YimianDai / open-alcnet

Fixed reported issues in my forked version. #18