ZhiningLiu1998 / mesa

[NeurIPS’20] ⚖️ Build powerful ensemble class-imbalanced learning models via a meta-knowledge-powered resampler. | Design a meta-knowledge-driven sampler to solve the class-imbalance problem
https://arxiv.org/abs/2010.08830
MIT License

Issue running model #4

Closed: fipeop closed this issue 2 years ago

fipeop commented 3 years ago

Hi,

Thanks for the great work. I tried installing the dependencies as explained in the latest version of the README file and I got:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [50, 1]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
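The hint at the end of this message can be followed with a few lines of code. The sketch below is a toy reproduction, not MESA code: `trigger_inplace_error` and `diagnose` are hypothetical names, and the tensor operations were chosen only because they are a well-known way to raise the same class of error.

```python
import torch


def trigger_inplace_error():
    """Toy example (not from MESA) that raises the in-place autograd error."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()         # exp() saves its output for the backward pass
    y.add_(1)           # in-place edit of that saved output
    y.sum().backward()  # autograd sees the changed version and raises


def diagnose():
    """Run the toy example under anomaly detection and return the error text."""
    try:
        # With anomaly detection on, PyTorch also reports the forward-pass
        # traceback of the operation whose gradient could not be computed.
        with torch.autograd.detect_anomaly():
            trigger_inplace_error()
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(diagnose())
```

Wrapping the real training call in the same context manager should, in principle, point at the forward-pass line that produced the tensor later modified in place.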

Alternatively, when installing the same PyTorch version (1.0.0) with GPU support, I got a different issue: https://discuss.pytorch.org/t/undefined-symbol-cblas-sgemm-alloc/32497

Is there any other way to build the dependencies?

ZhiningLiu1998 commented 3 years ago

Hi FELIPE,

Thanks for your interest in our work~

I'm sorry, but I cannot reproduce your error. Just to confirm, did you follow the install commands in the requirements section, i.e., this part:

NOTE: this implementation requires an old version of PyTorch (v1.0.0). You may want to start a new conda environment to run our code. The step-by-step guide is as follows (using torch-cpu as an example):

  • conda create --name mesa python=3.7.11
  • conda activate mesa
  • conda install pytorch-cpu==1.0.0 torchvision-cpu==0.2.1 cpuonly -c pytorch
  • pip install -r requirements.txt

These commands should help you to get ready for running mesa. If you have any further questions, please feel free to open an issue or drop me an email.

I just did a fresh install with these commands and the code seems to work as expected. Please try following this guide and see if it solves your problem.
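A quick way to confirm that the environment actually picked up the pinned release is to compare `torch.__version__` against the expected string. This is only a sketch: the helper name `is_expected_version` is hypothetical, and stripping a `+cpu`-style suffix is an assumption about how some builds label themselves.

```python
def is_expected_version(installed, expected="1.0.0"):
    """Return True if the installed version matches the pinned release."""
    # Some builds append a local suffix such as "+cpu"; ignore it.
    return installed.split("+")[0] == expected


if __name__ == "__main__":
    import torch

    print("torch", torch.__version__, "matches 1.0.0:",
          is_expected_version(torch.__version__))
```

Running this inside the activated `mesa` environment shows whether a newer PyTorch was installed instead of v1.0.0.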


PS: The meta-sampler used in MESA is not a large network. Its size depends only on the dimensionality of the meta-state (usually < 20), rather than the amount of data. So the advantage of pytorch-GPU is likely to be insignificant.

fipeop commented 3 years ago

Yes, I followed those commands. Did you test on a Unix environment? I'm wondering whether those steps only work on Windows.

ZhiningLiu1998 commented 3 years ago

Yes, my primary development environment is Windows, so I only tested these steps on it. Maybe there are some inconsistencies between Anaconda on Windows and on Unix-based OSs that caused this problem?

I'm sorry, but I'm currently busy applying for a Ph.D., so I'm afraid I have no time to fix this. You can see that the meta_sampler is just a Soft Actor-Critic network (see here), which is defined in this folder. The main classes are defined in sac.py and model.py, with fewer than 300 lines of code in total. I believe the problem is most likely in these 300 lines of SAC code. In the future, I may replace the SAC implementation with one based on a more modern version of PyTorch.
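For anyone investigating: one pattern known to raise this RuntimeError on PyTorch 1.5 and later is calling `backward()` on a loss whose graph was built before an `optimizer.step()` changed the weights in place. Whether sac.py contains this pattern is an assumption, not something confirmed in this thread. The `[50, 1]` tensor reported as an output of `AsStridedBackward0` would be consistent with a transposed linear-layer weight, but that is conjecture. The two-layer network and the function names below are hypothetical stand-ins, not MESA code.

```python
import torch
from torch import nn


def make_model():
    """Hypothetical two-layer network standing in for an actor or critic."""
    torch.manual_seed(0)
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


def stale_graph_update(model, x):
    """Builds both losses first, then steps between the two backward calls."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_a = model(x).mean()
    loss_b = model(x).pow(2).mean()
    opt.zero_grad()
    loss_a.backward()
    opt.step()         # updates the weights in place
    loss_b.backward()  # graph of loss_b still refers to the old weights


def fresh_graph_update(model, x):
    """Recomputes the second loss after the step, so its graph is current."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_a = model(x).mean()
    opt.zero_grad()
    loss_a.backward()
    opt.step()
    loss_b = model(x).pow(2).mean()  # forward pass with the updated weights
    opt.zero_grad()
    loss_b.backward()
    opt.step()
    return loss_b.item()


if __name__ == "__main__":
    x = torch.randn(16, 4)
    try:
        stale_graph_update(make_model(), x)
    except RuntimeError as err:
        print("stale graph:", str(err)[:80])
    print("fresh graph loss:", fresh_graph_update(make_model(), x))
```

If the SAC update does follow the first shape, two possible remedies are to run every `backward()` before any `step()`, or to recompute the later loss after the step as in `fresh_graph_update`. Which one preserves the intended training behavior would need to be checked against the original algorithm.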

If you find a solution to this error, I would greatly appreciate a PR!

Thanks again for your interest~