ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
http://avalanche.continualai.org
MIT License

Tests not Passing Anymore with a Strange Behavior #227

Closed vlomonaco closed 3 years ago

vlomonaco commented 3 years ago

It seems the unit tests are not passing anymore, even though it's not clear why (they pass locally). @ggraffieti can you take a look into this issue?

ggraffieti commented 3 years ago

I'll take a look at that.

ggraffieti commented 3 years ago

I checked the memory consumption and it seemed fine, so the error shouldn't be triggered by low memory. Moreover, all the tests completed successfully; the segfault appears only after the tests conclude. I don't know whether it's a problem with the GitHub Action itself or with our code, but there is something strange happening here:

IMHO there is something broken in the new evaluation plugin, so we should take a closer look at the code and try to debug these strange errors. I'll try to re-run the tests with GitHub Actions on the code as of the commit just before pull request #222 was merged. If the tests pass before the merge, I think the problem is in our code; otherwise I'll try to understand why the action suddenly stopped working.
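One way to get more information out of a crash like this is Python's standard `faulthandler` module, which dumps the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. The following is only a minimal sketch and not the project's actual test runner: the function name `run_tests_with_faulthandler` and the `start_dir` argument are hypothetical and chosen for illustration.

```python
import faulthandler
import sys
import unittest


def run_tests_with_faulthandler(start_dir, log_file=sys.stderr):
    """Run unittest discovery with faulthandler enabled.

    `start_dir` and this function name are hypothetical; they are not taken
    from the Avalanche code base.
    """
    # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python
    # traceback of all threads to `log_file` (it must have a real fileno()).
    faulthandler.enable(file=log_file, all_threads=True)

    # Discover and run the tests found under `start_dir`.
    suite = unittest.defaultTestLoader.discover(start_dir)
    result = unittest.TextTestRunner(verbosity=1).run(suite)

    # The handler stays enabled, so a crash that happens later, for example
    # during interpreter shutdown, would still be reported.
    return result.wasSuccessful()
```

The same handler can also be switched on without code changes by running `python -X faulthandler -m unittest` or by setting the `PYTHONFAULTHANDLER` environment variable.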

vlomonaco commented 3 years ago

I'll try to re-run the tests with the github actions with the code in the state at the commit before the merging of the pull request #222.

Yes, this test will be fundamental, let us know how it goes!

vlomonaco commented 3 years ago

It seems issue #225 was related to a specific PyTorch version < 1.7.0. Just to make sure they are not related... I'd check whether the server creates an env with a higher version as well.
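A quick way to check which versions the CI environment actually ends up with is to print them at the start of the run. This is a hedged sketch: the helper names (`parse_version`, `is_older_than`, `report_environment`) are made up for illustration, and the version parsing is deliberately simplistic (it only looks at the leading digits of the first three components).

```python
import sys
from itertools import takewhile


def parse_version(version):
    """Turn a version string such as '1.6.0+cu101' into a tuple of ints."""
    core = version.split("+")[0]  # drop a local build suffix like '+cu101'
    numbers = []
    for piece in core.split(".")[:3]:
        leading = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(leading) if leading else 0)
    return tuple(numbers)


def is_older_than(version, minimum="1.7.0"):
    """Return True if `version` is strictly older than `minimum`."""
    return parse_version(version) < parse_version(minimum)


def report_environment():
    """Print the interpreter version and, if installed, the PyTorch version."""
    print("Python", sys.version.split()[0])
    try:
        import torch  # only available if PyTorch is installed in this env
    except ImportError:
        print("PyTorch is not installed")
        return
    status = "older than 1.7.0" if is_older_than(torch.__version__) else "1.7.0 or newer"
    print("PyTorch", torch.__version__, "(" + status + ")")
```

Calling `report_environment()` as the first step of the CI job would show whether the server environment and the local one differ.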

lrzpellegrini commented 3 years ago

Hi, I just fixed the problem with TinyImagenet logging (TypeError: not all arguments converted during string formatting) in the new pull request #234 but that problem doesn't seem to be related to this issue. Tests still fail.
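For reference, this `TypeError` is what Python raises when a `%`-style format string receives more arguments than it has placeholders. The snippet below is a generic illustration of that failure mode, not the actual TinyImagenet code; the helper `format_message` and the strings are hypothetical.

```python
def format_message(template, *args):
    """Apply %-style formatting, as the logging module does internally."""
    return template % args


# One placeholder but two arguments: this raises the TypeError.
try:
    format_message("Loading %s", "train", "val")
except TypeError as err:
    error_text = str(err)

# One placeholder per argument: formatting succeeds.
fixed_text = format_message("Loading %s and %s", "train", "val")
```

When the mismatch happens inside a `logging` call, the logging module reports the error on stderr rather than raising it, which can make it easy to overlook.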

And by "fail" I mean that for some unknown reason, after successfully passing all unit tests (unittest also prints "OK"), I get a segmentation fault. I can't reproduce this issue on Ubuntu 20 LTS (both standalone and WSL 2). It only seems to happen on GitHub Actions and Travis (.com).
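To tell "tests passed but the process crashed afterwards" apart from an ordinary test failure, one option is to run the suite in a child process and inspect its return code: on POSIX systems, `subprocess` reports a negative return code when the child was killed by a signal. This is a sketch under that assumption; `describe_exit` and `run_and_describe` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal " + str(-returncode)
        return "killed by " + name
    return "exited with status " + str(returncode)


def run_and_describe(command):
    """Run `command` (a list of arguments) and describe how it ended."""
    completed = subprocess.run(command)
    return describe_exit(completed.returncode)
```

For example, `run_and_describe([sys.executable, "-m", "unittest"])` would report `killed by SIGSEGV` if the interpreter segfaults after unittest has already printed "OK".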

lrzpellegrini commented 3 years ago

Could it be related to the fact that GitHub Actions is using Ubuntu 18.04 while Travis is using Ubuntu 16.04?

ggraffieti commented 3 years ago

Fixed by downgrading Python to 3.6.

vlomonaco commented 3 years ago

Thanks to @ggraffieti we finally fixed the issue here: https://github.com/vlomonaco/avalanche/commit/202fc58405bd9848cb2fa8fda2d1c14f99ebe8cd

We didn't fully understand why, but it seems that the conda env with py3.9 was causing the issue. Now we test with py3.6 (our environment-dev.yml) and all goes well.