RuiFilipeCampos closed this issue 8 months ago
I've increased the available memory of the Redis process.
All three processes should actually have the same amount of memory available to them, since each of them holds the whole slice at some point.
It seems to be performing quite well. I'm just going to wait and see if it breaks again; hopefully the memory increase was enough.
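For reference, the Redis memory ceiling is usually raised via `maxmemory` in `redis.conf`; the values below are illustrative placeholders, not the limits actually used in this setup:

```conf
# redis.conf — illustrative values, not the ones used here
maxmemory 4gb
# Fail loudly on writes instead of silently evicting keys (e.g. task results):
maxmemory-policy noeviction
```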
I changed the result backend to SQLite.
Not too happy about it, since I wanted to avoid using the disk.
I'll look into this Redis issue, but it's not critical at the moment as long as the SQLite result backend works as intended.
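A minimal sketch of pointing Celery's result backend at a SQLite file via the SQLAlchemy backend; the app name, broker URL, and database path here are placeholders, not the project's actual values:

```python
from celery import Celery

# Hypothetical app name and broker URL — adjust to the actual project.
app = Celery("tasks", broker="redis://localhost:6379/0")

# SQLAlchemy result backend with a SQLite file instead of Redis:
app.conf.result_backend = "db+sqlite:///results.sqlite"
```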
That didn't work. Tomorrow I'll set up the redis.conf file to handle the buffer limit issue.
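The setting in question is presumably Redis's client output buffer limit, which is what the Celery issue linked below points at; the numbers here are illustrative, not tuned values:

```conf
# redis.conf — raise the pub/sub client output buffer limit
# Format: client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>
client-output-buffer-limit pubsub 256mb 64mb 120
# Setting all three values to 0 disables the limit entirely:
# client-output-buffer-limit pubsub 0 0 0
```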
The model is converging quite well, though.
This is train loss, but all of these are unseen batches, so it still indicates that the model is generalizing. The number of steps in an epoch goes up to about 12.5k (there are 3M data points in the train split).
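As a sanity check on those numbers, the implied batch size can be backed out from the figures above (the batch size itself isn't stated in the thread):

```python
train_size = 3_000_000    # data points in the train split
steps_per_epoch = 12_500  # "about 12.5k" steps per epoch

# Implied batch size, assuming one batch per step
batch_size = train_size // steps_per_epoch
print(batch_size)  # → 240
```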
https://github.com/redis/redis/issues/11621
It seems that the client is not emptying db2 fast enough.
Seems like an open issue in Celery:
https://github.com/celery/celery/issues/4983#issuecomment-518302708
This has been resolved.
It was also a symptom of a deeper issue.
I might need to patch this by passing files via a shared volume.
I'll see if that can be avoided.
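If the shared-volume workaround does become necessary, a Docker Compose sketch might look like the following; the service names, images, and mount path are all hypothetical:

```yaml
# docker-compose.yml — hypothetical services and paths
services:
  worker:
    image: my-worker        # placeholder image name
    volumes:
      - shared-data:/data   # worker writes result files here
  consumer:
    image: my-consumer      # placeholder image name
    volumes:
      - shared-data:/data   # consumer reads them from the same mount

volumes:
  shared-data:
```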