Open No2Ross opened 10 months ago
Hi Ross
Sorry, I don't understand the issue. What happens when you run the following?
st_model.train(max_epochs=30000, accelerator=accelerator)
Does the training run for one step?
Default st_model training sees all data in each step, so epochs and steps are equal.
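To make the epoch/step relationship concrete, here is a minimal arithmetic sketch. The function names and the convention that `batch_size=None` means full-data training are assumptions for illustration, not taken from the cell2location code.

```python
import math


def steps_per_epoch(n_obs, batch_size=None):
    """Number of optimizer steps in one epoch.

    batch_size=None stands for full-data training (an assumed convention):
    the whole dataset is seen in a single step.
    """
    if batch_size is None:
        return 1
    return math.ceil(n_obs / batch_size)


def total_steps(n_obs, max_epochs, batch_size=None):
    """Total optimizer steps after max_epochs epochs."""
    return max_epochs * steps_per_epoch(n_obs, batch_size)


# Full-data training: one step per epoch, so steps == epochs.
print(total_steps(n_obs=5000, max_epochs=30000))

# Mini-batches of 2500 observations: two steps per epoch.
print(total_steps(n_obs=5000, max_epochs=30000, batch_size=2500))
```

With full-data training the step count tracks the epoch count one to one, which is why a run that really trains for 30000 epochs should also report about 30000 steps.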
Hi There,
Sorry about that, I copied the wrong section of code. The relevant bits say:
Trainer.fit stopped: max_epochs=100 reached.
and
Trainer.fit stopped: max_steps=1 reached.
No matter what you set max_epochs to, max_steps stays the same, i.e. 1. The trainer ends after 1 step is reached, as shown above.
Thanks, Ross
Is this still an issue @No2Ross?
Hi there,
It is still an issue in that it still only reaches a max step of 1. However, when you look at the training output, it seems like the model has achieved a good fit.
I'm still a bit worried that there could be a better fit, but I'm uncertain whether that's true or not.
Thanks, Ross
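One rough way to judge whether a better fit is still available is to check whether the training loss has flattened out. The sketch below assumes the per-epoch loss values have already been pulled out into a plain Python list. The helper name, window size and tolerance are hypothetical choices, and the synthetic curves only stand in for a real loss history.

```python
import math


def has_plateaued(losses, window=100, rel_tol=1e-3):
    """Return True if the mean loss over the last `window` values differs from
    the mean over the preceding `window` values by at most `rel_tol`
    (relative to the earlier mean)."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) <= rel_tol * abs(previous)


# Synthetic stand-ins for a real loss history (not actual training output).
converged = [50 + 1000 * math.exp(-i / 50) for i in range(1000)]
still_improving = [1000 - i for i in range(1000)]

print(has_plateaued(converged))
print(has_plateaued(still_improving))
```

If the check fails on a real history, training for more epochs is likely to improve the fit. If it passes, extra epochs are unlikely to change much, although a flat loss alone does not prove the fit is the best possible.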
Did you create a separate isolated conda environment?
I do not see this issue with my setup, which could indicate a package version mismatch.
I've tried it in two different conda environments and got the same result. I'm guessing you're right and some package just has an incorrect version. I'll try playing about with them and see if I can get it fixed.
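When comparing environments for a version mismatch, it can help to print the installed versions of the relevant packages in each one. This is a small sketch using only the standard library. The list of distribution names is an assumption about which packages matter here and may need adjusting.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions; edit to match your environment.
PACKAGES = [
    "cell2location",
    "scvi-tools",
    "torch",
    "pytorch-lightning",
    "lightning",
    "pyro-ppl",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in both the working and the failing environment and comparing the output line by line makes any differing package easy to spot.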
I've been trying to run cell2location on my datasets and on your test example, and each time I've noticed that during the training phase, training stops after max_steps=1 has been reached. I've provided code from my run on your example data, where I set max steps to be 30,000 and it still says that max_steps=1 has been reached.
cell2location 0.1.4, torch 2.1.2+cu118
Thanks, Ross