Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

[Bug]: Flux masked training - tensor size issue #597

Open master131 opened 5 hours ago

master131 commented 5 hours ago

What happened?

It appears that somewhere between d7a4e73 and 13dbd21, masked training for Flux stopped working when it tries to perform training steps (after the initial sampling). Reverting to the earlier commit (d7a4e73), the last commit I was using before pulling from master, works fine.

I am using masked training with a probability and weight of zero.
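For context, here is a conceptual sketch of how masked loss weighting with an unmasked weight of zero typically behaves: loss outside the mask is scaled to zero, so only the masked region contributes. This is not OneTrainer's actual implementation; the helper name and tensor shapes are assumptions for illustration only.

    import torch

    def masked_weighted_loss(losses: torch.Tensor, mask: torch.Tensor, unmasked_weight: float) -> torch.Tensor:
        # Unmasked pixels get at least `unmasked_weight`; masked pixels keep their mask value.
        clamped_mask = mask.clamp(min=unmasked_weight)
        # Shapes must broadcast here, which is where the error in the log occurs.
        return losses * clamped_mask

    # Example: per-element loss and a single-channel mask that broadcasts over channels.
    losses = torch.rand(1, 16, 32, 32)
    mask = torch.randint(0, 2, (1, 1, 32, 32)).float()
    weighted = masked_weighted_loss(losses, mask, unmasked_weight=0.0)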

What did you expect would happen?

Training proceeds as normal.

Relevant log output

Traceback (most recent call last):
  File "C:\OneTrainer\modules\ui\TrainUI.py", line 561, in __training_thread_function
    trainer.train()
  File "C:\OneTrainer\modules\trainer\GenericTrainer.py", line 676, in train
    loss = self.model_setup.calculate_loss(self.model, batch, model_output_data, self.config)
  File "C:\OneTrainer\modules\modelSetup\BaseFluxSetup.py", line 593, in calculate_loss
    return self._flow_matching_losses(
  File "C:\OneTrainer\modules\modelSetup\mixin\ModelSetupDiffusionLossMixin.py", line 311, in _flow_matching_losses
    losses = self.__masked_losses(batch, data, config)
  File "C:\OneTrainer\modules\modelSetup\mixin\ModelSetupDiffusionLossMixin.py", line 79, in __masked_losses
    losses += masked_losses(
  File "C:\OneTrainer\modules\util\loss\masked_loss.py", line 13, in masked_losses
    losses *= clamped_mask
RuntimeError: The size of tensor a (16) must match the size of tensor b (64) at non-singleton dimension 1
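For reference, the 16 vs. 64 mismatch is exactly the factor of 4 introduced by Flux's 2x2 latent packing (16 latent channels x 4 = 64), which may suggest that one of the two tensors in losses *= clamped_mask is in the packed layout while the other is not. A minimal sketch that reproduces the same error message; the shapes are assumed purely for illustration and are not taken from OneTrainer's code:

    import torch

    # Loss tensor with 16 channels (unpacked latent layout, assumed).
    losses = torch.rand(4, 16, 32, 32)
    # Mask tensor with 64 channels (packed / patchified layout, assumed).
    clamped_mask = torch.rand(4, 64, 32, 32)

    # Raises: RuntimeError: The size of tensor a (16) must match the size of
    # tensor b (64) at non-singleton dimension 1
    losses *= clamped_mask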

Output of pip freeze

No response

O-J1 commented 4 hours ago

Please upload/attach your config and sample comparisons demonstrating the behavior now vs. prior.