Hello,
I am currently exploring the use of the DiT model for diffusion/flow matching and have been reviewing both the LFM and DiT repositories. I have noticed a potential discrepancy in how timesteps are handled between these two implementations, specifically regarding the range of timesteps and their scaling in the timestep_embedding function.
In the DiT repository, timesteps are drawn from the range 0 to 1000: https://github.com/facebookresearch/DiT/blob/ed81ce2229091fd4ecc9a223645f95cf379d582b/train.py#L204
However, in the LFM repository, timesteps appear to be taken from a normalized range of 0 to 1: https://github.com/VinAIResearch/LFM/blob/601fd91f9e7a9f8e4cc178f3d6c77ea0de4ff0b9/train_flow_latent.py#L145
But both repositories use the same timestep_embedding function, which does not inherently account for different ranges of input timesteps:
DiT: https://github.com/facebookresearch/DiT/blob/ed81ce2229091fd4ecc9a223645f95cf379d582b/models.py#L41
LFM: https://github.com/VinAIResearch/LFM/blob/601fd91f9e7a9f8e4cc178f3d6c77ea0de4ff0b9/models/DiT.py#L44
Given the difference in timestep scaling, the same embedding function will produce different embeddings in each model, which may affect their performance and behavior.
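To make the concern concrete, here is a minimal sketch of the sinusoidal embedding as I understand it from both repositories, rewritten in plain Python so it runs without PyTorch. The defaults dim=256 and max_period=10000 are my assumptions about the configuration, and l2_distance is a helper I added purely for illustration. It compares how far apart the embeddings of the start and the midpoint of each timestep range end up.

```python
import math


def timestep_embedding(t, dim=256, max_period=10000):
    """Sinusoidal embedding of a scalar timestep t (illustrative re-implementation)."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    # Cosine channels first, then sine channels.
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]


def l2_distance(a, b):
    """Euclidean distance between two embeddings (hypothetical helper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Start vs. midpoint of a normalized range (t in [0, 1]).
dist_unit = l2_distance(timestep_embedding(0.0), timestep_embedding(0.5))

# Start vs. midpoint of an integer-step range (t in [0, 1000]).
dist_scaled = l2_distance(timestep_embedding(0.0), timestep_embedding(500.0))

print("normalized range:", dist_unit)
print("0 to 1000 range: ", dist_scaled)
```

With inputs in [0, 1], every sinusoid argument stays at or below 1, so the embeddings of different timesteps stay relatively close to each other, whereas inputs in [0, 1000] sweep many of the channels through full periods. This only illustrates the embedding function in isolation; whether the learned MLP on top compensates for the difference is exactly what I am unsure about.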
Could you please clarify if this difference in the range of timesteps is intentional?
Thank you for your help and for the great work on these projects!
Best regards, Danil.