Closed LPZliu closed 9 months ago
Thank you for bringing up the issue. I would appreciate it if you wrote in English, since I can't read Chinese. When I trained the model, at epoch 200 the loss had decreased to 0.37, and the validation accuracies were as follows:
- Val PSNR: 42.004
- Val Crop Acc: 0.989
- Val Scale Acc: 0.981
- Val MJPEG Acc: 0.993

At the moment, I'm afraid I can't help address the problem, since it needs further investigation. If you prefer, send me your email address so I can send you the pretrained model, which was trained with 32-bit data.
Thanks! My email: kevin_ailover@163.com
This is my question. I don't know how to address it. Can you help me?
Can you provide more details about the training parameters, such as data_dim, seq_len, batch_size, learning rate, etc.?
Could you adjust the batch size to 12 and retrain the model? Given the learning rate, setting the batch size too high could interfere with the model's convergence. If you want to set the batch size to 64, try increasing the learning rate.
Note: I used a batch size of 12 with a learning rate of 0.0005 during my training.
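To make the batch-size/learning-rate relationship concrete, here is a minimal sketch of the linear-scaling heuristic implied above. The helper name and the linear rule itself are illustrative assumptions, not part of this repository; only the reference point (batch size 12, learning rate 0.0005) comes from the comment above.

```python
# Illustrative helper: scale the learning rate linearly with the batch size,
# anchored at the reference settings from the comment above (batch_size=12,
# lr=0.0005). The linear-scaling rule is a common heuristic, not something
# this repo prescribes; treat the result as a starting point, not a recipe.
def scaled_lr(batch_size, base_batch_size=12, base_lr=0.0005):
    return base_lr * batch_size / base_batch_size

print(scaled_lr(12))  # 0.0005   -> the setting used in this training run
print(scaled_lr(64))  # ~0.00267 -> a starting point for batch size 64
```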
In fact, I've already tried that. I modified not only the batch size but also the learning rate, but it didn't solve the problem.
The quick solution I can offer is the official code below. Try replacing the current make_pair function with this one:
```python
def make_pair(frames, data_dim, use_bit_inverse=True, multiplicity=1):
    # Add multiplicity to further stabilize training.
    frames = torch.cat([frames] * multiplicity, dim=0).cuda()
    data = torch.zeros((frames.size(0), data_dim)).random_(0, 2).cuda()
    # Add the bit-inverse to stabilize training.
    if use_bit_inverse:
        frames = torch.cat([frames, frames], dim=0).cuda()
        data = torch.cat([data, 1.0 - data], dim=0).cuda()
    return frames, data
```
I modified this function because when the hidden data is 32 bits of all 0s or all 1s, the model couldn't retrieve the data well (i.e., low accuracy). So I had the model train on each bit pattern together with its inverse, which increased the accuracy.
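For reference, a minimal sanity check of the modified make_pair. This is illustrative only: it assumes a CUDA device is available (the function calls .cuda()), and the frame tensor shape is a placeholder since the real shape depends on the data loader.

```python
import torch

# Dummy batch of 4 "frames"; (N, C, H, W) is only a placeholder shape here.
frames = torch.randn(4, 3, 128, 128)

frames_out, data = make_pair(frames, data_dim=32, use_bit_inverse=True)

# With use_bit_inverse=True the batch is doubled, and the second half of
# `data` is the bit-inverse of the first half, so paired rows sum to 1.
assert frames_out.size(0) == 2 * frames.size(0)
assert torch.all(data[:4] + data[4:] == 1.0)
```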
Thanks for the reply. Can you please send me all the files generated by your training, such as metrics.tsv?
Sure, I will send it via email.
These are the files generated during training.
Sorry, I didn't see the attached file. Can you send me a new copy? Thanks.
Thanks for your help, I've solved the problem. But I'd still like to discuss some details with you.
Since the problem has been solved, I will close this issue.
I ran the model, downloaded the data, and trained for 300 epochs, but the loss stayed at 1.3 and the accuracy was low, only around 0.6. How should I tune things to fix this? Even after changing the learning rate, the loss wouldn't come down.