calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Problem with multigenome training with > 2 genomes #158

Open raphaelmourad opened 1 year ago

raphaelmourad commented 1 year ago

Hi,

When I train Basenji with 2 genomes (with 2 heads in the model JSON), it works:

    Epoch 0 - 20s
    Data 0 - train_loss: 0.1845 - train_r: 0.3442 - train_r: 0.1183 - valid_loss: 0.6504 - valid_r: 0.5966 - valid_r2: 0.2636 - best!
    Data 1 - train_loss: 0.1031 - train_r: 0.3400 - train_r: 0.1156 - valid_loss: -0.1034 - valid_r: 0.5949 - valid_r2: 0.0128 - best!

But when I train it with 4 genomes (with 4 heads in the model JSON), for instance, the third and fourth datasets report zero loss and NaN correlations during training:

    Epoch 0 - 32s
    Data 0 - train_loss: 0.0585 - train_r: 0.4051 - train_r: 0.1635 - valid_loss: -0.9693 - valid_r: 0.6455 - valid_r2: 0.2770 - best!
    Data 1 - train_loss: -0.2097 - train_r: 0.4602 - train_r: 0.2118 - valid_loss: -1.1436 - valid_r: 0.6729 - valid_r2: 0.3266 - best!
    Data 2 - train_loss: 0.0000 - train_r: nan - train_r: nan - valid_loss: 0.0000 - valid_r: nan - valid_r2: nan
    Data 3 - train_loss: 0.0000 - train_r: nan - train_r: nan - valid_loss: 0.0000 - valid_r: nan - valid_r2: nan

Do you know what could be the problem?

Thanks, Raf

davek44 commented 1 year ago

The current code has a maximum of two genomes. You'll have to implement a new fit method in the Trainer class to do more.
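
A minimal sketch of what the inner loop of such a fit method could look like for N genomes. It is written in plain Python, not against Basenji's actual Trainer API. The names `make_schedule`, `fit_epoch`, `batch_iters`, and `train_steps` are hypothetical. In a real implementation each train step would presumably be a gradient update for one model head, and each iterator would yield batches from one genome's dataset. The idea is to build a shuffled schedule of dataset indices so that every dataset, not only the first two, receives its batches each epoch:

```python
import random
from typing import Callable, Iterator, List, Sequence


def make_schedule(batches_per_epoch: Sequence[int], seed: int = 0) -> List[int]:
    """Return a shuffled list of dataset indices, one entry per training batch."""
    schedule = [di for di, n in enumerate(batches_per_epoch) for _ in range(n)]
    random.Random(seed).shuffle(schedule)
    return schedule


def fit_epoch(
    batch_iters: Sequence[Iterator],           # hypothetical: one batch iterator per genome
    train_steps: Sequence[Callable[..., float]],  # hypothetical: one step function per head
    batches_per_epoch: Sequence[int],
    seed: int = 0,
) -> List[float]:
    """Run one epoch over N datasets and return the mean loss per dataset."""
    loss_sums = [0.0] * len(batch_iters)
    for di in make_schedule(batches_per_epoch, seed):
        batch = next(batch_iters[di])
        # Dispatch to the step function that belongs to this dataset's head.
        loss_sums[di] += train_steps[di](batch)
    return [s / n if n else float("nan") for s, n in zip(loss_sums, batches_per_epoch)]


# Toy usage with 4 "genomes": each batch is a list of floats and the
# stand-in "loss" is simply the batch mean (an assumption for illustration).
counts = [2, 3, 1, 4]
batch_iters = [iter([[float(di)] * 2] * n) for di, n in enumerate(counts)]
train_steps = [lambda batch: sum(batch) / len(batch)] * len(counts)
mean_losses = fit_epoch(batch_iters, train_steps, counts)
print(mean_losses)  # [0.0, 1.0, 2.0, 3.0]
```

Because the schedule is generated from the list of batch counts, the loop has no built-in limit on the number of datasets. The same pattern would apply to the validation pass, with one metric accumulator per dataset.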