calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

basenji_read overflow #196

Open ElArquitectorgo opened 3 months ago

ElArquitectorgo commented 3 months ago

Hi,

I am trying to do a study similar to the Enformer study for my final thesis, and to do so I have downloaded 4505 ENCODE tracks. When I ran the basenji_data script, I encountered the following warning numerous times:

/mnt2/fscratch/users/ac_aux/vguirado/preprocess/bin/basenji_data_read.py:307: RuntimeWarning: overflow encountered in cast
  cov = self.cov_open.values(chrm, start, end, numpy=True).astype('float16')

The code:

#SBATCH --job-name=preprocess
#SBATCH --time=0-30:0
#SBATCH --mem=50G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
##SBATCH --ntasks-per-node=1
#SBATCH --constraint=cal

...more sbatch things...

time python bin/basenji_data.py -s .9 -g data/hg38_gaps.bed -l 196608 --local -o data/human -p 128 -v .1 -w 128 data/genome.fa data/human_data.txt

I would like to know whether this can affect the generation of the TFRecords, because during training I am seeing extremely strange behavior, shown below:

[Image: porlacara]

These runs resume from a checkpoint at epoch 80 and train 50 more epochs to 130 (first graph), resume from the epoch 130 checkpoint and train to 180 (second), and resume from 180 and train to 230 (right). Here I am using a small subsample, but the same happens with the whole dataset (with a worse loss).

My training code appears to be fine: I tried loading the public Enformer checkpoint and modifying the output to train on the same subset, and there I do get results. That is, I keep the already trained trunk and add a single linear layer on top.

But when training from scratch, and also including 1019 tracks for mouse, the model is not able to learn anything. The R^2 values are 0 or negative no matter how many steps I train.

So it occurs to me that the problem is in the generation of the TFRecords, but that warning was the only one I found.

Thank you for your time.

davek44 commented 3 months ago

Hi, it appears that the tracks you downloaded have values above the float16 max. You could change the code to use float32, or explicitly clip the values. All active development on this software is now here: https://github.com/calico/baskerville
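
As a rough illustration of the clipping option, here is a minimal sketch. It assumes the coverage comes back as a NumPy array, as in the line quoted in the warning. The helper name to_float16_clipped and the sample values are hypothetical and are not part of basenji. Clipping to the float16 maximum before the cast makes large values saturate instead of turning into inf.

```python
import warnings

import numpy as np

# Largest finite float16 value (65504.0).
FLOAT16_MAX = float(np.finfo(np.float16).max)


def to_float16_clipped(cov):
    """Hypothetical helper: clip to the float16 range, then cast.

    NaN entries pass through np.clip unchanged, so they still need
    whatever handling the rest of the pipeline applies to them.
    """
    cov = np.asarray(cov, dtype=np.float32)
    return np.clip(cov, -FLOAT16_MAX, FLOAT16_MAX).astype(np.float16)


# Made-up coverage values; the last two exceed the float16 maximum.
raw = np.array([0.5, 1200.0, 70000.0, 1.0e6], dtype=np.float32)

# The naive cast is what triggers the overflow RuntimeWarning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    naive = raw.astype(np.float16)

clipped = to_float16_clipped(raw)

print("inf after naive cast:", np.isinf(naive))
print("all finite after clipping:", bool(np.isfinite(clipped).all()))
```

The float32 alternative would instead replace the 'float16' in that cast with 'float32', at the cost of twice the memory per value. Whether the downstream steps accept float32 unchanged is something to check in the code rather than assume.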