calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

"RuntimeWarning: divide by zero encountered in log" in akita_data_read.py #118

Open BatyrM opened 2 years ago

BatyrM commented 2 years ago

1) While following the tutorial notebook https://github.com/calico/basenji/blob/master/manuscripts/akita/tutorial.ipynb to preprocess data with akita_data.py, I get "RuntimeWarning: divide by zero encountered in log" at https://github.com/calico/basenji/blob/master/bin/akita_data_read.py#:~:text=seq_hic_obsexp%20%3D%20np.log(seq_hic_obsexp), which is line 204 of akita_data_read.py. Should I ignore this warning, or does it mean something is going wrong? I am using the 5 Hi-C files listed in https://github.com/calico/basenji/blob/master/manuscripts/akita/data/targets.txt, and I removed the --sample argument, as the notebook recommends.

2) After removing the --sample argument, my sequences.bed file contains more than 19K lines (coordinates). Is that expected? The provided https://github.com/calico/basenji/blob/master/manuscripts/akita/data/sequences.bed file has only >7K lines. Or was some sample argument used to reduce the >19K down to >7K?

davek44 commented 2 years ago

Hi, yes, those are OK. They occur in cooltools, and the NaNs are handled further down in the script.
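For anyone hitting the same warning: NumPy's np.log returns -inf, and emits exactly this "divide by zero encountered in log" RuntimeWarning, for any entry equal to zero, so zero-valued bins in the observed/expected matrix are enough to trigger it. The sketch below is not the code in akita_data_read.py; it is a minimal illustration, assuming the matrix is a NumPy float array, of how the warning can be silenced locally and the non-finite results turned into NaN so a later step can mask them. The helper name log_obsexp is hypothetical.

```python
import numpy as np


def log_obsexp(obsexp):
    """Hypothetical helper: log-transform an observed/expected matrix,
    mapping zero and non-finite entries to NaN instead of -inf."""
    obsexp = np.array(obsexp, dtype=float)
    # Zero entries make np.log return -inf and raise the divide-by-zero
    # warning; negative entries would give NaN with an "invalid" warning.
    # np.errstate suppresses both warnings only inside this block.
    with np.errstate(divide="ignore", invalid="ignore"):
        logged = np.log(obsexp)
    # Turn -inf/+inf (and any pre-existing NaN) into NaN for later masking.
    logged[~np.isfinite(logged)] = np.nan
    return logged


if __name__ == "__main__":
    # Toy 2x2 matrix with one zero bin and one missing bin.
    print(log_obsexp([[1.0, 0.0], [np.e, np.nan]]))
```

Using np.errstate as a context manager keeps the suppression local, so genuinely unexpected floating-point warnings elsewhere in a pipeline are still reported.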

It is a bit strange that you ended up with so many more sequences. Can you visualize the two BED files in a genome browser and see how they differ?
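Alongside the genome-browser check, a quick text-level comparison can show where the extra sequences come from, for example whether some chromosomes appear in only one file. This is a minimal sketch, assuming standard BED files whose first three whitespace-separated columns are chrom, start, end; the helpers parse_bed and compare_beds are hypothetical and not part of basenji.

```python
from collections import Counter


def parse_bed(lines):
    """Hypothetical helper: parse BED lines into (chrom, start, end) tuples,
    skipping blank, comment, track and browser lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def compare_beds(a, b):
    """Hypothetical helper: per-chromosome interval counts for two BED files,
    plus the number of intervals present in both with identical coordinates."""
    counts_a = Counter(chrom for chrom, _, _ in a)
    counts_b = Counter(chrom for chrom, _, _ in b)
    per_chrom = {
        chrom: (counts_a[chrom], counts_b[chrom])
        for chrom in sorted(set(counts_a) | set(counts_b))
    }
    shared = len(set(a) & set(b))
    return per_chrom, shared


if __name__ == "__main__":
    # For real files, pass open file handles instead, e.g.
    # parse_bed(open("sequences.bed")) for the provided file and your own.
    provided = parse_bed(["chr1\t0\t100\n", "chr2\t0\t100\n"])
    generated = parse_bed(["chr1\t0\t100\n", "chr1\t100\t200\n", "chrX\t0\t100\n"])
    per_chrom, shared = compare_beds(provided, generated)
    for chrom, (n_provided, n_generated) in per_chrom.items():
        print(chrom, n_provided, n_generated)
    print("identical intervals:", shared)
```

Per-chromosome counts only point to where the files diverge; the browser view is still the better way to see whether the extra intervals overlap, tile different regions, or come from filtering differences.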