Open BatyrM opened 2 years ago
Hi, yes those are OK. They occur in cooltools, and the NaNs are handled lower in the script.
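For context, here is a minimal sketch of where this kind of warning comes from. It is not the code in akita_data_read.py: the array is made-up toy data, and the final masking step is only one possible treatment, assumed for illustration. Taking `np.log` of a matrix that contains zeros returns `-inf` for those entries and emits the RuntimeWarning; `np.errstate` can silence it, and non-finite values can be masked afterwards.

```python
import warnings
import numpy as np

# Toy observed/expected values (hypothetical data); the zero stands in for an empty bin.
seq_hic_obsexp = np.array([[1.0, 0.0], [2.0, np.nan]])

# np.log(0) returns -inf and emits "divide by zero encountered in log".
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    logged = np.log(seq_hic_obsexp)

# The same computation with the warning suppressed locally.
with np.errstate(divide="ignore", invalid="ignore"):
    logged_quiet = np.log(seq_hic_obsexp)

# Assumed treatment for illustration: turn every non-finite entry into NaN
# so that later NaN-aware steps can handle them uniformly.
cleaned = np.where(np.isfinite(logged_quiet), logged_quiet, np.nan)
```

The warning itself is harmless as long as the resulting `-inf` and NaN entries are dealt with before the values are used.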
It is a bit strange that you ended up with so many more sequences. Can you visualize the two BED files in a genome browser and observe how they're different?
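Alongside the genome browser, a quick numeric comparison can help. The sketch below is only an illustration: the helper names (`read_bed`, `summarize`, `compare`) and the file paths in the usage comment are hypothetical, and it assumes each BED line starts with chromosome, start, and end columns.

```python
from collections import Counter

def read_bed(lines):
    """Parse (chrom, start, end) from an iterable of BED lines, skipping headers."""
    intervals = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals

def summarize(intervals):
    """Count intervals overall, per chromosome, and per interval length."""
    return {
        "count": len(intervals),
        "per_chrom": Counter(chrom for chrom, _, _ in intervals),
        "lengths": Counter(end - start for _, start, end in intervals),
    }

def compare(a, b):
    """Count intervals that are identical in both files or unique to one."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": len(set_a & set_b),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
    }

# Usage (paths are placeholders):
# with open("my_run/sequences.bed") as f_a, open("provided/sequences.bed") as f_b:
#     mine, provided = read_bed(f_a), read_bed(f_b)
# print(summarize(mine)["count"], summarize(provided)["count"])
# print(compare(mine, provided))
```

Differences in the per-chromosome counts or in the interval lengths would point to which regions or settings account for the extra sequences.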
1) While preprocessing data with akita_data.py, following the https://github.com/calico/basenji/blob/master/manuscripts/akita/tutorial.ipynb notebook, I encountered a "RuntimeWarning: divide by zero encountered in log" warning at line 204 of akita_data_read.py: https://github.com/calico/basenji/blob/master/bin/akita_data_read.py#:~:text=seq_hic_obsexp%20%3D%20np.log(seq_hic_obsexp). Should I ignore this warning, or does it indicate that something is wrong? I am using the 5 Hi-C files listed in https://github.com/calico/basenji/blob/master/manuscripts/akita/data/targets.txt, and I removed the --sample argument as recommended in the notebook.

2) After removing the --sample argument, my sequences.bed file contains more than 19K lines (coordinates), whereas the provided https://github.com/calico/basenji/blob/master/manuscripts/akita/data/sequences.bed file has just over 7K lines. Is this expected, or was some sample argument used to reduce the >19K sequences to >7K?