Open frostinassiky opened 3 years ago
BTW, does the learning curve look good for a production Akita training run? Early stopping happens at the 35th epoch.
The tutorial chooses two of the datasets arbitrarily to demonstrate the code. The primary model that we studied in the paper was trained on the five target datasets described here: https://github.com/calico/basenji/blob/master/manuscripts/akita/data/targets.txt
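In case it helps, a quick way to see which five targets the production model used is to load that targets file. This is just a sketch assuming it follows Basenji's usual tab-separated layout with an index column; the column names are not guaranteed.

```python
import pandas as pd

# Load the Akita targets table (tab-separated, Basenji convention).
# The exact columns (e.g. 'identifier', 'file', 'description') are an
# assumption here; check the header of targets.txt itself.
targets_df = pd.read_csv('targets.txt', sep='\t', index_col=0)
print(targets_df)                 # should list the five Hi-C/Micro-C targets
print(len(targets_df), 'targets')
```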
Yes, those training curves look good. I'm guessing you're showing the training set statistics, since I don't think early stopping would have chosen to stop if those were the validation set statistics.
Hi @davek44, thanks for your response! Do you have direct links for the three datasets: GM12878, IMR90, and HCT116?
https://storage.googleapis.com/basenji_hic/tutorials/coolers/GM12878_inSitu_MboI_all.hg38.2048.cool
../../data
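For anyone else landing here, a minimal sketch for fetching that GM12878 cooler into the tutorial-style `../../data` location and opening it with the `cooler` package (the local path is just what the tutorial uses; adjust as needed):

```python
import os
import urllib.request
import cooler

url = ('https://storage.googleapis.com/basenji_hic/tutorials/coolers/'
       'GM12878_inSitu_MboI_all.hg38.2048.cool')
out_dir = '../../data'                      # tutorial-style location; adjust as needed
os.makedirs(out_dir, exist_ok=True)
out_path = os.path.join(out_dir, os.path.basename(url))

# Download once; the file is a single-resolution .cool at 2048 bp bins.
if not os.path.exists(out_path):
    urllib.request.urlretrieve(url, out_path)

c = cooler.Cooler(out_path)
print(c.chromnames[:5], c.binsize)          # quick sanity check
```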
If you want all of the datasets, you should consider using the dataset preprocessed into TFRecords, which you can acquire with this script: https://github.com/calico/basenji/blob/master/manuscripts/akita/get_data.sh
If you want all of the cooler files, I added them to the cloud bucket here: https://console.cloud.google.com/storage/browser/basenji_hic/1m/data/coolers
Thanks for the amazing work!
According to the Akita tutorial, we need to modify the model parameters JSON so that it has only two targets.
ref: https://github.com/calico/basenji/blob/master/manuscripts/akita/tutorial.ipynb
However, the original parameters JSON has 5 targets. What are the extra 3 targets?
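For reference, the change I made for two targets was simply editing the output width of the Hi-C head in the params JSON. A rough sketch below; the key names (`model`, `head_hic`, `units`) are from my reading of the Akita params file, so please verify against your copy.

```python
import json

# Load the original Akita parameters (5 targets) and shrink the head to 2 targets.
# Key names ('model', 'head_hic', 'units') are assumptions; confirm them against
# the params JSON you are actually using.
with open('params.json') as f:
    params = json.load(f)

head = params['model']['head_hic']
head[-1]['units'] = 2          # final layer width = number of targets to predict

with open('params_2targets.json', 'w') as f:
    json.dump(params, f, indent=2)
```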