calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Tutorials not working, input/parameter files? #1

Closed: vagarwal87 closed this issue 5 years ago

vagarwal87 commented 7 years ago

Hi there,

Thanks for releasing your interesting bioRxiv paper and code. I'd love to try out your model on the CAGE data you presented in the paper. However, the tutorial links seem to be broken and I can't access them, and the one available in /bin/tutorial seems to be an older one for Basset. I also cannot find the "params_file" that you used for training. It would be very helpful if you could upload these, and perhaps add instructions for how to set up an example CAGE file for training. If you have the bandwidth, your pretrained model files would also be very helpful for me to test.

Thanks again and best wishes, Vikram

davek44 commented 7 years ago

Hi Vikram,

Thanks for your interest, and sorry for the incomplete status here. The tutorials, data, and pre-trained models are a work in progress that I'll be fleshing out throughout the summer. I'll leave comments here as I make progress.

Best, Dave

JeevG commented 7 years ago

Hi Dave,

Any progress on uploading a params_file we can start to play with? The paper doesn't mention how many filters etc. were used.

@vagarwal87: the tensor2tensor release has data relating to this paper (although their implementation is currently broken).

Thanks!

Jeev

dylanmmarshall commented 7 years ago

Hi Dave,

Adding my voice to the chorus of people wanting an updated tutorial. Also, would it be too much trouble to provide the data used to make the figures and conclusions in the bioRxiv paper in an easily accessible form, such as a Google Drive link? Or in whatever form a basenji quickstart might take. Looking forward to further developments. Awesome work!

Best, Dylan

davek44 commented 7 years ago

Hey all,

I've made some decent progress on these, and I will release models soon. The biggest holdup at this point is that we're working to better integrate these architectures with TensorFlow's TFRecords input data format and the new Dataset API. You can expect that soon.

Best, Dave

ankitvgupta commented 7 years ago

Hey Dave, could you post the umap_macro.bed file that you refer to in https://github.com/calico/basenji/blob/master/tutorials/preprocess.ipynb? Or post a link to where it could be downloaded from?

davek44 commented 7 years ago

Sorry, that was a typo. It's referring to this file: https://github.com/calico/basenji/blob/master/tutorials/data/unmap_macro.bed
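For anyone wondering what a file like this does in preprocessing: a BED file lists genomic intervals as tab-separated chrom, start, end columns (0-based, half-open). Below is a minimal sketch, assuming the unmappable regions are used to drop sequence windows that are too heavily covered by unmappable genome. The function names and the 0.3 cutoff are hypothetical illustrations, not basenji's actual code.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines (chrom, start, end, ...) into {chrom: [(start, end), ...]}.

    Coordinates are 0-based and half-open, as in the BED format.
    """
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        intervals[fields[0]].append((int(fields[1]), int(fields[2])))
    return intervals


def unmap_fraction(intervals, chrom, start, end):
    """Fraction of the window [start, end) covered by unmappable intervals.

    Assumes the BED intervals do not overlap one another (else they are double counted).
    """
    covered = 0
    for u_start, u_end in intervals.get(chrom, []):
        overlap = min(end, u_end) - max(start, u_start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


def filter_windows(windows, intervals, max_unmap=0.3):
    """Keep (chrom, start, end) windows whose unmappable fraction is at most max_unmap (hypothetical cutoff)."""
    return [w for w in windows if unmap_fraction(intervals, *w) <= max_unmap]
```

A file handle is an iterable of lines, so read_bed(open("tutorials/data/unmap_macro.bed")) works directly.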

ankitvgupta commented 7 years ago

Ah, got it. I should have looked more closely for that. Thanks!

goldmich commented 6 years ago

Hi Dave,

A quick question related to the discussion above. I'm trying to work through the gene expression tutorial (and do some further work with pre-trained models), and I'm having trouble loading the pre-trained models from the .tf files when running basenji_test_genes.py -o data/gencode_chr9_test --rc -s --table models/params_small.txt models/gm12878_best.tf data/gencode_chr9_l262k_w128.h5. Even after replacing _best with _d10 and downloading the .tf.data-00000-of-00001, .tf.meta, and .tf.index files associated with the gm12878_d10 model, I receive the following error: tensorflow.python.framework.errors_impl.NotFoundError: Key cnn5/batch_normalization/renorm_mean_weight not found in checkpoint. I assume this is because I am missing a checkpoint file, which I can't find on this page. Any pointers? Thanks in advance.

Best, Michael
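For anyone hitting the same NotFoundError: it means the graph built from the params file declares a variable that the checkpoint doesn't contain. A quick way to see the whole mismatch at once is to diff the two sets of variable names. In TensorFlow 1.x the checkpoint side can be listed with tf.train.list_variables(checkpoint_prefix), which returns (name, shape) pairs, and the graph side with [v.op.name for v in tf.global_variables()]. The sketch below is only the plain-Python diff; the function name is hypothetical.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Compare variable names the graph expects with names stored in a checkpoint.

    graph_names: e.g. [v.op.name for v in tf.global_variables()] in TF 1.x
    checkpoint_names: e.g. [name for name, _ in tf.train.list_variables(prefix)]
    """
    graph_set, ckpt_set = set(graph_names), set(checkpoint_names)
    return {
        # A Saver over the full graph raises NotFoundError for any of these
        "missing_from_checkpoint": sorted(graph_set - ckpt_set),
        # Stored in the checkpoint but never requested by the graph
        "unused_in_checkpoint": sorted(ckpt_set - graph_set),
    }
```

Here the missing key's renorm_ prefix hints at batch renormalization settings, which would be consistent with the params file and the checkpoint not matching.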

davek44 commented 6 years ago

Hi Michael,

Sorry about that. I have a bit of an incompatibility right now between the master branch and the pre-trained models, which I'll clean up next week. In the meantime, the safest bet is to work off the release branch here: https://github.com/calico/basenji/releases/tag/0.2

Best, Dave

goldmich commented 6 years ago

Hi Dave,

Thanks for working on this and pointing me to the release branch. I've set up the 0.2 release, but unfortunately I'm still running into issues with missing information in the checkpoint. Similar to the error above, the gene expression prediction from the tutorial fails with NotFoundError (see above for traceback): Key cnn0/batch_normalization/beta/Adam not found in checkpoint. Any tips? Thanks in advance, and please let me know if you need more of the traceback.

Best, Michael
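The /Adam suffix in this key marks an Adam optimizer slot variable rather than a model weight. If the restoring Saver is built over every global variable, including optimizer state, it will also look for those slots in the checkpoint. One common approach for inference in TensorFlow 1.x is to restore only the model weights by passing a filtered list to tf.train.Saver(var_list=...). The sketch below does only the name filtering; the patterns assume TF 1.x's default Adam naming (/Adam and /Adam_1 slots plus beta1_power/beta2_power accumulators), the function name is hypothetical, and it does not address a params file that doesn't match the model.

```python
import re

# TF 1.x AdamOptimizer names its per-variable slots "<var>/Adam" and "<var>/Adam_1"
ADAM_SLOT = re.compile(r"/Adam(_\d+)?$")
# ...and keeps two non-slot accumulators named "beta1_power" and "beta2_power"
ADAM_POWER = re.compile(r"(^|/)beta[12]_power$")


def model_weight_names(names):
    """Keep names that look like model weights, dropping Adam optimizer state.

    Note that a batch-norm offset such as "cnn0/batch_normalization/beta" is kept.
    """
    return [n for n in names if not (ADAM_SLOT.search(n) or ADAM_POWER.search(n))]
```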

davek44 commented 6 years ago

Hi Michael,

You're right, the parameters file doesn't match the pre-trained model. I will clean it up this week. In the meantime, that tutorial can still demonstrate the steps involved. And if you'd like to see the output, you can train a model for a few epochs yourself with those parameters and ignore models/gm12878_d10.tf.

Best, Dave

hiraksarkar commented 6 years ago

Hi Dave, awesome package! I am trying to create the files according to the tutorial, but it failed while generating data/heart_l262k.h5. It seems that, due to some update, basenji currently does not have a genome module:

Traceback (most recent call last):
  File "/home/hirak/Projects/basenji/bin/basenji_hdf5_single.py", line 918, in <module>
    main()
  File "/home/hirak/Projects/basenji/bin/basenji_hdf5_single.py", line 205, in main
    chrom_segments = basenji.genome.load_chromosomes(fasta_file)
AttributeError: module 'basenji' has no attribute 'genome'

hiraksarkar commented 6 years ago

Hi Dave, I guess it was an import issue; I figured it out. The tutorial needs to be run from within the root directory of basenji.
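For reference, two common ways to get an AttributeError like module 'basenji' has no attribute 'genome': the submodule was never imported (importing a package does not import its submodules unless its __init__.py does), or Python found a different copy of the package on sys.path than the one intended, which would fit the fix of running from the repository root. Printing basenji.__file__ shows which copy was loaded. The self-contained sketch below demonstrates the first behavior with a throwaway package; demo_pkg and the helper names are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path, name: str) -> None:
    """Create a package whose __init__.py does NOT import its genome submodule."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "genome.py").write_text("def load_chromosomes(fasta_file):\n    return {}\n")


def submodule_visibility(name: str = "demo_pkg"):
    """Return (package __file__, has genome before import, has genome after import)."""
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp), name)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(name)
            before = hasattr(pkg, "genome")  # False: only __init__.py has run
            importlib.import_module(f"{name}.genome")
            after = hasattr(pkg, "genome")  # True: importing the submodule binds it on the package
            return pkg.__file__, before, after
        finally:
            sys.path.remove(tmp)
            # Forget the demo modules so a repeat call re-imports from a fresh directory
            for mod in [m for m in sys.modules if m == name or m.startswith(name + ".")]:
                del sys.modules[mod]
```

Calling submodule_visibility() returns the path of the loaded __init__.py followed by False and True.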

hiraksarkar commented 6 years ago

Hi Dave,

The package dependencies are resolved, but I am stuck on another problem in the basenji_hdf5_single.py script:

Traceback (most recent call last):
  File "/home/hirak/Projects/basenji/bin/basenji_hdf5_single.py", line 924, in <module>
    main()
  File "/home/hirak/Projects/basenji/bin/basenji_hdf5_single.py", line 477, in main
    data=seqs_na[train_indexes],
UnboundLocalError: local variable 'seqs_na' referenced before assignment
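This error means a local name was read on a code path where it was never assigned, typically because the assignment sits inside a branch that didn't run. Below is a hypothetical minimal reproduction (not the actual basenji_hdf5_single.py code; the function and parameter names are made up) alongside the usual fix of binding the name on every path before it is used.

```python
def train_subset_broken(seqs, n_train, normalize):
    """Mimics the bug: seqs_na is only bound when normalize is True."""
    if normalize:
        seqs_na = [s.upper() for s in seqs]
    # When normalize is False, this line raises UnboundLocalError
    return seqs_na[:n_train]


def train_subset_fixed(seqs, n_train, normalize):
    """Binds seqs_na on every path before it is used."""
    if normalize:
        seqs_na = [s.upper() for s in seqs]
    else:
        seqs_na = list(seqs)
    return seqs_na[:n_train]
```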

davek44 commented 6 years ago

Hi all, the tutorials and both errors Hirak encountered should be fixed and working now.