calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Running tutorials error #72

Open AndyPJiang opened 3 years ago

AndyPJiang commented 3 years ago

Hi, I was following the tutorials and trying to run the sad.ipynb notebook in the tutorial folder. The installation all worked fine. However, I get an error when running the following command:

! basenji_sad.py --cpu -f data/hg19.ml.fa -g data/human.hg19.genome --h5 -o output/rfx6_sad --rc --shift "1,0,-1" -t data/heart_wigs.txt models/params_small.txt models/heart/model_best.tf data/rs339331.vcf

The first error I get is zsh:1: command not found: basenji_sad.py, which I can solve by calling the script through its relative path, ../bin/basenji_sad.py.
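For anyone hitting the same "command not found" message, here is a minimal sketch of checking from Python whether a script is on the PATH and falling back to a local copy otherwise. The helper name `resolve_script` is hypothetical and not part of basenji; `../bin` is simply the relative location mentioned in this thread.

```python
import shutil
from pathlib import Path


def resolve_script(name, fallback_dir):
    """Return a usable path for `name`.

    Prefers the copy found on PATH; otherwise falls back to
    `fallback_dir/name`. Returns None if neither exists.
    (Hypothetical helper for illustration, not part of basenji.)
    """
    found = shutil.which(name)
    if found is not None:
        return found
    candidate = Path(fallback_dir) / name
    return str(candidate) if candidate.is_file() else None


# Example call, using the relative directory from this thread:
# resolve_script("basenji_sad.py", "../bin")
```

If the helper returns None, the script is neither on the PATH nor in the fallback directory, which usually means the notebook is being run from a different working directory than expected.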

Subsequently, "no such option" errors arise. After running ! ../bin/basenji_sad.py --help and removing the flags that aren't listed, the code runs.

Finally, the command looks like ! ../bin/basenji_sad.py -f data/hg19.ml.fa -o output/rfx6_sad --rc --shift "1,0,-1" -t data/heart_wigs.txt models/params_small.json models/heart/model_best.tf data/rs339331.vcf

I get an assertion error:

Traceback (most recent call last):
  File "../bin/basenji_sad.py", line 426, in <module>
    main()
  File "../bin/basenji_sad.py", line 170, in main
    seqnn_model.restore(model_file)
  File "/Users/andyjiang/basenji/basenji/basenji/seqnn.py", line 350, in restore
    self.models[head_i].load_weights(model_file)
  File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2216, in load_weights
    status.assert_nontrivial_match()
  File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1023, in assert_nontrivial_match
    return self.assert_consumed()
  File "/Users/andyjiang/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 998, in assert_consumed
    raise AssertionError(
AssertionError: Some objects had attributes which were not restored:
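The traceback shows the failure inside Keras `load_weights`, on the code path that treats the model file as a TensorFlow checkpoint. As a rough diagnostic, the sketch below guesses how a saved model is stored on disk. It assumes the standard HDF5 signature at byte 0 and the usual `<prefix>.index` companion file for TensorFlow checkpoints. The function name `guess_weights_format` is hypothetical, and the check says nothing about whether the stored variables match the current model code.

```python
from pathlib import Path

# Standard 8-byte signature at the start of an HDF5 file.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def guess_weights_format(model_path):
    """Best-effort guess of how saved weights are stored on disk.

    Hypothetical helper for illustration; not part of basenji or TensorFlow.
    """
    path = Path(model_path)
    if path.is_file():
        with path.open("rb") as handle:
            if handle.read(8) == HDF5_MAGIC:
                return "hdf5"
        return "unknown-file"
    # TensorFlow checkpoints are addressed by a prefix; on disk they appear
    # as <prefix>.index plus one or more <prefix>.data-* shards.
    if Path(str(path) + ".index").is_file():
        return "tf-checkpoint"
    return "missing"
```

A result of "tf-checkpoint" for a file such as models/heart/model_best.tf would only confirm the storage format; a checkpoint written by a different TensorFlow or basenji version can still fail to restore, as in the traceback above.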

We are focusing on predicting the effect of genetic variants, so we started with this script using the model in the tutorial script. Could this error be caused by differences in the version used to generate that model? We would prefer to use a pre-trained model to predict genetic effects if available.

Would appreciate any help.

Andy

davek44 commented 3 years ago

Hi Andy, that notebook isn't up to date for TensorFlow 2.*; sorry about that. I'll work on cleaning it up this week. In the meantime, you could try using the same command but replacing the parameters JSON and model file with those in this directory: https://github.com/calico/basenji/tree/master/manuscripts/cross2020

rpique commented 3 years ago

Hi David. Andy got it to work in the end with the tf1 branch. Next, we would like to use predictions for lymphoblastoid cell lines (LCLs) to compare chromatin QTLs and other annotations. We would prefer to avoid training the model again, if possible, but the tutorial model seems to be tailored to heart tissue. Do you have a model already trained for LCLs that we could use?

davek44 commented 3 years ago

Yup, the model that I pointed you to above has many LCL-relevant datasets. You can see the full list here: https://github.com/calico/basenji/blob/master/manuscripts/cross2020/targets_human.txt
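To pick out the LCL-relevant entries from a targets list like this, here is a minimal sketch. It assumes the file is tab-delimited with a header row that includes a `description` column, which should be checked against the real file. The function name `find_targets` and the sample rows are made up for illustration. GM12878 is used as a keyword because it is a widely used lymphoblastoid cell line.

```python
import csv
import io


def find_targets(handle, keywords):
    """Yield rows of a tab-delimited targets table whose description
    mentions any of the keywords (case-insensitive).

    Assumes a header row with a `description` column (an assumption
    about the file layout, not confirmed in this thread).
    """
    lowered = [k.lower() for k in keywords]
    for row in csv.DictReader(handle, delimiter="\t"):
        description = (row.get("description") or "").lower()
        if any(k in description for k in lowered):
            yield row


# Made-up rows for illustration only; not copied from targets_human.txt.
sample = io.StringIO(
    "index\tidentifier\tdescription\n"
    "0\tEX1\tDNASE:GM12878\n"
    "1\tEX2\tDNASE:heart left ventricle\n"
)
hits = list(find_targets(sample, ["GM12878", "lymphoblastoid"]))
print([row["identifier"] for row in hits])  # prints ['EX1']
```

With a downloaded copy of the targets file, the same function can be called on `open(path, newline="")` instead of the in-memory sample.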

davek44 commented 3 years ago

In case it's still beneficial to you, I just pushed an update to the tutorials.