churchlab / UniRep

UniRep model, usage, and examples.

evotuned weights #2

Closed spark157 closed 5 years ago

spark157 commented 5 years ago

Hello,

I would like to use the evotuned/unirep/ weights within the unirep_tutorial.ipynb example but am getting some errors. In particular I get the following error when setting up the babbler in the third calculation cell:

FileNotFoundError: [Errno 2] No such file or directory: './evotuned/unirep/embed_matrix:0.npy'

When I look in the evotuned/unirep folder where the weights should be, I only see the following two files: model-13560.index and model-13560.meta

The weight data doesn't seem to be set up as it is for the other model/weights. This is after having run: !aws s3 sync --no-sign-request --quiet s3://unirep-public/evotuned/unirep/ evotuned/unirep/

Can you please verify that the weights are being properly downloaded from the server using the above?

Thanks.

Scott

spark157 commented 5 years ago

Just a follow up with a few more details. When I remove --quiet I end up with the following:

aws s3 sync --no-sign-request s3://unirep-public/evotuned/unirep/ evotuned/unirep/
download failed: s3://unirep-public/evotuned/unirep/model-13560.data-00000-of-00001 to evotuned/unirep/model-13560.data-00000-of-00001 Could not connect to the endpoint URL: "https://unirep-public.s3.amazonaws.com/evotuned/unirep/model-13560.data-00000-of-00001"

Before the final error, the progress output shows 139.8 MiB/208.5 MiB downloaded; after a pause it shows 88 MiB/208.5 MiB, and then the command terminates with the error above.

Oddly, the process takes out my internet connection and I'm only able to reconnect to the net by rebooting my machine (macOS High Sierra 10.13.1).

Anyway, I'm hoping there is just something misconfigured with what looks like one of the two download files.

Scott

spark157 commented 5 years ago

And more details: If I simply put the file request in a browser I can download the data file with no issues:

https://unirep-public.s3.amazonaws.com/evotuned/unirep/model-13560.data-00000-of-00001

gets me the data on my local machine, but it doesn't seem to be in a usable format.

For example when I download the 64 weights:

aws s3 sync --no-sign-request s3://unirep-public/64_weights/ 64_weights/

it downloads each of the *.npy files, all ready to go.

At any rate, perhaps it is just the file that is on AWS that isn't correct.
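For anyone whose aws s3 sync keeps failing, the browser download above could also be scripted over plain HTTPS. This is only a minimal sketch: the data-file URL is the one quoted in this thread, while the URLs for model-13560.index and model-13560.meta are an assumption that they follow the same pattern. The function names are made up for illustration.

```python
import urllib.request
from pathlib import Path

# Base URL taken from the data-file link quoted in this thread.
BASE_URL = "https://unirep-public.s3.amazonaws.com/evotuned/unirep/"
# File names as listed in this thread; the .index/.meta URLs are assumed
# to follow the same pattern as the .data file.
FILES = (
    "model-13560.data-00000-of-00001",
    "model-13560.index",
    "model-13560.meta",
)


def checkpoint_urls(base_url=BASE_URL, files=FILES):
    """Return the full HTTPS URL for each checkpoint file."""
    return [base_url + name for name in files]


def download_checkpoint(dest_dir="evotuned/unirep"):
    """Download every checkpoint file into dest_dir (created if missing)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in checkpoint_urls():
        target = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(target))
    return dest
```

Calling download_checkpoint() would place the three files in evotuned/unirep/, the same layout that aws s3 sync is supposed to produce.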

Thanks.

Scott

spark157 commented 5 years ago

I tried to come at it another way by running the sample Jupyter notebook on Google Colab but I still get the same/similar thing.

When I run: !aws s3 sync --no-sign-request --quiet s3://unirep-public/evotuned/unirep/ evotuned/unirep/

I get the following files:

total 285048
-rw-r--r-- 1 root root 218632636 Jan 24  2019 model-13560.data-00000-of-00001
-rw-r--r-- 1 root root      1343 Jan 24  2019 model-13560.index
-rw-r--r-- 1 root root  73242200 Jan 24  2019 model-13560.meta

Subsequently when the babbler is set up I have the following error: FileNotFoundError: [Errno 2] No such file or directory: './evotuned/unirep/embed_matrix:0.npy'

It would seem the file on aws isn't correctly set up. Hope that helps narrow things down.
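A quick way to tell the two layouts apart before setting up the babbler is to look at the file names in the weights directory. The helper below is a hypothetical sketch, not part of UniRep: it assumes that a directory with *.npy files is the layout the babbler init reads directly, and that a .index file plus a .data-* file is a TensorFlow checkpoint.

```python
from pathlib import Path


def weights_layout(weights_dir):
    """Classify a weights directory as 'npy', 'tf_checkpoint', or 'unknown'."""
    names = [p.name for p in Path(weights_dir).iterdir() if p.is_file()]
    # Layout used by e.g. 64_weights/: one .npy file per weight matrix.
    if any(name.endswith(".npy") for name in names):
        return "npy"
    # A TensorFlow checkpoint needs at least the .index and .data-* files.
    has_index = any(name.endswith(".index") for name in names)
    has_data = any(".data-" in name for name in names)
    if has_index and has_data:
        return "tf_checkpoint"
    # E.g. an interrupted download that left only .index and .meta behind.
    return "unknown"
```

With the three files listed above, this would report 'tf_checkpoint', which is why the babbler init cannot find embed_matrix:0.npy there.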

Scott

sandias42 commented 5 years ago

Hi Scott,

Thanks for your issue and sorry for the wait. I don't know what's going on with the AWS download (weird that it works from the browser but not from the command line), but given that you have been able to download it, I can quickly tell you how to load the evotuned weights. The fastest way is to use TensorFlow's model checkpoint interface rather than the babbler init: initialize the babbler with the 1900 weights, then use TensorFlow's Saver interface to overwrite them with the evotuned model.

b = babbler(batch_size=batch_size, model_path=MODEL_WEIGHT_PATH)
with tf.Session(config=con) as sess:
    saver = tf.train.Saver()
    saver.restore(sess, './evotuned/unirep/')

Where './evotuned/unirep/' is just the path to the directory containing model-13560.data-00000-of-00001 and the other files you downloaded from aws.

sandias42 commented 5 years ago

(With the correct python indentation obviously).

Closing the issue, but feel free to reopen it if you have trouble with this. Ethan

spark157 commented 5 years ago

Ok - am trying but only thing is config=con is giving some grief:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-f0ff79a61dd7> in <module>
      1 batch_size = 14
      2 b = babbler(batch_size=batch_size, model_path=MODEL_WEIGHT_PATH)
----> 3 with tf.Session(config=con) as sess:
      4     saver = tf.train.Saver()
      5     saver.restore(sess, './evotuned/unirep/')

NameError: name 'con' is not defined

How would I define con?

Thanks for your help.

Scott

sandias42 commented 4 years ago

con is a tf.ConfigProto (https://github.com/tensorflow/docs/blob/r1.3/site/en/api_docs/api_docs/python/tf/ConfigProto.md)

There is better documentation somewhere, but I wanted to make sure you had the correct version (1.3).

The config proto allows you to specify the configuration of the run. It's not essential for this, in all likelihood. Try deleting it and running without.
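If you do want to pass a config, a minimal sketch of defining con could look like the following. It assumes TensorFlow 1.x, where tf.ConfigProto is available at the top level. The allow_growth setting is only an illustrative choice, not something this repo requires, and make_session_config is a made-up helper name.

```python
def make_session_config():
    """Build a tf.ConfigProto to pass as tf.Session(config=...).

    Assumes TensorFlow 1.x; the option set below is just an example.
    """
    # Imported inside the function so the sketch can be loaded
    # even where TensorFlow is not installed.
    import tensorflow as tf

    con = tf.ConfigProto()
    # Illustrative setting: grab GPU memory as needed instead of all at once.
    con.gpu_options.allow_growth = True
    return con
```

It would then be used as con = make_session_config() before the with tf.Session(config=con) as sess: line.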

Cheers, Ethan.

sandias42 commented 4 years ago

Also, please reopen the issue before commenting with continued challenges. I won't see it otherwise.

Thanks Ethan

spark157 commented 4 years ago

I finally had a chance to give this a go and what Ethan noted is correct with the small addition that you also need to add in the model name, in this case model-13560, to restore the proper checkpoint. So the code would look something like the following:

b = babbler(batch_size=batch_size, model_path=MODEL_WEIGHT_PATH)
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, './evotuned/unirep/model-13560')

Hope that helps others with a similar issue.
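To avoid hard-coding the model name, the checkpoint prefix can be derived from the .index file in the directory. This is a rough sketch with a made-up helper name. It assumes the directory holds a single checkpoint; with several, it simply takes the last name in sorted order, which is not necessarily the latest training step.

```python
from pathlib import Path


def checkpoint_prefix(ckpt_dir):
    """Return the path prefix to pass to saver.restore for ckpt_dir."""
    index_files = sorted(Path(ckpt_dir).glob("*.index"))
    if not index_files:
        raise FileNotFoundError("no *.index file found in %s" % ckpt_dir)
    # Strip the ".index" suffix: "model-13560.index" -> "model-13560".
    return str(index_files[-1].with_suffix(""))
```

For the files in this thread, checkpoint_prefix('./evotuned/unirep') gives a path ending in model-13560, which can be passed to saver.restore in place of the literal string.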

Scott