UA-RCL / RANC

https://ua-rcl.github.io/projects/ranc
MIT License

Unable to run Python scripts for file generation #16

Open VangelisMak opened 1 year ago

VangelisMak commented 1 year ago

Hello, I was trying to run the Python scripts you provide to generate the input and config files, but I was unable to do so. I tried running the software/tealayers/tealayers2.0/example.py script, as well as the _software/mnist/mnistsweep.py script. In both cases I get the following error:

ValueError: No gradients provided for any variable: (['tea_1_1/connections:0', 'tea_1_1/bias:0', 'tea_1_2/connections:0', 'tea_1_2/bias:0', 'tea_1_3/connections:0', 'tea_1_3/bias:0', 'tea_1_4/connections:0', 'tea_1_4/bias:0', 'tea_2/connections:0', 'tea_2/bias:0'],). Provided `grads_and_vars` is ((None, <tf.Variable 'tea_1_1/connections:0' shape=(256, 64) dtype=float32>), (None, <tf.Variable 'tea_1_1/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'tea_1_2/connections:0' shape=(256, 64) dtype=float32>), (None, <tf.Variable 'tea_1_2/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'tea_1_3/connections:0' shape=(256, 64) dtype=float32>), (None, <tf.Variable 'tea_1_3/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'tea_1_4/connections:0' shape=(256, 64) dtype=float32>), (None, <tf.Variable 'tea_1_4/bias:0' shape=(64,) dtype=float32>), (None, <tf.Variable 'tea_2/connections:0' shape=(256, 250) dtype=float32>), (None, <tf.Variable 'tea_2/bias:0' shape=(250,) dtype=float32>)).

The error shows up when the "fit" function is called on line 53 of software/tealayers/tealayer2.0/example.py (and similarly in the other script).

I also followed the TeaLearning Tutorial provided to make sure I didn't do anything wrong, but I was getting the same results. There was a syntax error in example.py, but it was not a big deal. The ValueError, however, is very persistent. Does the code still run normally in your setups? If so, do you have any insight as to why the error appears and how to fix it? Could you maybe do a test run of the scripts? Thanks

mackncheesiest commented 1 year ago

Hey, I'll do some investigating on our side, but I suspect the TensorFlow version is the issue. From what I remember, tealayers2.0 required precisely tensorflow-gpu==2.0.0b1. tensorflow-gpu==2.0.0 was never made to work, let alone the latest tensorflow-gpu==2.12.0 (edit: apparently 2.13.0).
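
For anyone hitting the same error, a quick way to see which version is actually installed is a small check like the one below. This is only a sketch: the helper name `check_pinned_version` is hypothetical, the pin `2.0.0b1` is taken from the recollection above rather than from a verified requirements file, and your environment may have the package installed as `tensorflow` rather than `tensorflow-gpu`.

```python
from importlib import metadata


def check_pinned_version(package: str, expected: str) -> str:
    """Compare the installed version of `package` with an expected pin.

    Hypothetical helper for illustration; it only reads package metadata
    and never imports the package itself.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed (expected {expected})"
    if installed == expected:
        return f"{package}=={installed} matches the expected version"
    return f"{package}=={installed} is installed, but {expected} was expected"


if __name__ == "__main__":
    # Assumption: the pin comes from the comment above, not from a
    # requirements file in the repository.
    print(check_pinned_version("tensorflow-gpu", "2.0.0b1"))
```

A mismatch reported here would be consistent with the suspicion above, but it does not by itself prove that the version is what causes the ValueError.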

mackncheesiest commented 1 year ago

Yeah I understandably can't even install that version of tensorflow in an Ubuntu 18 container...

(venv) root@11990230690b:~# pip install tensorflow-gpu==2.0.0b1
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.0.0b1 (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.12.2, 1.12.3, 1.13.1, 1.13.2, 1.14.0, 1.15.0, 1.15.2, 1.15.3, 1.15.4, 1.15.5, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.6.0, 2.6.1, 2.6.2, 2.12.0)
ERROR: No matching distribution found for tensorflow-gpu==2.0.0b1

I would assume that particular pip package we were fetching had been identical to this release, though (despite the slightly different name)...

If I can find some spare time, I might be able to figure out what was keeping at least 2.0.0 from working properly, but I'm not sure I can give a confident timeline atm, sorry.

micbar-21 commented 1 year ago

Hello @mackncheesiest, I ran into some trouble while trying to run some Python notebooks and scripts, probably also due to incompatible TensorFlow, Keras, or Python versions. Do you perhaps have a working container or virtual environment with the correct Python version and related packages? Installing TensorFlow from source is not trivial either. Is there also a specific Python version I should use? Thanks

mackncheesiest commented 1 year ago

It's ultimately looking pretty tricky to get this running again. The best bet might be to leverage a Singularity image (similar to Docker images, but they don't require root access to run) that we built a few years ago when performing these experiments on a University-managed compute platform.

https://emailarizona-my.sharepoint.com/:u:/g/personal/akoglu_arizona_edu/EX72kvLQBwVDqtYicKY0M10BdF5oMiyqH4rdzNuUo89gdQ?e=R76jlG

Due to University OneDrive restrictions, this link expires Dec. 25, 2023, just FYI.
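
Once the image is downloaded, running a script inside it might look roughly like the sketch below, which wraps `singularity exec` from Python. Several things here are assumptions and not confirmed in this thread: the image filename `ranc.sif`, the interpreter being called `python` inside the image, and the helper names `build_singularity_command` and `run_in_image`. The `--nv` flag is Singularity's option for exposing NVIDIA GPUs to the container.

```python
import shutil
import subprocess
from typing import List


def build_singularity_command(image: str, script: str, use_gpu: bool = False) -> List[str]:
    """Build a `singularity exec` command that runs `script` inside `image`.

    Assumption: the interpreter inside the image is available as `python`.
    """
    cmd = ["singularity", "exec"]
    if use_gpu:
        cmd.append("--nv")  # expose the host's NVIDIA GPU to the container
    cmd += [image, "python", script]
    return cmd


def run_in_image(image: str, script: str, use_gpu: bool = False) -> int:
    """Run the script inside the image and return the process exit code."""
    if shutil.which("singularity") is None:
        raise RuntimeError("singularity executable not found on PATH")
    return subprocess.run(build_singularity_command(image, script, use_gpu)).returncode


if __name__ == "__main__":
    # "ranc.sif" is a placeholder; use the actual name of the downloaded image.
    print(" ".join(build_singularity_command("ranc.sif", "example.py")))
```

Calling `run_in_image("ranc.sif", "example.py")` from the directory that contains the script would then execute it with whatever TensorFlow version was baked into the image.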

micbar-21 commented 1 year ago

Thank you very much @mackncheesiest, this will be very helpful. Best regards

HalfW-dev commented 6 months ago

Hello @mackncheesiest, I'm trying to run example.py in the tealayer2.0 folder and am encountering the same problem. Could you kindly provide a link to the Singularity image again? Thank you.

mackncheesiest commented 6 months ago

@HalfW-dev here's an updated link!

https://emailarizona-my.sharepoint.com/:u:/g/personal/akoglu_arizona_edu/EX72kvLQBwVDqtYicKY0M10BdF5oMiyqH4rdzNuUo89gdQ?e=QwWkxZ

Expires July 1st, 2024

HalfW-dev commented 6 months ago

Thank you @mackncheesiest! Best regards.