AlexWorldD / NetEmbs

Framework for Representation Learning on Financial Statement Networks
Apache License 2.0

Cached files #13

Open boersmamarcel opened 5 years ago

boersmamarcel commented 5 years ago

Hi Aleksei,

The cached files are awesome! However, if the directory doesn't exist yet, it crashes:

Traceback (most recent call last):
  File "/Users/mboersma/Documents/phd/students/alex/NetEmbs-dev-12/b_experiments/experiment.py", line 21, in <module>
    embds = get_embs_TF(df, embed_size = 2, walks_per_node = 2, num_steps=200, use_cached_skip_grams= False)
  File "/Users/mboersma/Documents/phd/students/alex/NetEmbs-dev-12/NetEmbs/SkipGram/tensor_flow.py", line 233, in get_embs_TF
    pd.DataFrame(embs).to_pickle(WORK_FOLDER[0] + WORK_FOLDER[1] + "cache/snapshot.pkl")
  File "/Users/mboersma/PycharmProjects/networkembedding/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 2593, in to_pickle
    protocol=protocol)
  File "/Users/mboersma/PycharmProjects/networkembedding/venv/lib/python3.7/site-packages/pandas/io/pickle.py", line 73, in to_pickle
    is_text=False)
  File "/Users/mboersma/PycharmProjects/networkembedding/venv/lib/python3.7/site-packages/pandas/io/common.py", line 430, in _get_handle
    f = open(path_or_buf, mode)
FileNotFoundError: [Errno 2] No such file or directory: '2_walks30_pressure30_window3/TFsteps200000batch64_emb32/cache/snapshot.pkl'

I added a couple of lines so that it creates the directory when it is not found; this seems to be working:

In utils.py I added:

        skip_gr = tr.encode_pairs(get_pairs(N_JOBS, version, walk_length, walks_per_node, direction))
        # Create the working folder before writing the cached skip-grams
        if not os.path.exists(WORK_FOLDER[0]):
            os.makedirs(WORK_FOLDER[0])
        with open(WORK_FOLDER[0] + "skip_grams_cached.pkl", "wb") as file:
            pickle.dump(skip_gr, file)


In tensor_flow.py:


    # Create the cache folder before saving the embeddings snapshot
    if not os.path.exists(WORK_FOLDER[0] + WORK_FOLDER[1] + "cache/"):
        os.makedirs(WORK_FOLDER[0] + WORK_FOLDER[1] + "cache/")
    pd.DataFrame(embs).to_pickle(WORK_FOLDER[0] + WORK_FOLDER[1] + "cache/snapshot.pkl")

so that it creates the cache folder.
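
For what it's worth, the existence check and the creation can be folded into a single call with exist_ok=True (Python 3.2+); a minimal sketch, with the WORK_FOLDER values taken from the error message above:

    import os

    # Illustrative values, copied from the error message above; in NetEmbs they come from the config
    WORK_FOLDER = ("2_walks30_pressure30_window3/", "TFsteps200000batch64_emb32/")

    # Creates the cache folder and any missing parents; a no-op if it already exists
    os.makedirs(WORK_FOLDER[0] + WORK_FOLDER[1] + "cache/", exist_ok=True)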

AlexWorldD commented 5 years ago

Hi, Marcel! Have you pulled the current version of the code? I've tested it a few times and it was OK. Plus, in the current version of the MarcelExperiments.py file I have almost the same solution for folder creation as you did :) ...but it could also be due to a lack of permissions for Python to write in the home directory...
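
A quick way to rule permissions out from Python (a sketch; it only checks the current working directory, which is where the relative cache paths above are created):

    import os

    # True if the current process may create files/folders in the working directory
    print(os.access(os.getcwd(), os.W_OK))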


Also, this morning I added logging for TensorBoard, see:

# What to save for TensorBoard during model training: "full" includes min/max/mean/std for weights/biases, but is very expensive
# LOG_LEVEL = "full"
# "cost" includes only the cost values
LOG_LEVEL = "cost"

Alex

boersmamarcel commented 5 years ago

Everything works fine!

Kind regards,

Marcel Boersma


AlexWorldD commented 5 years ago

I really recommend making the first run of the code (after pulling from GitHub) with a small part of the simulated data (or a small part of the real data, it's up to you), because I know how annoying it can be to get an error after 30 minutes of calculations... It happened to me yesterday: I had forgotten to change the path to the cache folder a bit... and got an error :(
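
A cheap way to do that smoke test is to slice the input DataFrame before embedding, reusing get_embs_TF exactly as in the traceback above (the 500-row cut is just an example):

    # Smoke test: run the whole pipeline on a small slice of the data first
    small_df = df.head(500)
    embs = get_embs_TF(small_df, embed_size=2, walks_per_node=2, num_steps=200,
                       use_cached_skip_grams=False)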


Also, please be sure that your config parameters are what you really need and not mine from GitHub :) I've tested different batch_size values and didn't see significant improvements in the final results...
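
A comparison like that can be scripted as a small sweep; note that passing batch_size as a keyword argument is an assumption here, it may instead be set via the config:

    # Hypothetical sweep; the batch_size keyword is an assumption (it may live in the config)
    for bs in (32, 64, 128):
        embs = get_embs_TF(df, embed_size=2, walks_per_node=2,
                           num_steps=200, batch_size=bs)
        # ...compare the downstream quality of `embs` (e.g. clustering) here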

Alex