esm-tools / esm_tools

Simple Infrastructure for Earth System Simulations
https://esm-tools.github.io/
GNU General Public License v2.0

Continuing run from release 4 with release 5 #268

Closed joakimkjellsson closed 3 years ago

joakimkjellsson commented 3 years ago

Good afternoon all

I'm trying to continue a run that I started last year using ESM-Tools release 4. Since then I've moved to release 5, but I'm now struggling to continue the run. If I'm reading the error messages correctly, the problem is with the linking of restart files. The old release 4 did not link restart files, but made copies of them.

Here's my full output:

(base) blogin1:~/esm_tools/runscripts/oifs $ esm_runscripts oifs-43r3-tco199-amip-extended_blogin_annual.yaml -e OIFS-BJK008 -c 

================================================================================
STARTING SIMULATION JOB!
Experiment ID = OIFS-BJK008
Setup = oifs
Experiment is installed in:
       /scratch/usr/shkjocke/esm-slask//OIFS-BJK008
================================================================================

Traceback (most recent call last):
  File "/home/shkjocke/.local/bin/esm_runscripts", line 10, in <module>
    sys.exit(main())
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/cli.py", line 187, in main
    Setup()
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/sim_objects.py", line 50, in __call__
    self.compute(*args, **kwargs)
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/sim_objects.py", line 84, in compute
    self.config = compute.run_job(self.config)
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/compute.py", line 30, in run_job
    config = evaluate(config, "compute", "compute_recipe")
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/helpers.py", line 69, in evaluate
    framework_recipe, framework_plugins, config
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_plugin_manager/esm_plugin_manager.py", line 130, in work_through_recipe
    config = getattr(submodule, workitem)(config)
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/compute.py", line 181, in copy_files_to_thisrun
    config, config["general"]["in_filetypes"], source="init", target="thisrun"
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/filelists.py", line 574, in copy_files
    file_source = resolve_symlinks(file_source)
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/filelists.py", line 523, in resolve_symlinks
    return resolve_symlinks(points_to)
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/filelists.py", line 523, in resolve_symlinks
    return resolve_symlinks(points_to)
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/filelists.py", line 523, in resolve_symlinks
    return resolve_symlinks(points_to)
  [Previous line repeated 980 more times]
  File "/home/shkjocke/.local/lib/python3.7/site-packages/esm_runscripts/filelists.py", line 522, in resolve_symlinks
    points_to = os.path.realpath(file_source)
  File "/sw/tools/anaconda3/2019.10/skl/lib/python3.7/posixpath.py", line 395, in realpath
    path, ok = _joinrealpath(filename[:0], filename, {})
  File "/sw/tools/anaconda3/2019.10/skl/lib/python3.7/posixpath.py", line 443, in _joinrealpath
    path, ok = _joinrealpath(path, os.readlink(newpath), seen)
  File "/sw/tools/anaconda3/2019.10/skl/lib/python3.7/posixpath.py", line 410, in _joinrealpath
    if isabs(rest):
  File "/sw/tools/anaconda3/2019.10/skl/lib/python3.7/posixpath.py", line 67, in isabs
    sep = _get_sep(s)
  File "/sw/tools/anaconda3/2019.10/skl/lib/python3.7/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object

Does anyone know how to solve this problem? It seems to get stuck in an infinite loop of links where "resolve_symlinks" returns a link and not a real file. Has anyone had a similar problem before?

Cheers Joakim

denizural commented 3 years ago

Dear @joakimkjellsson, I found the error. This is not really a bug in esm_tools, but a consequence of UNIX allowing symbolic links that point to themselves.

First of all, it looks to me like you are not using the latest version of esm_tools, which you can get by running esm_versions upgrade

Consider this command: ln -s endless_link endless_link

This is a perfectly valid UNIX command, although a useless one. It creates a symbolic link (soft link) that points to itself. Dropping -s (a hard link) would not work here. When os.path.realpath tries to resolve such a link, it cannot reach a real file and returns a path that is still a link, so the recursive call in resolve_symlinks never terminates.
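A minimal sketch of this behaviour, assuming a POSIX system and a scratch temporary directory. Only the name endless_link comes from the command above; the helper describe_self_link is hypothetical and just for illustration. It creates the self-referencing link and reports how Python sees it:

```python
import os
import tempfile


def describe_self_link():
    """Create a self-referencing symlink and report how Python sees it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        link = os.path.join(tmpdir, "endless_link")
        # Equivalent to running `ln -s endless_link endless_link` in tmpdir.
        os.symlink("endless_link", link)

        # realpath cannot reach a real file, so it returns the link path itself.
        resolved = os.path.realpath(link)
        return {
            "is_link": os.path.islink(link),
            "exists": os.path.exists(link),  # follows the link, so it fails
            "resolved_is_link": os.path.islink(resolved),
            "resolved_basename": os.path.basename(resolved),
        }


if __name__ == "__main__":
    print(describe_self_link())
```

Because the resolved path is still a link, any function that keeps calling itself until it reaches a non-link will recurse until Python raises RecursionError, which matches the traceback in the report.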

I just had a quick discussion with @mandresm and we merged a fix. Once you upgrade your esm_tools you should be good to go.

Cheers, Deniz

joakimkjellsson commented 3 years ago

Hi @denizural Thanks for the help! I realised that my attempt to continue the run had created a bunch of broken links etc, so it took a while to fix. But I managed to get there.
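For anyone cleaning up a similar experiment directory, here is a rough sketch of how broken and self-referencing links could be listed before restarting. The function name find_problem_links is hypothetical and not part of esm_tools; it relies only on the standard library:

```python
import os


def find_problem_links(top):
    """Return (broken, looping) lists of symlink paths found below `top`."""
    broken, looping = [], []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            if os.path.exists(path):
                continue  # the link resolves to something real
            # realpath gives up on a cycle and returns a path that is still a link
            if os.path.islink(os.path.realpath(path)):
                looping.append(path)
            else:
                broken.append(path)
    return sorted(broken), sorted(looping)
```

Calling find_problem_links on the experiment folder returns two sorted lists, which can then be inspected or removed by hand.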

However, I did have to comment out the two lines with "config" in your fix. Otherwise I got an error that config does not exist.

import os

def resolve_symlinks(file_source):
    if os.path.islink(file_source):
        points_to = os.path.realpath(file_source)

        # deniz: check if file links to itself. In UNIX
        # ln -s endless_link endless_link is a valid command
        if os.path.abspath(file_source) == points_to:
            # if config["general"]["verbose"]:         ## commented out this
            #     print(f"file {file_source} links to itself")
            return file_source

        # recursively find the file that the link is pointing to
        return resolve_symlinks(points_to)
    else:
        return file_source
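As an alternative to commenting lines out, the function above could be written without recursion and without the config lookup. The following is only a sketch, not the actual esm_runscripts implementation: the verbose keyword is a hypothetical stand-in for config["general"]["verbose"]. It relies on os.path.realpath already following every hop of a link chain, so a result that is still a link means the chain loops:

```python
import os


def resolve_symlinks(file_source, verbose=False):
    """Return the real file behind a symlink, or the input path on a loop."""
    if not os.path.islink(file_source):
        return file_source

    # realpath follows the whole chain of links in one call
    points_to = os.path.realpath(file_source)

    # If the result is still a link, realpath hit a loop and gave up.
    if os.path.islink(points_to):
        if verbose:  # hypothetical flag replacing config["general"]["verbose"]
            print(f"file {file_source} is part of a symlink loop")
        return file_source

    return points_to
```

Passing the flag as an argument keeps the function usable in versions where config is not available in that scope.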

Cheers Joakim

denizural commented 3 years ago

Hi @joakimkjellsson,

I am glad that it worked. config["general"]["verbose"] only works in the most recent version, but commenting those lines out is of course a fine quick fix. Can you now run your model without any problems?

Cheers, Deniz