Breakend / experiment-impact-tracker

Error in "get_region_by_coords" on a remote computing cluster #57

Open nikhil153 opened 3 years ago

nikhil153 commented 3 years ago

Hi,

I am able to run the code smoothly on my local machine. The same code and environment, run in a Singularity container, fail on a remote computing cluster with the following error:

loading region bounding boxes for computing carbon emissions region, this may take a moment...
 454/454... rate=566.68 Hz, eta=0:00:00, total=0:00:00, wall=11:38 ESTT
Done!
INFO:Gathering system info for reproducibility...
ERROR:Status code Unknown from http://ipinfo.io/json: ERROR - HTTPConnectionPool(host='ipinfo.io', port=80): Max retries exceeded with url: /json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ad6ba6184c0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "eval_with_tracker.py", line 565, in <module>
    tracker = ImpactTracker(log_dir)
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 246, in __init__
    self.initial_info = gather_initial_info(logdir)
  File "../../experiment-impact-tracker/experiment_impact_tracker/compute_tracker.py", line 225, in gather_initial_info
    data[key] = info_["routing"]["function"]()
  File "../../experiment-impact-tracker/experiment_impact_tracker/data_info_and_router.py", line 63, in <lambda>
    "routing": {"function": lambda: get_current_region_info_cached()[0]},
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 65, in get_current_region_info_cached
    return get_current_region_info(ttl_hash=get_ttl_hash(seconds=60 * 60))
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 43, in get_current_region_info
    return get_zone_information_by_coords(get_current_location())
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 10, in get_zone_information_by_coords
    region = get_region_by_coords(coords)
  File "../../experiment-impact-tracker/experiment_impact_tracker/emissions/get_region_metrics.py", line 17, in get_region_by_coords
    point = Point(lon, lat)
  File "/usr/local/lib/python3.8/dist-packages/shapely/geometry/point.py", line 48, in __init__
    self._set_coords(*args)
  File "/usr/local/lib/python3.8/dist-packages/shapely/geometry/point.py", line 137, in _set_coords
    self._geom, self._ndim = geos_point_from_py(tuple(args))
  File "/usr/local/lib/python3.8/dist-packages/shapely/geometry/point.py", line 214, in geos_point_from_py
    dx = c_double(coords[0])
TypeError: must be real number, not NoneType

I am able to ping ipinfo.io from the same node on the cluster:

ping ipinfo.io
PING ipinfo.io (216.239.34.21) 56(84) bytes of data.
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=1 ttl=111 time=0.655 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=2 ttl=111 time=0.809 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=3 ttl=111 time=0.836 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=4 ttl=111 time=0.733 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=5 ttl=111 time=0.797 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=6 ttl=111 time=0.741 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=7 ttl=111 time=0.762 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=8 ttl=111 time=0.744 ms
64 bytes from any-in-2215.1e100.net (216.239.34.21): icmp_seq=9 ttl=111 time=0.749 ms
^C
--- ipinfo.io ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8008ms
rtt min/avg/max/mdev = 0.655/0.758/0.836/0.055 ms

Any suggestions? Thanks!
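
A note on the diagnostics above: `ping` uses ICMP, while the failing request in the log is an HTTP connection to port 80, so a successful ping does not show that outbound TCP is allowed from the node. A minimal sketch of a more direct check, using only the Python standard library, is below. The helper name `can_open_tcp` is hypothetical and is not part of experiment-impact-tracker.

```python
import socket


def can_open_tcp(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for diagnosing the error above; not part of
    experiment-impact-tracker.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Running `can_open_tcp("ipinfo.io", 80)` inside the container on the compute node should indicate whether the "Connection refused" in the log comes from the network policy rather than from the tracker itself.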

Breakend commented 3 years ago

Hi @nikhil153, did you resolve this issue? If so, what was the solution so that we can ensure people do not run into it again?

nikhil153 commented 3 years ago

@Breakend this issue is a bit mysterious. It only happens on one specific HPC cluster, which could be blocking connections to external IPs, although interactive sessions on the same cluster let me ping the Internet, so I am not exactly sure. I created a workaround for my situation by modifying the code to override the geo-location. However, this branch is still under development, so I haven't created a PR for this feature. If you think it will be useful in general, I will be happy to do so.
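
For readers hitting the same error, the general shape of such an override might look like the sketch below. It is an illustration under stated assumptions, not the code from the branch mentioned above: the environment variable `TRACKER_COORDS_OVERRIDE`, the helpers `parse_coords` and `resolve_coords`, and the (latitude, longitude) tuple order are all hypothetical. The idea is to use user-supplied coordinates when present, and otherwise to fail with a clear message instead of passing `None` on to `Point(lon, lat)`.

```python
import os
from typing import Callable, Optional, Tuple

# Assumed order for this sketch: (latitude, longitude).
Coords = Tuple[float, float]


def parse_coords(text: str) -> Coords:
    """Parse a "lat,lon" string such as "45.5,-73.6" into a pair of floats."""
    lat_s, lon_s = text.split(",")
    lat, lon = float(lat_s), float(lon_s)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {text!r}")
    return lat, lon


def resolve_coords(
    lookup: Callable[[], Optional[Tuple[Optional[float], Optional[float]]]],
    env_var: str = "TRACKER_COORDS_OVERRIDE",  # hypothetical variable name
) -> Coords:
    """Prefer an explicit override; otherwise fall back to the online lookup."""
    override = os.environ.get(env_var)
    if override:
        return parse_coords(override)
    coords = lookup()
    if coords is None or None in coords:
        # Fail early with an actionable message rather than a TypeError later.
        raise RuntimeError(
            f"location lookup failed; set {env_var}='lat,lon' to override"
        )
    return coords[0], coords[1]
```

On a cluster without outbound HTTP, one would export the override variable in the job script before starting the tracker; the lookup callable is then never invoked.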

Breakend commented 3 years ago

Hi @nikhil153, this sounds like it might be a common use case, so I'm going to go ahead and re-open. If you have the time, we would be happy to get a PR from you; otherwise we'll add it to our backlog.

nikhil153 commented 3 years ago

@Breakend sure, I will create one soon, as I am actively working on it.