Version
Current main, commit ee2f30b42a43e854dfd3b5e75e265ef3f6dfcb52
On which installation method(s) does this occur?
Source
Describe the issue
Two issues with the ARCO data source:
1. When using ARCO in a Jupyter notebook, it complains that the event loop is already running. Maybe check for an existing event loop and use that instead of calling `asyncio.run`. Example (in a notebook):
from datetime import datetime
from earth2studio.data import ARCO

arco = ARCO(cache=True, verbose=False)
ds = arco(datetime.fromisoformat("1980-01-01"), ["u10m"])
File /workspace/repos/earth2studio/earth2studio/data/arco.py:122, in ARCO.__call__(self, time, variable)
119 # Make sure input time is valid
120 self._validate_time(time)
--> 122 xr_array = asyncio.run(
123 asyncio.wait_for(self.create_data_array(time, variable), self.async_timeout)
124 )
126 # Delete cache if needed
127 if not self._cache:
File /usr/lib/python3.10/asyncio/runners.py:33, in run(main, debug)
9 """Execute the coroutine and return the result.
10
11 This function runs the passed coroutine, taking care of
(...)
30 asyncio.run(main())
31 """
32 if events._get_running_loop() is not None:
---> 33 raise RuntimeError(
34 "asyncio.run() cannot be called from a running event loop")
36 if not coroutines.iscoroutine(main):
37 raise ValueError("a coroutine was expected, got {!r}".format(main))
RuntimeError: asyncio.run() cannot be called from a running event loop
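The suggestion above (detect an already running event loop instead of calling `asyncio.run` unconditionally) could look roughly like the following. This is a minimal sketch, not the actual earth2studio fix: `run_sync` and `fetch` are hypothetical names, and `fetch` only stands in for `create_data_array`. When a loop is already running, as in Jupyter, the coroutine is executed on a worker thread that owns its own loop.

```python
import asyncio
import concurrent.futures


def run_sync(coro, timeout=None):
    """Run a coroutine to completion from synchronous code.

    Hypothetical helper: works whether or not an event loop is already
    running in the calling thread (e.g. inside a Jupyter notebook).
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(asyncio.wait_for(coro, timeout))

    # A loop is already running here, so asyncio.run would raise
    # RuntimeError. Run the coroutine on a worker thread with its own
    # event loop and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(asyncio.run, asyncio.wait_for(coro, timeout))
        return future.result()


async def fetch(value):
    """Hypothetical stand-in for an async data fetch."""
    await asyncio.sleep(0)
    return value * 2
```

With this sketch, `run_sync(fetch(21))` behaves the same from a plain script and from a notebook cell. One caveat: the calling thread blocks while the worker runs, and the coroutine must not depend on objects bound to the outer loop.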
2. When `cache=False`, ARCO complains that `DistributedManager` has not been initialized. The same example as above, but with `cache=False`, gives the following:
File /workspace/repos/earth2studio/earth2studio/data/arco.py:117, in ARCO.__call__(self, time, variable)
115 time, variable = prep_data_inputs(time, variable)
116 # Create cache dir if doesnt exist
--> 117 pathlib.Path(self.cache).mkdir(parents=True, exist_ok=True)
119 # Make sure input time is valid
120 self._validate_time(time)
File /workspace/repos/earth2studio/earth2studio/data/arco.py:238, in ARCO.cache(self)
235 cache_location = os.path.join(datasource_cache_root(), "arco")
236 if not self._cache:
237 cache_location = os.path.join(
--> 238 cache_location, f"tmp_{DistributedManager().rank}"
239 )
240 return cache_location
File /usr/local/lib/python3.10/dist-packages/modulus/distributed/manager.py:121, in DistributedManager.__init__(self)
119 def __init__(self):
120 if not self._is_initialized:
--> 121 raise ModulusUninitializedDistributedManagerWarning()
122 super().__init__()
ModulusUninitializedDistributedManagerWarning: A DistributedManager object is being instantiated before this singleton class has been initialized. Instantiating a manager before initialization can lead to unexpected results where processes fail to communicate. Initialize the distributed manager via DistributedManager.initialize() before instantiating.
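One possible direction for the second issue, sketched without importing Modulus so the snippet stays self-contained: look up the rank through a callable and fall back to rank 0 when the lookup fails because the manager was never initialized. `temp_cache_dir` and `get_rank` are hypothetical names, and the directory layout only mirrors what the traceback shows. This is not the actual earth2studio code.

```python
import os


def temp_cache_dir(cache_root, get_rank=None, default_rank=0):
    """Return a per-rank temporary cache directory under cache_root.

    get_rank is a hypothetical zero-argument callable, for example
    ``lambda: DistributedManager().rank``. If it is missing or raises
    (as happens when the manager is uninitialized), default_rank is
    used instead, which assumes a single-process run.
    """
    rank = default_rank
    if get_rank is not None:
        try:
            rank = get_rank()
        except Exception:
            # Manager not initialized: treat this as a single process.
            rank = default_rank
    return os.path.join(cache_root, "arco", f"tmp_{rank}")
```

In a notebook, where nothing has called `DistributedManager.initialize()`, this would resolve to the rank 0 directory rather than raising.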