Murali-group / Beeline

BEELINE: evaluation of algorithms for gene regulatory network inference
GNU General Public License v3.0

Terminal/system halt when running "python BLRunner.py --config config-files/config.yaml" in the BEELINE conda environment #66

Closed shachafl closed 1 year ago

shachafl commented 2 years ago

Following the quick setup instructions on the main page, my terminal halts with the error messages below. (I have removed the earlier ExpressionData0.csv output, as that run seems to finish properly.)

I am using Anaconda with Python 3.9 on Ubuntu 16.04.

Any advice is welcome.


... docker run --rm -v /home/lshacha1/Downloads/BEELINE/Beeline:/VBEM/data/ grnbeeline/grnvbem:base /bin/sh -c "time -v -o data/outputs/example/GSD/GRNVBEM/time1.txt ./GRNVBEM data/inputs/example/GSD/GRNVBEM/ExpressionData1.csv data/outputs/example/GSD/GRNVBEM/outFile1.txt "

= Running AR1MA1-VBEM method for GRN inference =

( use [Ctrl]+[C] to abort the execution )

Choosing dataset...

file =

'data/inputs/example/GSD/GRNVBEM/ExpressionData1.csv'

Elapsed time is 0.274912 seconds.

docker run --rm -v /home/lshacha1/Downloads/BEELINE/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
Task exception was never retrieved
future: <Task finished coro=<connect..() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/comm/core.py:288> exception=CommClosedError()>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 297, in handshake = await asyncio.wait_for(comm.read(), 1)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 435, in wait_for
    await waiter
concurrent.futures._base.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 304, in _
    raise CommClosedError() from e
distributed.comm.core.CommClosedError
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True) \
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2919, in compute
    result = self.gather(futures)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 1993, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 834, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
concurrent.futures._base.CancelledError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fb5cd650bd0>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: connect() didn't finish in time

shachafl commented 2 years ago

I have also tried to build the containers from scratch using ". initialize.sh", but this also halted the terminal with the errors below:

docker run --rm -v /home/lshacha1/Downloads/BEELINE/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
distributed.comm.inproc - WARNING - Closing dangling queue in
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True) \
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2919, in compute
    result = self.gather(futures)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 1993, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 834, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
concurrent.futures._base.CancelledError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f3f9baaf750>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time

shachafl commented 2 years ago

Thanks, dpeng28. Disabling GENIE3 in config.yaml lets this command run: python BLRunner.py --config config-files/config.yaml

I have rebuilt the ARBORETO Docker image from scratch, but that doesn't solve the problem with GENIE3.

Moving forward, the next command to calculate AUC: python BLEvaluator.py --config config-files/config.yaml --auc

also returns an error:
Traceback (most recent call last):
  File "/home/lshacha1/Downloads/BEELINE/Beeline/BLEvaluator.py", line 162, in <module>
    main()
  File "/home/lshacha1/Downloads/BEELINE/Beeline/BLEvaluator.py", line 85, in main
    evalConfig = ev.ConfigParser.parse(conf)
  File "/home/lshacha1/Downloads/BEELINE/Beeline/BLEval/__init__.py", line 335, in parse
    config_map = yaml.load(config_file_handle)
TypeError: load() missing 1 required positional argument: 'Loader'

I tried adding Loader=None to the yaml.load() call that sets config_map, but got a different error:
Traceback (most recent call last):
  File "/home/lshacha1/Downloads/BEELINE/Beeline/BLEvaluator.py", line 162, in <module>
    main()
  File "/home/lshacha1/Downloads/BEELINE/Beeline/BLEvaluator.py", line 85, in main
    evalConfig = ev.ConfigParser.parse(conf)
  File "/home/lshacha1/Downloads/BEELINE/Beeline/BLEval/__init__.py", line 335, in parse
    config_map = yaml.load(config_file_handle, Loader=None) ### Lior: added Loader=None
  File "/home/local/WIN/lshacha1/anaconda3/lib/python3.9/site-packages/yaml/__init__.py", line 79, in load
    loader = Loader(stream)
TypeError: 'NoneType' object is not callable

On Sat, Jul 9, 2022 at 10:18 AM dpeng28 wrote:

Not sure if you resolved this issue or not, but I ran into a similar problem. It seems like the problem in your case is GENIE3. You can double-check whether GENIE3 is indeed the problem by setting should_run: [False] for it in config.yaml. If GENIE3 is indeed the only problem, then I would rerun

BASEDIR=$(pwd)
# You may remove the -q flag if you want to see the docker build status
cd $BASEDIR/Algorithms/ARBORETO
docker build -q -t arboreto:base .
echo "Docker container for ARBORETO is built and tagged as arboreto:base"

from the initialize.sh script to rebuild arboreto.
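The should_run switch suggested above can be illustrated with a small sketch. The YAML excerpt embedded here is an assumed, shortened layout of config-files/config.yaml (the real file has more keys and algorithms), and disable_algorithm is a hypothetical helper, not part of BEELINE:

```python
import yaml

# Assumed, shortened excerpt of config.yaml; the real file contains more settings.
CONFIG_TEXT = """
input_settings:
  algorithms:
    - name: "GENIE3"
      params:
        should_run: [True]
    - name: "GRNVBEM"
      params:
        should_run: [True]
"""


def disable_algorithm(config, name):
    """Set should_run to [False] for the named algorithm (hypothetical helper).

    Returns True if the algorithm was found in the config, False otherwise.
    """
    for algo in config["input_settings"]["algorithms"]:
        if algo["name"] == name:
            algo["params"]["should_run"] = [False]
            return True
    return False


config = yaml.safe_load(CONFIG_TEXT)
found = disable_algorithm(config, "GENIE3")
print("GENIE3 disabled" if found else "GENIE3 not listed in config")
```

In practice it is probably simpler to edit the should_run line for GENIE3 by hand; the sketch only shows which value changes.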


shachafl commented 2 years ago

The problem with python BLEvaluator.py --config config-files/config.yaml --auc was that my PyYAML was version 6.0, which requires the extra "Loader" argument to yaml.load(). To fix the code, I modified the file BLEval/__init__.py: config_map = yaml.load(config_file_handle, Loader=yaml.CLoader)
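A possible variant of this fix that should also work with older PyYAML releases is sketched below. It rests on two assumptions: that the BEELINE config uses only plain YAML (no Python-specific tags), so a safe loader is sufficient, and that yaml.CLoader / yaml.CSafeLoader may be missing when PyYAML is installed without the libyaml bindings. The function name load_config is hypothetical:

```python
import io

import yaml


def load_config(stream):
    """Parse a YAML config with an explicitly chosen loader (hypothetical helper).

    Passing Loader as a keyword is accepted by PyYAML 5.x and 6.x alike.
    CSafeLoader is used when the libyaml bindings are available; otherwise
    the pure-Python SafeLoader is used.
    """
    loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
    return yaml.load(stream, Loader=loader)


# Small in-memory example instead of the real config file.
example = load_config(io.StringIO("input_settings:\n  dataset_dir: example\n"))
print(example)
```

Passing Loader=None fails on PyYAML 6.0 because yaml.load() calls Loader(stream) directly, which is why the earlier attempt raised "'NoneType' object is not callable".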

tmmurali commented 2 years ago

Thanks for this report and the fix. @ktakers can we update BLEvaluator.py with this change without breaking compatibility with earlier versions of PyYAML?

shachafl commented 2 years ago

You can also raise the backward compatibility issue with the PyYAML team; they could solve it by rolling back the change or adding a default.

With PyYAML==5.4 (instead of 6.0), as specified in requirements.txt and the BEELINE conda environment, the command python BLEvaluator.py --config config-files/config.yaml --auc works fine.

I am keeping the issue open for now, though, because running GENIE3 together with the other algorithms still halts my terminal and returns errors.

smartpig-666 commented 1 year ago

I encountered the same problem. When I enabled the GENIE3 algorithm, I got the same error:

docker run --rm -v /home/huxin/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/Simulation/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/Simulation/GENIE3/ExpressionData.csv --outFile=data/outputs/example/Simulation/GENIE3/outFile.txt "
distributed.comm.inproc - WARNING - Closing dangling queue in <InProc  local=inproc://192.188.0.2/9/1 remote=inproc://192.188.0.2/9/8>
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True) \
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2919, in compute
    result = self.gather(futures)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 1993, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 834, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
concurrent.futures._base.CancelledError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fbacfd61890>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: connect() didn't finish in time

In addition, I tried modifying the Docker configuration file, but that didn't help.

tmmurali commented 1 year ago

Thank you for this report. @ktakers can you take a look at this issue?

ktakers commented 1 year ago

I apologize for the late response.

The tornado timeout appears to be the same issue reported in https://github.com/Murali-group/Beeline/issues/48 and https://github.com/Murali-group/Beeline/issues/42. According to the Arboreto issue https://github.com/aertslab/arboreto/issues/10 , GENIE3 can run successfully despite those timeout errors.

Unfortunately I wasn't able to reproduce that error. Can you please check under the directory outputs/example/GSD/GENIE3 to see if there's a rankedEdges.csv or an outFile.txt, which would indicate that GENIE3 did actually complete successfully?
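A quick way to run this check is sketched below. The file names and directory come from the comment above; the helper name genie3_outputs and the extra "non-empty" test are assumptions added for illustration, and the path may need adjusting to where BEELINE was run:

```python
from pathlib import Path


def genie3_outputs(output_dir):
    """Return the expected GENIE3 result files that exist and are non-empty.

    Hypothetical helper: it only looks for the two file names mentioned in
    this thread and does not validate their contents.
    """
    expected = ("rankedEdges.csv", "outFile.txt")
    base = Path(output_dir)
    return [
        name
        for name in expected
        if (base / name).is_file() and (base / name).stat().st_size > 0
    ]


found = genie3_outputs("outputs/example/GSD/GENIE3")
print(found if found else "no GENIE3 output found")
```

An empty result suggests GENIE3 did not finish; either file being present suggests the timeout messages were printed during shutdown, after the results had been written.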

smartpig-666 commented 1 year ago

You are so kind. Unfortunately, I waited a long time for GENIE3 to complete normally, but it never generated the rankedEdges.csv file. I have now reproduced the GENIE3 algorithm separately and generated the ranked-edges file myself, and BEELINE can score it normally.

ktakers commented 1 year ago

Thank you for reporting and working around the issue.