[BUG] Error running example code #1903

Closed Sanzo00 closed 2 years ago

Sanzo00 commented 2 years ago

Describe the bug

[error] Check failed: IOError: Receive message failed: Connection reset by peer in "client->Connect(vineyard_socket)", in function void gs::EnsureClient(std::shared_ptr<vineyard::Client>&, const string&), file /work/analytical_engine/core/, line 123
terminate called after throwing an instance of 'std::runtime_error'
  what():  Check failed: IOError: Receive message failed: Connection reset by peer in "client->Connect(vineyard_socket)", in function void gs::EnsureClient(std::shared_ptr<vineyard::Client>&, const string&), file /work/analytical_engine/core/, line 123
*** Aborted at 1659058982 (unix time) try "date -d @1659058982" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x3f00001ede5) received by PID 126437 (TID 0x7f813dbc1040) from PID 126437; stack trace: ***
    @     0x7f813fd223c0 (unknown)
    @     0x7f813ece818b gsignal
    @     0x7f813ecc7859 abort
    @     0x7f813f09c911 (unknown)
    @     0x7f813f0a838c (unknown)
    @     0x7f813f0a83f7 std::terminate()
    @     0x7f813f0a86a9 __cxa_throw
    @     0x7f8149263000 (unknown)
    @           0x48ebc0 gs::GrapeInstance::Init()
    @           0x46f353 gs::GrapeEngine::Start()
    @           0x45da42 main
    @     0x7f813ecc90b3 __libc_start_main
    @           0x45e345 (unknown)
    @                0x0 (unknown)
Traceback (most recent call last):
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/", line 3, in <module>
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/", line 1779, in launch_graphscope
    coordinator_service_servicer = CoordinatorServiceServicer(
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/", line 175, in __init__
    if not self._launcher.start():
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/", line 174, in start
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/", line 610, in _create_services
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/", line 592, in _start_analytical_engine
Traceback (most recent call last):
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 72, in waiting_service_ready
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/grpc/", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/grpc/", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1659059576.256340172","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/","file_line":3260,"referenced_errors":[{"created":"@1659059576.256338821","description":"failed to connect to all addresses","file":"src/core/lib/transport/","file_line":167,"grpc_status":14}]}"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 11, in <module>
    graph = load_cora()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/dataset/", line 80, in load_cora
    sess = get_default_session()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 1477, in get_default_session
    return _default_session_stack.get_default()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 1498, in get_default
    sess = session(cluster_type="hosts", num_workers=1)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 357, in wrapper
    return_value = func(*args, **kwargs)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 724, in __init__
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 1065, in _connect
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/", line 82, in waiting_service_ready
    raise ConnectionError(f"Connect coordinator timeout, {msg}")
ConnectionError: Connect coordinator timeout, code: UNAVAILABLE, details: failed to connect to all addresses

To Reproduce

import graphscope
from graphscope.dataset import load_ogbn_mag

g = load_ogbn_mag()

Environment (please complete the following information):

sighingnow commented 2 years ago

Hi @Sanzo00,

Thanks for reporting. Could you please paste your pip3 list output here?

Sanzo00 commented 2 years ago

Hi @sighingnow, here is my pip3 list output:

Package                Version
---------------------- ---------
aenum                  3.1.11
aiobotocore            2.3.4
aiohttp                3.8.1
aioitertools           0.10.0
aiosignal              1.2.0
aliyun-python-sdk-core 2.13.36
aliyun-python-sdk-kms  2.15.0
argcomplete            2.0.0
async-timeout          4.0.2
asynctest              0.13.0
attrs                  22.1.0
botocore               1.24.21
cachetools             5.2.0
certifi                2022.6.15
cffi                   1.15.1
charset-normalizer     2.1.0
cmake                  3.22.5
crcmod                 1.7
cryptography           37.0.4
cycler                 0.11.0
Cython                 3.0a6
etcd-distro            3.5.1
fonttools              4.34.4
frozenlist             1.3.0
fsspec                 2022.7.1
future                 0.18.2
google-auth            2.9.1
graphscope             0.15.0
graphscope-client      0.15.0
gremlinpython          3.6.1
grpcio                 1.48.0
grpcio-tools           1.48.0
gs-apps                0.15.0
gs-coordinator         0.15.0
gs-engine              0.15.0
gs-include             0.15.0
hdfs3                  0.3.1
idna                   3.3
importlib-metadata     4.12.0
isodate                0.6.1
jmespath               0.10.0
kiwisolver             1.4.4
kubernetes             24.2.0
matplotlib             3.5.2
msgpack                1.0.4
multidict              6.0.2
nest-asyncio           1.5.5
networkx               2.6
numpy                  1.21.6
oauthlib               3.2.0
orjson                 3.7.8
oss2                   2.16.0
packaging              21.3
pandas                 1.3.5
pickle5                0.0.12
Pillow                 9.2.0
pip                    22.1.2
protobuf               3.18.1
psutil                 5.9.1
pyarrow                6.0.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycparser              2.21
pycryptodome           3.15.0
pyparsing              3.0.9
pysimdjson             5.0.1
python-dateutil        2.8.2
pytz                   2022.1
PyYAML                 6.0
requests               2.28.1
requests-oauthlib      1.3.1
rsa                    4.9
s3fs                   2022.7.1
scipy                  1.7.3
setuptools             61.2.0
shared-memory38        0.1.2
six                    1.16.0
sortedcontainers       2.4.0
tqdm                   4.64.0
treelib                1.6.1
typing_extensions      4.3.0
urllib3                1.26.11
vineyard               0.6.2
vineyard-io            0.6.2
websocket-client       1.3.3
wheel                  0.37.1
wrapt                  1.14.1
yarl                   1.7.2
zipp                   3.8.1

sighingnow commented 2 years ago

Cannot reproduce.

Could you please try python3 -m vineyard --socket=/tmp/vineyard.sock to see if vineyardd could be launched as expected?


Sanzo00 commented 2 years ago

I tried as you said and this is the output:

(graphscope) ➜  ~ python3 -m vineyard --socket=/tmp/vineyard.sock
I20220729 12:32:34.738972 131355] Hello vineyard v0.6.2!
I20220729 12:32:34.739400 131355 meta_service.h:94] meta service is starting ...
I20220729 12:32:36.463382 131355] Starting the etcd server
I20220729 12:32:36.463488 131355] Found etcd at: /home/sanzo/software/miniconda/4.12/envs/graphscope/bin/etcd
I20220729 12:32:36.466235 131355] Etcd launched: pid = 131433, listen on 2379
{"level":"info","ts":1659069156.5181363,"caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LOG_LEVEL","variable-value":"error"}
[error] Check failed: Etcd error: Etcd has been launched but failed to connect to it in "root_vs->Serve(StoreType::kDefault)", in function vineyard::Status vineyard::VineyardRunner::Serve(), file /work/v6d/src/server/server/, line 63

Unhandled exception:
  std::exception:what(): Check failed: Etcd error: Etcd has been launched but failed to connect to it in "root_vs->Serve(StoreType::kDefault)", in function vineyard::Status vineyard::VineyardRunner::Serve(), file /work/v6d/src/server/server/, line 63

sighingnow commented 2 years ago

That looks quite strange. Could you please paste the output of etcd --version and python3 -m etcd_distro.etcd?


Sanzo00 commented 2 years ago

etcd --version:

(graphscope) ➜  ~ etcd --version
etcd Version: 3.5.1
Git SHA: e8732fb5f
Go Version: go1.16.3
Go OS/Arch: linux/amd64

python3 -m etcd_distro.etcd:

(graphscope) ➜  ~ python3 -m etcd_distro.etcd
{"level":"info","ts":"2022-07-29T14:32:42.698+0800","caller":"etcdmain/etcd.go:72","msg":"Running: ","args":["/home/sanzo/software/miniconda/4.12/envs/graphscope/lib/python3.7/site-packages/etcd_distro/etcdbin/etcd"]}
{"level":"warn","ts":"2022-07-29T14:32:42.699+0800","caller":"etcdmain/etcd.go:104","msg":"'data-dir' was empty; using default","data-dir":"default.etcd"}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"etcdmain/etcd.go:115","msg":"server has been already initialized","data-dir":"default.etcd","dir-type":"member"}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"embed/etcd.go:367","msg":"closing etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"embed/etcd.go:369","msg":"closed etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
{"level":"fatal","ts":"2022-07-29T14:32:42.699+0800","caller":"etcdmain/etcd.go:203","msg":"discovery failed","error":"listen tcp bind: address already in use","stacktrace":"\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/server/etcdmain/etcd.go:203\\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}

sighingnow commented 2 years ago

It seems that port 2380 on your machine is already in use, but both vineyard and graphscope failed to detect that.
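
To see which local address is holding the port, one option on Linux is to read /proc/net/tcp directly. The following is only a minimal sketch: the helper name listeners_on_port is hypothetical, it assumes a little-endian Linux host, and it covers IPv4 only (IPv6 listeners live in /proc/net/tcp6 and are not handled here).

```python
import socket
import struct

LISTEN_STATE = "0A"  # value of the "st" column for a socket in LISTEN state


def listeners_on_port(proc_net_tcp: str, port: int) -> list:
    """Return the local IPv4 addresses that have a listening socket on `port`.

    `proc_net_tcp` is the text content of /proc/net/tcp (hypothetical helper).
    """
    addresses = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != LISTEN_STATE:
            continue
        ip_hex, port_hex = fields[1].split(":")
        if int(port_hex, 16) != port:
            continue
        # The address is a 32-bit value printed in host byte order
        # (little-endian is assumed here).
        addresses.append(socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16))))
    return addresses


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            print(listeners_on_port(f.read(), 2380))
    except FileNotFoundError:
        print("/proc/net/tcp is not available on this system")
```

An empty list means no IPv4 socket is listening on that port; an entry such as 0.0.0.0 or a non-loopback address would point to a listener that a loopback-only check could miss.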

sighingnow commented 2 years ago

Hi @Sanzo00,

Could you send me a message via wechat or dingding? I need more information about your environment settings, as I cannot tell what is happening at the moment.

You could find me on wechat or dingding via 13240327026.


sighingnow commented 2 years ago

I think there might be a program listening on port 2380 on another network interface, which would explain why our detection procedure failed.
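
One way to make such a check less dependent on the interface is to try binding the port on the wildcard address instead of probing a single address. The sketch below is an illustration only, not the detection code used by vineyard or graphscope; the function name port_is_free is hypothetical and it only covers IPv4 TCP.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port).

    Binding to the wildcard address 0.0.0.0 fails with "address already in
    use" when another socket is already listening on that port on any local
    IPv4 address, so a listener on a different interface is caught as well.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # 2379 and 2380 are the etcd client and peer ports seen in the logs above.
    for port in (2379, 2380):
        print(port, "free" if port_is_free(port) else "in use")
```

Running this before launching the session shows whether either etcd port is already taken. The socket deliberately does not set SO_REUSEADDR, so the bind fails for any existing listener on that port.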

Sanzo00 commented 2 years ago

Yes, another program was occupying this port. After I killed that program, the example ran normally.

sighingnow commented 2 years ago

Happy to hear that it finally works.

It is quite strange that we could not detect that the port was in use.