gnes-ai / gnes

GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.
https://gnes.ai

Waiting on channel to be ready #323

Open jloveric opened 4 years ago

jloveric commented 4 years ago

I'm running the demo-poems-ir and it seems to be stuck at

I:MyClient:[bas:__i:124]:setting up grpc insecure channel...
I:MyClient:[bas:__i:133]:waiting channel to be ready...

It's probably something with my setup. Has anyone else seen this? Thanks.

davidlenz commented 4 years ago

@jloveric If you ran make clean, did you run make index again before make client_index d=10? I forgot once and got stuck at the same "waiting channel to be ready..." message.
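
A client that hangs at "waiting channel to be ready..." is often waiting for a server that has not been started. Before running the client, a plain TCP probe can tell whether anything is listening on the frontend port. This is a minimal sketch using only the Python standard library. The helper name port_is_open is made up for illustration, and port 5566 is the frontend port that appears in the logs in this thread. A successful probe only shows that something accepts connections there, not that the gRPC service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # 5566 is the frontend port seen in the logs; adjust host and port as needed.
    if port_is_open("127.0.0.1", 5566):
        print("something is listening on 5566")
    else:
        print("nothing is listening on 5566; were the services started?")
```

If the probe fails, the client has nothing to connect to, which matches the symptom of forgetting to start the services again after a clean.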


I'm having a similar issue, though:

I:MyClient:[bas:__i:128]:setting up grpc insecure channel...
I:MyClient:[bas:__i:137]:waiting channel to be ready...
I:MyClient:[bas:__i:141]:create new stub...
C:MyClient:[bas:__i:146]:gnes client ready at 0.0.0.0:5566!
index [=                   ]  elapsed: 0.0s   speed: 0.0 batch/s

and then it stops there forever.

Here's the traceback for KeyboardInterrupt:

Traceback (most recent call last):
  File "app.py", line 41, in <module>
    MyClient(parser.parse_args())
  File "/usr/local/lib/python3.7/site-packages/gnes/client/cli.py", line 33, in __init__
    self.start()
  File "/usr/local/lib/python3.7/site-packages/gnes/client/cli.py", line 51, in start
    getattr(self, self.args.mode)()
  File "/usr/local/lib/python3.7/site-packages/gnes/client/cli.py", line 68, in index
    batch_size=self.args.batch_size)):
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 388, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 373, in _next
    _common.wait(self._state.condition.wait, _response_ready)
  File "/usr/local/lib/python3.7/site-packages/grpc/_common.py", line 140, in wait
    _wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
  File "/usr/local/lib/python3.7/site-packages/grpc/_common.py", line 105, in _wait_once
    wait_fn(timeout=timeout)
  File "/usr/local/lib/python3.7/threading.py", line 300, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
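
The traceback shows the client blocked inside gRPC's response iterator (__next__, then _wait_once, then threading's wait), so it is waiting for a streamed response that never arrives. To make such a hang fail fast while debugging, one option is to put a deadline on each item. The sketch below does this for any Python iterator using only the standard library. iter_with_deadline is a hypothetical helper, not part of GNES or grpc.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_with_deadline(source: Iterable[T], timeout: float) -> Iterator[T]:
    """Yield items from source; raise TimeoutError if the next item
    does not arrive within `timeout` seconds."""
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs in a daemon thread so a stuck source cannot block interpreter exit.
        try:
            for item in source:
                q.put(("item", item))
            q.put(("done", None))
        except Exception as exc:
            # Forward errors from the source to the consumer.
            q.put(("error", exc))

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            kind, payload = q.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no response within {timeout}s") from None
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload
```

With gRPC itself, stub calls accept a timeout argument, which is usually the simpler route. The wrapper is mainly useful when the call site cannot be changed easily.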

I'm running on a DO Docker image, and here's the VM setup:

sudo apt-get update
sudo apt-get -y upgrade
apt-get install make
docker swarm init --advertise-addr < VM IP ADDRESS >
git clone https://github.com/gnes-ai/demo-poems-ir.git
cd demo-poems-ir
make build
make index
make client_index d=10

EDIT: Fixed URL, minimized setup example

hanxiao commented 4 years ago

hmm, give me some time and let me check. Debugging a Docker env is always challenging.

In the meantime, I'd like to give you a sneak peek at the ongoing work on GNES Flow. It provides a Pythonic and intuitive interface for building workflows in GNES. You can get some examples from this unit test. Once a flow is built, it can be exported to Docker Swarm, K8S, or even an SVG image in a painless way.

This is not a mature feature yet; our plan is to remove the GNES compose module and use GNES Flow as the main interface in the tutorial and web UI.

If you have any thoughts on the API design of GNES Flow, feel free to give some feedback.

hanxiao commented 4 years ago

@jloveric @davidlenz thanks again ❤️ for trying GNES and giving feedback at this early stage. I'd like to introduce you to the new GNES Flow API (available since v0.0.46), which enables a Pythonic and intuitive way of building workflows in GNES. As an example, an indexing workflow can be defined simply as:

from gnes.flow import Flow

flow = (Flow(check_version=False, ctrl_with_ipc=True)
        .add_preprocessor(name='prep', yaml_path='yaml/prep.yml', replicas=3)
        .add_encoder(yaml_path='yaml/incep.yml', replicas=6)
        .add_indexer(name='vec_idx', yaml_path='yaml/vec.yml')
        .add_indexer(name='doc_idx', yaml_path='yaml/doc.yml', recv_from='prep')
        .add_router(name='sync', yaml_path='BaseReduceRouter', num_part=2, recv_from=['vec_idx', 'doc_idx']))

# then use it for indexing
# (read_flowers() is not defined in this snippet; it is expected to yield raw image bytes)
with flow(backend='process') as fl:
    fl.index(bytes_gen=read_flowers(), batch_size=64)
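
In the flow above, the 'sync' router is declared with num_part=2 and receives from exactly two components ('vec_idx' and 'doc_idx'). The sketch below is a plain-Python consistency check for that pattern and is not part of the GNES API. The Component record and check_num_part are hypothetical names, and the check assumes that a component without an explicit recv_from receives from the component added just before it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """Hypothetical stand-in for one step of a flow (not a GNES class)."""
    name: str
    recv_from: List[str] = field(default_factory=list)
    num_part: Optional[int] = None  # only meaningful for reduce routers


def check_num_part(components: List[Component]) -> List[str]:
    """Return one message per component whose num_part differs from
    its number of upstream sources."""
    problems = []
    previous: Optional[str] = None
    for comp in components:
        # Assumption: no explicit recv_from means "receive from the previous step".
        sources = comp.recv_from or ([previous] if previous else [])
        if comp.num_part is not None and comp.num_part != len(sources):
            problems.append(
                f"{comp.name}: num_part={comp.num_part} "
                f"but {len(sources)} upstream source(s)"
            )
        previous = comp.name
    return problems


# Mirrors the indexing flow above.
flow_spec = [
    Component("prep"),
    Component("encoder"),
    Component("vec_idx"),
    Component("doc_idx", recv_from=["prep"]),
    Component("sync", recv_from=["vec_idx", "doc_idx"], num_part=2),
]
print(check_num_part(flow_spec))  # []
```

A mismatch reported by such a check would be worth a look before starting the services, since a router that expects more parts than it has inputs may never forward anything.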

🔰 You can find some resources here to help you get started quickly:

davidlenz commented 4 years ago

@hanxiao I think this is a truly amazing project, so you and your team are the ones to be thanked. Keep up the great work.

I was able to successfully reproduce the flower example on a cloud machine using the following setup:

sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y python3-pip
sudo apt-get install -y build-essential libssl-dev libffi-dev python-dev

pip3 install tensorflow==1.12
python3 -m pip install jupyterlab
pip3 install gnes[all]
apt-get install libsndfile-dev -y

git clone https://github.com/gnes-ai/demo-gnes-flow.git
cd demo-gnes-flow
TEST_WORKDIR=/tmp/gnes-flow-demo/
mkdir ${TEST_WORKDIR}
curl http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz --output inception_v4_2016_09_09.tar.gz
tar -xvf inception_v4_2016_09_09.tar.gz
mv inception_v4.ckpt ${TEST_WORKDIR}
rm inception_v4_2016_09_09.tar.gz
curl http://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz --output 17flowers.tgz
mv 17flowers.tgz ${TEST_WORKDIR}
jupyter lab --allow-root

Note that I had to install libsndfile manually; otherwise I got an error when indexing the flowers.

How can I query the cloud machine from my local machine? I couldn't figure this out.

AlexanderKUA commented 4 years ago

Hello,

I have the same issue with demo-poems-ir. I'd like to reproduce this example because it's related to my current task. I found the following issue with the router service:

 I:RouterService:[bas:_ho:396]:a message in type: response with route: FrontendService▸SentSplitPreprocessor▸DictIndexer▸BaseReduceRouter
 W:MessageHandler:[bas:get:237]:cant find handler for message type: <class 'gnes_pb2.IndexResponse'>, fall back to the default handler
 I:MessageHandler:[bas:cal:255]:handling message with _handler_default
 I:RouterService:[hel:__e:301]:handling message takes 0.001 secs

It looks like the result should be delivered to the frontend (not to _handler_default).

Should I fix it? How can I fix it?

For your information, here are the Frontend logs:

I:FrontendService:[fro:__i: 24]:start a frontend with 10 workers
C:FrontendService:[fro:__e: 33]:listening at: 0.0.0.0:5566
I:ZmqClient:[bas:__i: 66]:current libzmq version is 4.3.2,  pyzmq version is 18.1.0
I:ZmqClient:[bas:__i: 78]:input 0.0.0.0:57908    output 0.0.0.0:53463
I:FrontendService:[fro:Str:123]:receive request: 0
I:FrontendService:[fro:Str:126]:send new request into 0 appending tasks
I:FrontendService:[fro:Str:123]:receive request: 1
I:FrontendService:[fro:Str:126]:send new request into 1 appending tasks
I:FrontendService:[fro:Str:123]:receive request: 2
I:FrontendService:[fro:Str:126]:send new request into 2 appending tasks
I:FrontendService:[fro:Str:130]:all requests are sent, waiting for the responses...
I:FrontendService:[fro:get:108]:waiting for 3 responses ...

Thanks in advance

AlexanderKUA commented 4 years ago

It looks like the problem was in:

Router40:
    image: gnes/gnes:latest-alpine
    command: route --num_part 2 --port_in 57909 --socket_in PULL_BIND --port_out 57908 --socket_out
      PUSH_CONNECT --host_out Frontend00 --yaml_path BaseReduceRouter

The --num_part 2 argument is not required here, or might be wrong.
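
One plausible reading of why --num_part 2 matters, stated as an assumption rather than as a description of GNES's actual implementation: a reduce-style router holds each request until it has seen num_part partial messages for it. If only one upstream service feeds the router, the second part never arrives, nothing is forwarded, and the frontend keeps waiting for responses, as in the Frontend log above. The toy class below (ToyReduceRouter, a hypothetical name) illustrates that behavior.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ToyReduceRouter:
    """Toy model of a reduce router; NOT the GNES BaseReduceRouter."""

    def __init__(self, num_part: int) -> None:
        self.num_part = num_part
        self._pending: Dict[int, List[str]] = defaultdict(list)

    def receive(self, request_id: int, part: str) -> Optional[List[str]]:
        """Buffer one partial message; return the collected parts once
        num_part of them have arrived for this request, else None."""
        parts = self._pending[request_id]
        parts.append(part)
        if len(parts) < self.num_part:
            return None  # still waiting; nothing is sent downstream
        return self._pending.pop(request_id)


# One upstream indexer but num_part=2: every request stays buffered.
stuck = ToyReduceRouter(num_part=2)
print([stuck.receive(i, "doc_idx") for i in range(3)])  # [None, None, None]

# With num_part=1 each request is forwarded as soon as it arrives.
ok = ToyReduceRouter(num_part=1)
print([ok.receive(i, "doc_idx") for i in range(3)])
# [['doc_idx'], ['doc_idx'], ['doc_idx']]
```

Under this assumption, three indexing requests with a single upstream would leave the frontend waiting for three responses indefinitely.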

Next issue: with client_query, no results are displayed.