Yelp / pyleus

Pyleus is a Python framework for developing and launching Storm topologies.
Apache License 2.0

bolt died because the read_tuple() has TypeError: 'int' object has no attribute '__getitem__' #109

Open n1epan opened 9 years ago

n1epan commented 9 years ago

I have a bolt A that continuously emits a namedtuple that contains a list that could have thousands of ids in it.

Fields = namedtuple("Fields", "table action ids"), where ids is a list that contains thousands of items.

Bolt B, downstream of bolt A, fails with the error below. My guess is that the namedtuple is overflowing sys.stdin. Is this possible? What should I do in this case?

topology:
    - spout:
        name: spoutA
        module: folder.spoutA

    - bolt:
        name: boltA
        module: folder.boltA
        groupings:
            - shuffle_grouping:
                component: spoutA

    - bolt:
        name: boltB
        module: folder.boltB
        groupings:
            - shuffle_grouping:
                component: boltA

I also logged the value that read_tuple() assigns to cmd. It is 56:

03/31/2015 03:55:13 PM - pyleus.storm.component - read_tuple - INFO: 56

03/31/2015 02:56:29 PM - pyleus.storm.component - run - ERROR: Exception in bolt.run
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pyleus/storm/component.py", line 233, in run
    self.run_component()
  File "/usr/lib/python2.7/site-packages/pyleus/storm/bolt.py", line 45, in run_component
    tup = self.read_tuple()
  File "/usr/lib/python2.7/site-packages/pyleus/storm/component.py", line 291, in read_tuple
    cmd['id'], cmd['comp'], cmd['stream'], cmd['task'], cmd['tuple'])
TypeError: 'int' object has no attribute '__getitem__'
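
The TypeError follows directly from the logged value: if deserialization hands back the integer 56 instead of a dict, subscripting it with a string key fails. Below is a minimal sketch in plain Python, not Pyleus code; `parse_command` and `reproduce_type_error` are hypothetical names used only for illustration. Python 2.7 words the message as in the traceback above, while Python 3 reports that the 'int' object is not subscriptable.

```python
def parse_command(cmd):
    """Hypothetical guard: reject a deserialized message that is not a dict."""
    if not isinstance(cmd, dict):
        raise ValueError(
            "expected a dict-like Storm message, got %s: %r"
            % (type(cmd).__name__, cmd)
        )
    # Same five keys that read_tuple() accesses in the traceback.
    return cmd["id"], cmd["comp"], cmd["stream"], cmd["task"], cmd["tuple"]


def reproduce_type_error():
    """Subscript an int the way read_tuple() subscripts cmd."""
    cmd = 56  # the value logged by read_tuple() in this report
    try:
        cmd["id"]
    except TypeError as exc:
        return exc
    return None
```

A check like this would not fix the underlying problem, but it would make the log say what was actually received instead of failing on the first key lookup.
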

poros commented 9 years ago

Can you try running it locally to debug the issue? Some time ago I ran a topology with tuples on the order of megabytes, and it seemed to work fine (with some Storm tweaks).

n1epan commented 9 years ago

The issue is that the messages are too long for msgpack to handle. After I switched to JSON, it worked.
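
To get a feel for how large such a message is, one can serialize a comparable tuple with the standard-library json module and count the bytes. This is only a rough sketch, not how Pyleus builds its messages: the field values and the 5000-id list are made-up stand-ins, `json_payload_size` is a hypothetical helper, and the real multilang envelope adds further fields.

```python
import json
from collections import namedtuple

# Same shape as the namedtuple described in the report.
Fields = namedtuple("Fields", "table action ids")


def json_payload_size(fields):
    """Return the byte length of the UTF-8 JSON encoding of the tuple values."""
    return len(json.dumps(list(fields)).encode("utf-8"))


# Hypothetical sample values; the real tables, actions and ids are not given.
sample = Fields(table="users", action="update", ids=list(range(5000)))
size = json_payload_size(sample)
```
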

poros commented 9 years ago

Ah, so this is actually a bug/limitation of our msgpack serializer. Good to know, thanks.