RedisGraph / redisgraph-py

RedisGraph python client
https://redisgraph.io
BSD 3-Clause "New" or "Revised" License

fails to commit large batch inserts #119

Closed: gomesian closed this issue 3 years ago

gomesian commented 3 years ago

Hi Team,

I was trying to understand the limits of batch inserts with this library. It seems that somewhere between 6-8k nodes (3-4k edges), the commit fails with:

(screenshot of the error message)

I don't mind breaking up and batching updates of ~10k required node/edges every hour, but need help understanding what is breaking here - so I can configure a safe batch size based on varying property sizes for insert.

Also wondering about the use of connection_pool, and/or whether I should try unix_socket_path since the script runs on the server; maybe using that would allow larger batch updates. I don't see documentation on this.
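For reference, connecting over a unix domain socket with redis-py can be sketched as below. The socket path is an assumption (check the `unixsocket` directive in your redis.conf); note that this only changes the transport, it does not raise any query-size limit on the server side:

```python
def make_client(unix_socket_path="/var/run/redis/redis.sock"):
    """Return a redis-py client that talks over a unix domain socket.

    The default path is an assumption; match it to your redis.conf.
    redis-py connects lazily, so no server is contacted until the
    first command is issued.
    """
    # imported inside the function so the sketch can be read/inspected
    # without redis-py installed
    import redis
    return redis.Redis(unix_socket_path=unix_socket_path, db=0)
```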

note: I tried the redisgraph-bulk-loader.py approach, and it doesn't suit me either, as I need to constantly update and prune the existing graph.

Any help / pointers appreciated!

quick and dirty example:

import redis
from redisgraph import Node, Edge, Graph

r = redis.Redis(host=HOST, port=PORT, db=0, socket_timeout=3000)

redis_graph = Graph('large', r)

for x in range(4000):

    # one source and one destination node per iteration
    src = Node(label='person', properties={'name': 'src-' + str(x), 'age': 33, 'gender': 'male', 'status': 'single'})
    redis_graph.add_node(src)

    dst = Node(label='person', properties={'name': 'dst-' + str(x), 'age': 33, 'gender': 'male', 'status': 'single'})
    redis_graph.add_node(dst)

    edge = Edge(src, 'visited', dst, properties={'purpose': 'pleasure'})
    redis_graph.add_edge(edge)

redis_graph.commit()

jeffreylovitz commented 3 years ago

Hi @gomesian,

This error is caused by a buffer size limit in RedisGraph's parser utility. A workaround can be found here: https://github.com/RedisGraph/RedisGraph/issues/1486#issuecomment-742581130. Alternatively, you can create entities in a series of smaller batched queries by periodically calling redis_graph.flush() in your create loop.
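The batching approach can be sketched as follows. The chunking helper is plain Python; the Node/Edge/add_node/flush calls match redisgraph-py's API as used in the issue, while `pairs`, `load_in_batches`, and BATCH_SIZE=1000 are illustrative assumptions you would tune against the parser's buffer limit and your property sizes:

```python
def chunked(seq, size):
    """Yield successive slices of `seq`, each no longer than `size`."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def load_in_batches(redis_graph, pairs, batch_size=1000):
    """Create nodes/edges in batches, flushing after each batch.

    `pairs` is assumed to be a list of (src_name, dst_name) tuples;
    batch_size=1000 is a guessed safe value, not a documented limit.
    """
    # imported here so the chunking helper is usable without redisgraph-py
    from redisgraph import Node, Edge

    for batch in chunked(pairs, batch_size):
        for src_name, dst_name in batch:
            src = Node(label='person', properties={'name': src_name})
            dst = Node(label='person', properties={'name': dst_name})
            redis_graph.add_node(src)
            redis_graph.add_node(dst)
            redis_graph.add_edge(Edge(src, 'visited', dst,
                                      properties={'purpose': 'pleasure'}))
        # flush() sends the pending entities as one query and clears the
        # local buffers, keeping each CREATE under the parser's size limit
        redis_graph.flush()
```

Compared to a single commit() of ~10k entities, each query stays small enough for the parser while the graph is still built incrementally.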

gomesian commented 3 years ago

Thanks, batching with flush() makes sense then.