RedisGraph / redisgraph-py

RedisGraph python client
https://redisgraph.io
BSD 3-Clause "New" or "Revised" License

Response Error when committing large numbers of nodes/edges at once. #75

Open danieljue opened 4 years ago

danieljue commented 4 years ago

I'm running the current Docker image on Windows. I discovered this problem when my program encountered a large book and tried to ingest sentences from it. I was able to reproduce the problem with a simpler example in a Jupyter notebook.

import random
from string import ascii_letters

import redis
from redisgraph import Node, Edge, Graph

r = redis.Redis(host='localhost', port=6379)

def get_mutated_string(s):
    # This is just a way of generating lots of strings, each different from the last.
    # Only non-whitespace positions are candidates for mutation.
    inds = [i for i, c in enumerate(s) if not c.isspace()]
    sam = random.sample(inds, 3)
    letts = iter(random.sample(ascii_letters, 3))
    lst = list(s)
    for ind in sam:
        lst[ind] = next(letts)

    return "".join(lst)

#this is just to reset the graph when trying different amounts of nodes.
try:
    rg2.delete()
except:
    pass

rg2 = Graph('bulk_test', r)
nodes=[]
edges=[]
last = None
s = "Jean Piaget"
for i in range(9041):
    new = Node(label='name', properties={'w': s})
    nodes.append(new)
    if last is not None:
        edges.append(Edge(last, 'mutate', new))
    s = get_mutated_string(s)
    last = new

for n in nodes:
    rg2.add_node(n)

for e in edges:
    rg2.add_edge(e)

#Like commit, but sets nodes and edges to empty.  Multiple flushes don't cause duplicates.
rg2.flush()

With range(9040) there are no errors. If I increase it to 9041 or higher, I get this error:

Error ResponseError: errMsg: Invalid input 'J': expected '.', AND, OR, XOR, NOT, '=~', '=', '<>', '+', '-', '*', '/', '%', '^', IN, CONTAINS, STARTS WITH, ENDS WITH, '<=', '>=', '<', '>', IS NULL, IS NOT NULL, '[', '{', a label, ',' or '}' line: 1, column: 1048604, offset: 1048603 errCtx: ...e{w:"GVNjJCREATE (pkuvboenoi:name{w:"Jean Piaget"}),(tabnluexcb:name{w:"Ye... errCtxOffset: 40

Depending on your machine or the size of the nodes, the number could differ (like 10000 or 20000). I was also able to cause this problem with only one node that had a large property.

Expected behavior: either tell me that the content of the commit is too large, or handle it gracefully.

swilly22 commented 4 years ago

Hi @danieljue, thank you for reporting. You've probably hit the RedisGraph parser's buffer size limit. Let me check what the consequences of enlarging this buffer would be.
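
For context, the offset in the reported error (1048603) falls just past 2^20 = 1,048,576 bytes, which would be consistent with a limit of roughly 1 MiB on the query text, though the thread does not confirm the exact figure. Until that is settled, a client-side workaround might be to split the CREATE patterns so that each query stays under a byte budget. The following is only a minimal sketch: `chunk_patterns`, `build_queries`, and the 512 KiB default are hypothetical choices, not part of redisgraph-py.

```python
def chunk_patterns(patterns, max_bytes=512 * 1024, prefix="CREATE "):
    """Group Cypher pattern strings so each joined query stays within max_bytes.

    max_bytes is an assumed budget, not a documented RedisGraph limit.
    A single pattern larger than the budget is still emitted on its own,
    because it cannot be split here.
    """
    prefix_len = len(prefix.encode("utf-8"))
    batch, size = [], prefix_len
    for pattern in patterns:
        # +1 accounts for the comma that joins patterns in the final query.
        pattern_len = len(pattern.encode("utf-8")) + 1
        if batch and size + pattern_len > max_bytes:
            yield batch
            batch, size = [], prefix_len
        batch.append(pattern)
        size += pattern_len
    if batch:
        yield batch


def build_queries(patterns, max_bytes=512 * 1024):
    """Return one 'CREATE ...' query string per chunk of patterns."""
    return [
        "CREATE " + ",".join(batch)
        for batch in chunk_patterns(patterns, max_bytes)
    ]
```

This only helps for patterns that are independent of each other. An edge pattern that refers to a node alias must land in the same query as that node, which this sketch does not enforce.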

danieljue commented 4 years ago

Thanks! I looked on GitHub, and there may be some ideas that can be borrowed from here:

https://github.com/RedisGraph/redisgraph-bulk-loader/blob/master/redisgraph_bulk_loader/bulk_insert.py

The bulk insert library itself doesn't work well for my use case, because the data is not in a CSV-like format, but I saw some code in there relating to buffer size.

Perhaps there's a way to allow an "unsafe" commit for the edges, where the code doesn't check for the existence of the nodes. In this way we could insert nodes in smaller batches, and insert edges later on, putting the responsibility of those nodes' existence on the developer.
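
Something close to this two-phase idea can be approximated today without a new "unsafe" mode, at the cost of more round trips. The sketch below assumes each node carries a unique `uid` property (hypothetical, not in the original example) so that edges can be attached afterwards with a MATCH. `insert_two_phase` and `batched` are illustrative helpers, and `graph` is expected to behave like a redisgraph-py `Graph` (`add_node`, `flush`, `query`).

```python
# Cypher for attaching one edge between two already-inserted nodes.
# The 'uid' property is an assumption made for this sketch.
EDGE_QUERY = (
    "MATCH (a:name {uid: $src}), (b:name {uid: $dst}) "
    "CREATE (a)-[:mutate]->(b)"
)


def batched(items, size):
    """Yield consecutive slices of items with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def insert_two_phase(graph, nodes, edge_uids, batch_size=1000):
    """Insert nodes in small batches, then create edges by matching on uid.

    graph     -- object with add_node(), flush() and query(q, params)
    nodes     -- node objects that each carry a unique 'uid' property
    edge_uids -- iterable of (src_uid, dst_uid) pairs
    """
    # Phase 1: each flush sends one modest CREATE query.
    for batch in batched(nodes, batch_size):
        for node in batch:
            graph.add_node(node)
        graph.flush()

    # Phase 2: one small query per edge. The existence of both
    # endpoints is the caller's responsibility.
    for src, dst in edge_uids:
        graph.query(EDGE_QUERY, {"src": src, "dst": dst})
```

With the example from the report, the nodes would be built as `Node(label='name', properties={'uid': i, 'w': s})` and the edges passed as `[(i, i + 1) for i in range(len(nodes) - 1)]`. Matching on `uid` would probably need an index on that property to stay fast, and a MATCH that finds nothing creates no edge and raises no error, so missing nodes fail silently.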