erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License
4.88k stars 735 forks

I need some help connecting my database, thanks! #478

Closed frg01 closed 10 months ago

frg01 commented 11 months ago

I have a question for anyone who can help. I want to deploy my self-developed vector database locally on Linux and test it with ann-benchmarks, but I don't know how to connect ann-benchmarks to my relational vector database. Where in the Python code should I configure this? I assume I need to write some code to make the connection, right? I'm new to databases, so my question may seem a bit silly, but I'm hoping someone can shed some light on my confusion. Thank you!

maumueller commented 11 months ago

Hi @frg01. Your question is a bit too open-ended for a specific answer, but I suggest that you take a look at the code for elasticsearch: https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/elasticsearch/module.py. You will have to write all the code to set up the connection in your own module.py file: set up the connection in __init__, build the index in fit, and search for vectors in query. It's usually easiest to write the code in local mode without the docker interface. Use python run.py --local ... instead of python run.py ....
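The three steps above could be laid out roughly as follows. This is only a sketch under assumptions: the class name MyVectorDB, the connect argument, and both SQL statements are hypothetical, and the <-> distance operator is borrowed from pgvector, so a self-developed database may need different syntax. As far as I can tell, a real module.py would also subclass BaseANN the way the elasticsearch module does; that is left out here so the sketch runs on its own.

```python
import json


class MyVectorDB:
    """Hypothetical wrapper showing where connection, indexing and search go."""

    # Placeholder SQL; adjust table, columns and distance syntax to your database.
    INSERT_SQL = "insert into items (id, embedding) values (%s, %s)"
    QUERY_SQL = "select id from items order by embedding <-> %s limit %s"

    def __init__(self, metric, connect):
        # connect is any callable returning a DB-API connection,
        # for example lambda: psycopg2.connect(...) (an assumption).
        self._metric = metric
        self._conn = connect()
        self._cur = self._conn.cursor()

    @staticmethod
    def _as_text(vector):
        # float() turns NumPy scalars into plain Python floats,
        # and json.dumps renders the list as text such as "[1.0, 2.0]".
        return json.dumps([float(v) for v in vector])

    def fit(self, X):
        # Build the index: store every vector with its row number as id.
        rows = [(i, self._as_text(x)) for i, x in enumerate(X)]
        self._cur.executemany(self.INSERT_SQL, rows)
        self._conn.commit()

    def query(self, v, n):
        # Return the ids of the n stored vectors closest to v.
        self._cur.execute(self.QUERY_SQL, (self._as_text(v), n))
        return [row[0] for row in self._cur.fetchall()]
```

Passing the connection factory in from outside keeps the class easy to try with a stand-in connection before pointing it at the real database.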

Hope that helps.

frg01 commented 10 months ago

Thank you very much for your guidance.

frg01 commented 10 months ago

I have another question. I deployed the ann-benchmarks project on Ubuntu. In the fit() function, I don't know how to handle the X parameter that is passed in (a numpy.ndarray). When I connect to my database and insert rows one at a time with cur.execute(), the project runs successfully, but when I insert rows in batches with cur.executemany(), errors always occur. I'm at my wits' end, and I really don't know how to use the numpy package.

The following code runs successfully:

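import json
import psycopg2

# Assumes conn (a psycopg2 connection) and cur (its cursor) are already open,
# and X is the numpy.ndarray passed to fit().
for i, x in enumerate(X):
    c = x.tolist()          # numpy row -> plain Python list
    em = json.dumps(c)      # list -> text such as "[0.1, 0.2]"
    ems = "'" + em + "'"    # wrap the text in quotes as a SQL string literal
    id = json.dumps(i)
    try:
        cur.execute(f"insert into items (id,embedding) values ({id},{ems})")
        conn.commit()
    except Exception as e:
        print("Insert failed: ", e)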

The following code doesn't work:

res = []
sql = "insert into items (id,embedding) values ( %s , %s )"
for i, x in enumerate(X):
    id = json.dumps(i)
    c = x.tolist()
    em = json.dumps(c)
    ems = "\'" + em + "\'"
    temp = (id,x)
    res.append(temp)

    if int(i) % 4 == 0 and int(i) >= 3:
        try:
            cur.executemany(sql,res)
        except Exception as e:
            print(e, "------")
        finally:
            conn.commit()
            res = []
cur.executemany(sql,res)
conn.commit()

I have tried many functions but don't know how to solve it, for example numpy.ndarray's tostring() and tolist(), and json.dumps(). How do I do this? Please help me.
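Judging only from the snippet above, a likely cause is that each tuple appended to res contains x, the raw numpy.ndarray, which psycopg2 cannot adapt by default, while the manually quoted ems string is never used. With %s placeholders the driver does the quoting itself, so the vector can be passed as plain text without extra quotes. The sketch below shows one way this might look; the helper names to_rows and insert_in_batches and the batch size are hypothetical, and it assumes the database accepts the same text form of a vector that the working single-row version sends.

```python
import json


def to_rows(X):
    """Turn an iterable of vectors into (id, text) tuples the driver can adapt."""
    rows = []
    for i, x in enumerate(X):
        # float() turns NumPy scalars into plain Python floats; json.dumps
        # then renders the vector as text such as "[1.0, 2.0]".
        rows.append((i, json.dumps([float(v) for v in x])))
    return rows


def insert_in_batches(cur, sql, rows, batch_size=1000):
    """Send rows to cur.executemany() in chunks of at most batch_size."""
    for start in range(0, len(rows), batch_size):
        cur.executemany(sql, rows[start:start + batch_size])


# Usage inside fit(), assuming conn and cur are an open connection and cursor:
#   sql = "insert into items (id, embedding) values (%s, %s)"
#   insert_in_batches(cur, sql, to_rows(X))
#   conn.commit()
```

Slicing by a fixed step also avoids the i % 4 bookkeeping, so no trailing partial batch needs separate handling.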