cmavro / GNN-RAG

GNN-RAG: Graph Neural Retrieval for Large Language Modeling Reasoning

Why are we using test.json when the GNN results are stored as test.info? #6

Open Lavishgurjar85 opened 4 days ago

Lavishgurjar85 commented 4 days ago

def load_gnn_rag(g_data_file, g_data_file2=None):
    data_file_d = {}
    data_file_gnn = {}

    data_file = os.path.dirname(g_data_file) + "/test.json"
    with open(data_file) as f_in, open(g_data_file) as fg:
        for line, lineg in (zip(f_in, fg)):
            line = json.loads(line)
            lineg = json.loads(lineg)

            data_file_d[line["id"]] = line
            data_file_gnn[line["id"]] = lineg
        print("ok1")
    if g_data_file2 is not None:
        data_file = os.path.dirname(g_data_file2) + "/test.json"
        with open(data_file) as f_in, open(g_data_file2) as fg:
            for line, lineg in (zip(f_in, fg)):
                line = json.loads(line)
                lineg = json.loads(lineg)

                cand1 = data_file_gnn[line["id"]]["cand"]
                cand2 = lineg["cand"]

                for c2 in cand2:  # c2[0]: entity, c2[1]: score
                    found = False
                    for c1 in cand1:
                        if c2[0] == c1[0]:
                            if c2[1] > c1[1]:
                                c1[1] = c2[1]
                            found = True
                            break
                    if not found:
                        cand1.append(c2)
                cand1 = sorted(cand1, key=lambda x: x[1], reverse=True)
                data_file_gnn[line["id"]]["cand"] = cand1
            data_file_gnn[line["id"]].update({"cand2": lineg["cand"]})
            print("ok2")

    return data_file_gnn

Why does this function read test.json instead of test.info? Also, if we use test.info, it doesn't contain ids.
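
For readers following the merge step in the second loop: it appears to keep the higher score for each entity that shows up in both candidate lists, then sort by score. A minimal sketch of that reading, using a dictionary instead of nested loops. `merge_candidates` is a hypothetical helper name (not part of the repository), and the sketch assumes each entity appears at most once per list:

```python
def merge_candidates(cand1, cand2):
    """Merge two lists of [entity, score] pairs, keeping the higher score per entity.

    Hypothetical helper for illustration; it returns a new list and does not
    modify its inputs, unlike the in-place update in the quoted function.
    """
    best = {}
    for entity, score in cand1 + cand2:
        if entity not in best or score > best[entity]:
            best[entity] = score
    # Highest-scoring candidates first, mirroring the sorted(...) call above.
    return sorted(([e, s] for e, s in best.items()), key=lambda x: x[1], reverse=True)
```

Entities that appear in only one list are carried over unchanged, which matches the `if not found: cand1.append(c2)` branch.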

cmavro commented 4 days ago

You need to move test.info and test.json (from the gnn data folder) to the corresponding folder (see here).

We will add new automatic scripts in the near future.

Lavishgurjar85 commented 4 days ago

The thing is, we are using two files: data_file = "/test.json" and g_data_file = "/test.info".

If I only use test.info by renaming it to test.json, the problem occurs at cand1 = data_file_gnn[line["id"]]["cand"]. We require "id" for this, but in test.info the data is:

{"question": "lou seal is the mascot for the team that last won the world series when ? ", "0": {}, "1": {}, "answers": ["m.0117q3yz"], "precison": 0.3333333333333333, "recall": 1.0, "f1": 0.5, "hit": 1.0, "cand": [["m.0117q3yz", 0.34248173236846924], ["m.09gnk2r", 0.328049898147583], ["m.0ds8qct", 0.32790225744247437]]}

There is no "id".
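
Since the quoted function zips the two files, the "id" seems to come from line i of test.json and the candidates from line i of test.info. A small sketch of that pairing, assuming both files hold one JSON object per line, in the same order. `pair_by_line` is a hypothetical helper, and the line-count check is an addition that is not in the quoted code:

```python
import json


def pair_by_line(json_path, info_path):
    """Map each question id (from test.json) to the GNN record on the same line of test.info."""
    with open(json_path) as f_json, open(info_path) as f_info:
        questions = [json.loads(line) for line in f_json if line.strip()]
        gnn_records = [json.loads(line) for line in f_info if line.strip()]

    # zip() silently truncates to the shorter file, so check the lengths explicitly.
    if len(questions) != len(gnn_records):
        raise ValueError(
            f"line count mismatch: {len(questions)} questions vs {len(gnn_records)} GNN records"
        )

    return {q["id"]: g for q, g in zip(questions, gnn_records)}
```

If the two files come from different runs or splits, the positions would no longer line up, so the ids would be attached to the wrong candidates without any error from `zip`.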

cmavro commented 4 days ago

Correct, I updated the previous message. You need to copy both test.info and test.json from the gnn data to the new folder:

[image]

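
The copy step could be scripted roughly as follows. This is only a sketch: `copy_gnn_outputs` is a hypothetical helper, and both directory arguments are placeholders rather than actual paths from the repository.

```python
import shutil
from pathlib import Path


def copy_gnn_outputs(gnn_dir, target_dir):
    """Copy test.info and test.json from a GNN output folder into target_dir.

    Both directory arguments are placeholders chosen by the caller.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ("test.info", "test.json"):
        src = Path(gnn_dir) / name
        if not src.is_file():
            # Fail loudly if one of the two files is missing from the source folder.
            raise FileNotFoundError(f"{src} not found")
        shutil.copy2(src, target / name)
        copied.append(target / name)
    return copied
```

The explicit check makes a missing test.json show up as a clear error at copy time rather than later inside `load_gnn_rag`.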
Lavishgurjar85 commented 4 days ago

Can you please specify where these test.json files are? In the repository, only test.info files are present.