CaioViktor / Auto-KGQA


IndexError: list index out of range when trying to create a new index using the TBoxIndex class #1

Open ozanbarism opened 1 month ago

ozanbarism commented 1 month ago

I'm encountering an IndexError: list index out of range when trying to create a new index using the TBoxIndex class in the Auto-KGQA project. The error occurs because the SPARQL query returns no terms, resulting in an empty embeddings list. I am using a Brick schema model, and I believe that might be causing the issue, though I am not sure why. When I check the locally hosted Apache server, I can query the model without any issue.

Creating new index...
Terms quantity: 0
Traceback (most recent call last):
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/server.py", line 22, in <module>
    t_box_index = TBoxIndex(endpoint_t_box, normalizer)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 69, in __init__
    super().__init__(path_index, endpoint, normalizer)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 20, in __init__
    self.index = self.create()
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 44, in create
    faiss = FAISS.from_texts(keys, self.embedding_function, metadata)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 931, in from_texts
    return cls.__from(
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 888, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range

CaioViktor commented 1 month ago

The T-Box index is built from the classes and properties explicitly defined in the endpoint pointed to by the "ENDPOINT_T_BOX_URL" parameter in the configuration file. For classes, the definitions of owl:Class and rdfs:Class are used, which are the most common standards. For properties, the definitions of rdf:Property, owl:DatatypeProperty and owl:ObjectProperty are used. So you should check whether explicit definitions of classes and properties are available in your endpoint.
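For example, a quick way to check whether such explicit definitions exist is a small script along these lines (a sketch using SPARQLWrapper; the endpoint URL is a placeholder, and the exact terms the framework queries may differ):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8890/sparql")  # hypothetical ENDPOINT_T_BOX_URL

checks = {
    "classes": "{ ?x a <http://www.w3.org/2002/07/owl#Class> } UNION "
               "{ ?x a <http://www.w3.org/2000/01/rdf-schema#Class> }",
    "properties": "{ ?x a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> } UNION "
                  "{ ?x a <http://www.w3.org/2002/07/owl#ObjectProperty> } UNION "
                  "{ ?x a <http://www.w3.org/2002/07/owl#DatatypeProperty> }",
}
for name, pattern in checks.items():
    sparql.setQuery(f"SELECT (COUNT(DISTINCT ?x) AS ?n) WHERE {{ {pattern} }}")
    sparql.setReturnFormat(JSON)
    n = sparql.query().convert()["results"]["bindings"][0]["n"]["value"]
    # 0 here means the T-Box index would have nothing to embed
    print(f"explicitly defined {name}: {n}")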

If you really do not have an explicit T-Box definition, you can try building one from the A-Box (data) with a SPARQL query like:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT {
    ?class a owl:Class .
} WHERE {
    {
        SELECT ?class (SAMPLE(?subj) AS ?s)
        WHERE {
            ?subj a ?class
        } GROUP BY ?class
    }
}
LIMIT 10
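If your KG is in a local Turtle file, the same idea can be run through rdflib; a rough sketch (the file path is a placeholder, and it also declares every used predicate as an rdf:Property, since the framework looks for explicit property definitions as well):

from rdflib import Graph

g = Graph()
g.parse("your_data.ttl", format="turtle")  # hypothetical path to your A-Box file

derive_tbox = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
    ?class a owl:Class .
    ?property a rdf:Property .
} WHERE {
    { SELECT DISTINCT ?class WHERE { ?s a ?class } }
    UNION
    { SELECT DISTINCT ?property WHERE { ?s ?property ?o } }
}
"""

tbox = g.query(derive_tbox).graph          # CONSTRUCT results come back as a Graph
tbox.serialize(destination="derived_tbox.ttl", format="turtle")
print(f"derived {len(tbox)} T-Box triples")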

Or you can try using some ontology bootstrap tool, such as: https://www.researchgate.net/publication/221467011_OntoMiner_Bootstrapping_and_Populating_Ontologies_from_Domain_Specific_Web_Sites

ozanbarism commented 1 month ago

Hi, I managed to do what you recommended and extracted a T-box by changing the functions a little bit. It looks like this:

TBoxIndex terms: [{'?term': 'https://brickschema.org/schema/Brick#Air_Handler_Unit', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Air_Handler_Unit'}, {'?term': 'https://brickschema.org/schema/Brick#Building', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Building'}, {'?term': 'https://brickschema.org/schema/Brick#Electric_Meter', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Electric_Meter'}, {'?term': 'https://brickschema.org/schema/Brick#Building_Electric_Meter', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Building_Electric_Meter'}, {'?term': 'https://brickschema.org/schema/Brick#Damper', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Damper'}, {'?term': 'https://brickschema.org/schema/Brick#Floor', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Floor'}, {'?term': 'https://brickschema.org/schema/Brick#Cooling_Command', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Cooling_Command'}, {'?term': 'https://brickschema.org/schema/Brick#Valve_Command', '?type': 'class', '?label': 'https://brickschema.org/schema/Brick#Valve_Command'}

However, when I ask a question, the resulting query (see below) does not use any of the Brick prefixes and instead uses made-up prefixes and classes. Why might this be the case?

Generated query (this is the 'sparqls' field of the result; the 'fragments' field contains only '\n', i.e. it is empty):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
PREFIX sh: <http://schema.org/>
SELECT ?unit ?ref ?timeSeries
WHERE {
    ?unit rdf:type ex:AirHandlerUnit.
    ?unit ex:hasReference ?ref.
    ?ref sh:timeSeries ?timeSeries.
}

I changed server.py so that I can interact with the framework locally instead of using the React web server. The change can be seen below:

def process_question(question_id, question):
    result = chatHandler.process_question(question_id, question)
    if 'sparql' in result and result['sparql'] is not None:
        result = dataset.add(result, question_id)
        dataset.save()
    return result

if __name__ == "__main__":
    question_id = 1
    while True:
        question = input("Enter your question (or type 'finish' to end): ")
        if question.lower() == 'finish':
            break
        result = process_question(str(question_id), question)
        print(result)
        question_id += 1

CaioViktor commented 1 month ago

Hello. From your answer, it is possible to notice that the framework is not passing the subgraph to the LLM: in your output, the "fragments" field is empty. This subgraph is the context that the LLM uses as a reference for classes and properties. Since nothing was passed to the LLM, it is making things up.

It would be difficult to pinpoint why it is not selecting the subgraph, but you can disable the graph filtering function in the configuration file by changing the "FILTER_GRAPH" variable to False and see if it generates any change.

Another error could be in the phase of matching the input terms to the index terms. I saw that in your index the labels are the complete URIs, but there is a function that extracts a human-readable label from the URI (the uri_to_label function in the Endpoint class). Running it on your URI "https://brickschema.org/schema/Brick#Air_Handler_Unit" returns 'Brick Air Handler Unit'.
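For reference, a minimal sketch of that kind of URI-to-label conversion (the actual uri_to_label in the Endpoint class may differ):

import re
from urllib.parse import urldefrag

def uri_to_label_sketch(uri: str) -> str:
    # Take the fragment if present, otherwise the last path segment,
    # and use the preceding path segment (e.g. "Brick") as a namespace hint.
    base, frag = urldefrag(uri)
    local = frag or base.rstrip("/").rsplit("/", 1)[-1]
    hint = base.rstrip("/").rsplit("/", 1)[-1] if frag else ""
    text = f"{hint} {local}".strip()
    text = re.sub(r"[_\-]+", " ", text)                # underscores/hyphens -> spaces
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)   # split camelCase
    return re.sub(r"\s+", " ", text)

print(uri_to_label_sketch("https://brickschema.org/schema/Brick#Air_Handler_Unit"))
# -> "Brick Air Handler Unit"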

I uploaded a script that I use to run tests; it organizes everything that was generated, so I think it can be useful in your tests.

ozanbarism commented 1 month ago

Hi, thanks for your answer.

I added lines to extract 'Air_Handler_Unit' and so on. However, the fragments are still empty. I mainly just changed the two functions below. Are they missing something that is crucial for creating the fragments? Can you also point me to which functions I should check to understand how the fragments are created?

def listTerms(self, language=None, limit=10000):
    print("HERE")
    # Query to fetch classes
    class_query = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX brick: <https://brickschema.org/schema/Brick#>

    SELECT DISTINCT ?class
    WHERE {
        ?instance rdf:type ?class.
        FILTER (strstarts(str(?class), "https://brickschema.org/schema/Brick#"))
    }
    """

    # Query to fetch properties
    property_query = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX brick: <https://brickschema.org/schema/Brick#>

    SELECT DISTINCT ?property
    WHERE {
        {
            ?subject ?property ?object.
            FILTER (strstarts(str(?property), "https://brickschema.org/schema/Brick#"))
        }
        UNION
        {
            ?subject ?property ?object.
            FILTER (strstarts(str(?property), "http://www.w3.org/1999/02/22-rdf-syntax-ns#"))
        }
    }
    """

    # Execute the queries
    class_results = self.run_sparql(class_query)
    property_results = self.run_sparql(property_query)

    #print("Class query results:", class_results)
    #print("Property query results:", property_results)

    results = []

    # Process class results
    if class_results:
        for result in class_results:
            if '?class' in result:
                uri = result['?class']
                if not uri in self.labels:
                    self.labels[uri] = []

                label_query = f"""
                PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                SELECT ?label WHERE {{
                    <{uri}> rdfs:label ?label.
                    {'FILTER(LANG(?label) = "'+language+'")' if language else ''}
                }} LIMIT 1
                """
                label_results = self.run_sparql(label_query)
                label = label_results[0]['label']['value'] if label_results else uri

                self.labels[uri].append([label, "class"])
                self.counts[uri] = 0  # Initialize count as 0 or an appropriate value
                results.append({
                    '?term': uri,
                    '?type': 'class',
                    '?label': label
                })

    # Process property results
    if property_results:
        for result in property_results:
            if '?property' in result:
                uri = result['?property']
                if not uri in self.labels:
                    self.labels[uri] = []

                label = uri  # For properties, use the URI as the label directly

                self.labels[uri].append([label, "property"])
                self.counts[uri] = 0  # Initialize count as 0 or an appropriate value
                results.append({
                    '?term': uri,
                    '?type': 'property',
                    '?label': label
                })

    #print("Results for listterms:", results)
    if not results:
        print("No terms found in TBoxIndex.")

    for term in results:
        if '#' in term['?term']:
            term['?term'] = term['?term'].split('#')[-1]
        if '#' in term['?label']:
            term['?label'] = term['?label'].split('#')[-1]

    return results

def listResources(self, language=None, limit=10000):
    # Query to fetch resources (instances) along with their properties and objects
    resource_query = """
    PREFIX brick: <https://brickschema.org/schema/Brick#>
    PREFIX ns2: <http://buildsys.org/ontologies/bldg11#>
    PREFIX ns3: <http://buildsys.org/ontologies/bldg11#bldg11.CHW.Pump1_Start/>
    PREFIX ns4: <http://buildsys.org/ontologies/bldg11#bldg11.CHW.Pump2_Start/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX unit: <http://qudt.org/vocab/unit/>

    SELECT DISTINCT ?resource ?property ?object
    WHERE {
        ?resource ?property ?object.
        FILTER (
            strstarts(str(?resource), "http://buildsys.org/ontologies/bldg11#") ||
            strstarts(str(?resource), "http://buildsys.org/ontologies/bldg11#bldg11.CHW.Pump1_Start/") ||
            strstarts(str(?resource), "http://buildsys.org/ontologies/bldg11#bldg11.CHW.Pump2_Start/")
        )
        FILTER NOT EXISTS {
            ?resource a owl:Class.
        }
        FILTER NOT EXISTS {
            ?resource a rdf:Property.
        }
    }
    """

    # Execute the query
    resource_results = self.run_sparql(resource_query)

    #print("Resource query results:", resource_results)

    results = []

    # Process resource results
    if resource_results:
        for result in resource_results:
            uri = result['?resource']
            property = result['?property']
            obj = result['?object']

            if uri not in self.labels:
                self.labels[uri] = []

            # In this case, we use the URI itself as the label since there are no labels
            label = uri

            self.labels[uri].append([label, "resource"])
            self.counts[uri] = 0  # Initialize count as 0 or an appropriate value

            results.append({
                '?term': uri,
                '?type': 'resource',
                '?label': label,
                '?property': property,
                '?object': obj
            })
    #print("Results:", results)
    if not results:
        print("No resources found in ABox.")

    for term in results:
        if '#' in term['?term']:
            term['?term'] = term['?term'].split('#')[-1]
        if '#' in term['?label']:
            term['?label'] = term['?label'].split('#')[-1]

    return results
ozanbarism commented 1 month ago

Hi again, I tried to test it with the ontology_example.ttl file you have in the repo. I re-cloned your code, and the only change I made was to the Endpoint.py file so that the ontology can be read from a local document. However, the ontology_example.ttl resulted in the following error when I run teste.py:

This is the config file:

ENDPOINT_T_BOX_URL = "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/Demo/ontology_example.ttl"
ENDPOINT_A_BOX_URL = None

Creating new index...
Terms quantity: 0
Traceback (most recent call last):
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/teste.py", line 22, in <module>
    t_box_index = TBoxIndex(endpoint_t_box, normalizer)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 69, in __init__
    super().__init__(path_index, endpoint, normalizer)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 20, in __init__
    self.index = self.create()
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 44, in create
    faiss = FAISS.from_texts(keys, self.embedding_function, metadata)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 930, in from_texts
    embeddings = embedding.embed_documents(texts)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 98, in embed_documents
    embeddings = self.client.encode(
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 565, in encode
    if all_embeddings[0].dtype == torch.bfloat16:
IndexError: list index out of range

CaioViktor commented 1 month ago

Hi, thanks for your answer. I added lines to extract 'Air_Handler_Unit' and so on. However, the fragments are still empty. [...]

Hello, you changed the function to get the list of classes (based on triples with rdf:type) and properties (the predicate of every triple), and then get the labels for each of the URIs found, right? So I think I understand why the labels in your resulting index were the same as the URIs. In the snippet below you try to get the label, but from what I've noticed there are no labels explicitly defined in your KG, so this doesn't work:

label_query = f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {{
<{uri}> rdfs:label ?label.
{'FILTER(LANG(?label) = "'+language+'")' if language else ''}
}} LIMIT 1
"""
label_results = self.run_sparql(label_query)
label = label_results[0]['label']['value'] if label_results else uri

You can do the following to transform the URI into a human-readable label:

label = self.uri_to_label(result['?label'])
type_label = "URI"
self.labels[uri].append([label,type_label ])

The same reasoning can be applied to all labels.

As for the results entries returned by each function, they must follow the return structure of the original SPARQL queries:

results.append({
    "?term": uri,
    "?type": "class",
    "?label": label,
    "?property": type_label,
    "?qtd": 0
})
results.append({
    "?term": uri,
    "?type": "property",
    "?label": label,
    "?property": type_label,
    "?qtd": 0
})
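Putting those two points together, a sketch of how the class loop inside your listTerms() could build each entry (a fragment meant to sit in that method; it assumes the endpoint's uri_to_label mentioned above and the field names from the example just given):

for result in class_results:
    uri = result.get('?class')
    if not uri:
        continue
    label = self.uri_to_label(uri)            # e.g. "Brick Air Handler Unit"
    type_label = "URI"
    self.labels.setdefault(uri, []).append([label, type_label])
    self.counts[uri] = 0
    results.append({
        "?term": uri,
        "?type": "class",
        "?label": label,
        "?property": type_label,
        "?qtd": 0,
    })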
CaioViktor commented 1 month ago

Hi again, I tried to test it with the ontology_example.ttl file you have in the repo. [...]

Hi, the code already has support for local files. I don't know what your changes were, but could you try the standard version of the code? I suggest downloading it again and changing only the configuration file. The reading of local files is done in the run_sparql() function, in the following snippet:

# Endpoint is a local file
results = self.run_sparql_rdflib(query)
encoder = JSONResultSerializer(results)
output = io.StringIO()
encoder.serialize(output)
print(file=output)
for result in results:
    result_item = {}
    for var in results.vars:
        result_item["?" + var] = str(result[var])
    result_set.append(result_item)
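Standalone, the local-file path boils down to something like this (a sketch; the real run_sparql() handles more cases and serialization details, and the file name here is a placeholder):

from rdflib import Graph

g = Graph()
g.parse("ontology_example.ttl", format="turtle")   # hypothetical local file

results = g.query("""
    SELECT DISTINCT ?class WHERE { ?class a <http://www.w3.org/2002/07/owl#Class> } LIMIT 5
""")

# Flatten each row into the {"?var": "value"} dicts the rest of the framework expects.
result_set = []
for row in results:
    result_set.append({"?" + str(var): str(row[var]) for var in results.vars})
print(result_set)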
ozanbarism commented 1 month ago

Hi, the code already has support for local files. I don't know what your changes were, but could you try the standard version of the code? I suggest downloading it again and changing only the configuration file.

Hi! I have tried it with the original code but that results in the following error. I have done it by downloading the whole repo again and only changing the configuration file.

Traceback (most recent call last):
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/teste.py", line 22, in <module>
    t_box_index = TBoxIndex(endpoint_t_box, normalizer)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 69, in __init__
    super().__init__(path_index, endpoint, normalizer)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 19, in __init__
    self.loadTerms(endpoint)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/index/vector_distance_index.py", line 72, in loadTerms
    self.terms = endpoint.listTerms()
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/sparql/Endpoint.py", line 319, in listTerms
    qtd = self.unpackNumber(self.run_sparql(count_query)[0]["?qtd_max"])
TypeError: 'NoneType' object is not subscriptable

CaioViktor commented 1 month ago

I made my own parser to output query results from RDFlib, as theirs had bugs. So I think it should now work with local files. You need to install the Python lib xmltodict==0.13.0.
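Roughly, the idea is to parse the SPARQL XML results with xmltodict and flatten each binding into the usual {'?var': value} dicts; a simplified sketch (the actual parser in Endpoint.py may differ):

import xmltodict

def flatten_sparql_xml(xml_text):
    doc = xmltodict.parse(xml_text)
    rows = (doc["sparql"].get("results") or {}).get("result") or []
    rows = rows if isinstance(rows, list) else [rows]            # a single result is not wrapped in a list
    result_set = []
    for row in rows:
        bindings = row["binding"]
        bindings = bindings if isinstance(bindings, list) else [bindings]
        item = {}
        for b in bindings:
            value = b.get("uri") if "uri" in b else b.get("literal")
            if isinstance(value, dict):                          # typed or lang-tagged literals carry attributes
                value = value.get("#text")
            item["?" + b["@name"]] = value
        result_set.append(item)
    return result_set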

ozanbarism commented 1 month ago

I made my own parser to output query results from RDFlib, as theirs had bugs. So I think it should now work with local files. You need to install the Python lib xmltodict==0.13.0.

Hi, I installed the library and ran the code with changes only to the config file. It still returns NoneType errors. I added some debug statements while running teste.py. It seems like in some conditions the SPARQL queries return "list indices must be integers or slices, not str". Is this normal?

Outgoing triple: {'?p': 'http://www.example.lirb.com/knows', '?o': 'http://www.example.lirb.com/SomeOne'}
Outgoing triple: {'?p': 'http://www.arida.ufc.br/ontology/timeline/has_timeLine', '?o': 'http://www.example.lirb.com/Timeline/John'}
SPARQL query for ingoing properties:
SELECT DISTINCT ?s ?p WHERE{
    ?s ?p <http://www.example.lirb.com/John>.
    FILTER(?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>)
    FILTER(?s != <http://www.w3.org/2002/07/owl#Nothing>)
}
Running SPARQL query:
SELECT DISTINCT ?s ?p WHERE{
    ?s ?p <http://www.example.lirb.com/John>.
    FILTER(?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>)
    FILTER(?s != <http://www.w3.org/2002/07/owl#Nothing>)
}
SPARQL query results: [{'?s': 'http://www.example.lirb.com/JohnDoe', '?p': 'http://www.w3.org/2002/07/owl#sameAs'}, {'?s': 'http://www.example.lirb.com/SomeOne', '?p': 'http://www.example.lirb.com/knows'}]
Ingoing triple: {'?s': 'http://www.example.lirb.com/JohnDoe', '?p': 'http://www.w3.org/2002/07/owl#sameAs'}
Ingoing triple: {'?s': 'http://www.example.lirb.com/SomeOne', '?p': 'http://www.example.lirb.com/knows'}
Visiting http://www.example.lirb.com/Timeline/John
SPARQL query for outgoing properties:
SELECT DISTINCT ?p ?o WHERE{
    <http://www.example.lirb.com/Timeline/John> ?p ?o.
    FILTER(?o != <http://www.w3.org/2000/01/rdf-schema#Resource>)
    FILTER(?o != <http://www.w3.org/2002/07/owl#Thing>)
}
Running SPARQL query:
SELECT DISTINCT ?p ?o WHERE{
    <http://www.example.lirb.com/Timeline/John> ?p ?o.
    FILTER(?o != <http://www.w3.org/2000/01/rdf-schema#Resource>)
    FILTER(?o != <http://www.w3.org/2002/07/owl#Thing>)
}
Exception on run_sparql: list indices must be integers or slices, not str
Warning: No triples found for http://www.example.lirb.com/Timeline/John
Visiting http://www.example.lirb.com/SomeOne
SPARQL query for outgoing properties:
SELECT DISTINCT ?p ?o WHERE{
    <http://www.example.lirb.com/SomeOne> ?p ?o.
    FILTER(?o != <http://www.w3.org/2000/01/rdf-schema#Resource>)
    FILTER(?o != <http://www.w3.org/2002/07/owl#Thing>)
}
Running SPARQL query:
SELECT DISTINCT ?p ?o WHERE{
    <http://www.example.lirb.com/SomeOne> ?p ?o.
    FILTER(?o != <http://www.w3.org/2000/01/rdf-schema#Resource>)
    FILTER(?o != <http://www.w3.org/2002/07/owl#Thing>)
}
Exception on run_sparql: list indices must be integers or slices, not str
Warning: No triples found for http://www.example.lirb.com/SomeOne
Traceback (most recent call last):
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/teste.py", line 46, in <module>
    ttl = qa.getRelevantGraph(question)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/core/QuestionHandler.py", line 61, in getRelevantGraph
    filter_triples = Filter_Triples(triples, self.embedding_function, relevance_threshold=RELEVANCE_THRESHOLD, max_hits_rate=MAX_HITS_RATE)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/sparql/Filter_Triples.py", line 12, in __init__
    self.endpoint = Endpoint.from_rdflib_in_string(triples)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/sparql/Endpoint.py", line 57, in from_rdflib_in_string
    graph = getGraph(triples)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/sparql/Utils.py", line 9, in getGraph
    g.parse(str_in, format='n3')
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/graph.py", line 1501, in parse
    raise se
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/graph.py", line 1492, in parse
    parser.parse(source, self, **args)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 2058, in parse
    TurtleParser.parse(self, source, conj_graph, encoding, turtle=False)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 2021, in parse
    p.loadStream(stream)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 479, in loadStream
    return self.loadBuf(stream.read())  # Not ideal
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 485, in loadBuf
    self.feed(buf)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 511, in feed
    i = self.directiveOrStatement(s, j)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 532, in directiveOrStatement
    return self.checkDot(argstr, j)
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1214, in checkDot
    self.BadSyntax(argstr, j, "expected '.' or '}' or ']' at end of statement")
  File "/Users/ozanbaris/Documents/GitHub/Auto-KGQA/API/venv/lib/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1730, in BadSyntax
    raise BadSyntax(self._thisDoc, self.lines, argstr, i, msg)
rdflib.plugins.parsers.notation3.BadSyntax:

CaioViktor commented 1 month ago

Hi, there were some unexpected cases in the RDFlib return. I think it is now working correctly, as I was able to run the script that creates the indexes and the test script (returning the ChatGPT response). Thanks for the feedback; if you have anything else, just let me know.

Note: There is now a problem with the line:

self.index = FAISS.load_local(self.path_index, self.embedding_function,allow_dangerous_deserialization=True)

The parameter "allow_dangerous_deserialization=True" is not recognized. However, there was no problem with this before. Have you encountered the same problem? If so, simply delete this parameter from the call.
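If it helps, one way to keep the call compatible with both cases (a sketch; which branch runs depends on your installed langchain_community release, and the call mirrors the line above):

try:
    self.index = FAISS.load_local(
        self.path_index, self.embedding_function, allow_dangerous_deserialization=True
    )
except TypeError:
    # Older langchain_community releases do not accept this keyword argument.
    self.index = FAISS.load_local(self.path_index, self.embedding_function)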

ozanbarism commented 2 weeks ago

Hi Viktor, I am able to run the ontology_example.ttl now. I have also managed to get my knowledge graph into two files: Brick.ttl (T-Box) and vm3a.ttl (A-Box). However, the query result does not have any '?qtd' in it. I am using listTerms as is. This is the result printed in the local-file branch of run_sparql, right after xmltodict. As you can see, there are no 'qtd' terms. Why might this be the case?

{'@name': 'label', 'literal': 'hasExternalReference'}]}, {'binding': [{'@name': 'term', 'uri': 'https://brickschema.org/schema/Brick#hasPart'}, {'@name': 'type', 'literal': 'property'}, {'@name': 'property', 'uri': 'http://www.w3.org/2000/01/rdf-schema#label'}, {'@name': 'label', 'literal': 'Has part'}]}, {'binding': [{'@name': 'term', 'uri': 'https://brickschema.org/schema/Brick#hasUnit'}, {'@name': 'type', 'literal': 'property'}, {'@name': 'property', 'uri': 'http://www.w3.org/2000/01/rdf-schema#label'}, {'@name': 'label', 'literal': 'Has unit'}]}, {'binding': [{'@name': 'term', 'uri': 'https://brickschema.org/schema/Brick#hasTag'}, {'@name': 'type', 'literal': 'property'}, {'@name': 'property', 'uri': 'http://www.w3.org/2000/01/rdf-schema#label'}, {'@name': 'label', 'literal': 'Has tag'}]}]}}}

CaioViktor commented 2 weeks ago

Hello ozanbarism,

You can leave your KG in a single file, since it is probably small.

About the '?qtd', you said you are using the original query, right?

Is what you showed the direct result of executing the query, without going through the parser? It is in a different format after the parser, so the problem may be in RDFlib itself. Is it giving an error? Have you tried running it as is? Did it work correctly with the example KG?

I will be making another commit with some changes to the parser, but I don't know if it will change anything in your error.

I suggest putting all the triples in a single file and testing. If possible, could you try uploading your KG to a triplestore (e.g., Virtuoso or GraphDB) to see if the error also occurs? I have already run the framework on 5 different KGs (the example KG, ORKG, DBpedia and two private KGs) and have not encountered these problems.

But you can set this variable to 1 for all resources; it only serves to give greater weight to resources that have more triples (higher ranking).
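For example (a small sketch applied to the list of dicts returned by listTerms()):

for term in terms:
    term["?qtd"] = 1   # same weight for every resource; it only affects ranking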

ozanbarism commented 4 days ago

Hi,

I have made some changes to the code and I was able to make it work. However, I am having an issue where the relevance score is negative. I wonder if the things I am pushing are wrong. Do

self.relevance_threshold: 0.1
hit[1]: -0.011848136351717331
Nothing with minimum relevance!
Missing nodes in the graph: [rdflib.term.URIRef('https://brickschema.org/schema/Brick#Air_Temperature_Integral_Time_Parameter'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Outdoor_Area'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Outside_Air_Temperature_Setpoint'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Outside_Air'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Air_Temperature_Setpoint'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Energy_Usage_Sensor'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Dedicated_Outdoor_Air_System_Unit'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Air'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Air_Temperature_Sensor'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Outside'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Outside_Air_Temperature_Sensor'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Temperature_Parameter'), rdflib.term.URIRef('https://brickschema.org/schema/Brick#Energy_System')]
KG Given to LLM:
@prefix brick: <https://brickschema.org/schema/Brick#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://buildsys.org/ontologies/VM3A#AHU08> a brick:Air_Handler_Unit ;
    brick:feeds <http://buildsys.org/ontologies/VM3A#VAVRM1207A_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1207_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1300A_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1300C_1_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1300C_2_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1300D_1_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1300D_2_LAB>,
        <http://buildsys.org/ontologies/VM3A#VAVRM1300H_LAB>,

The results at the end of listTerms look like this (screenshot attached). Is this correct?

Also, does the relevance function use the terms or the labels to find matches, or does it use the URIs?
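(For context on the negative score: similarity between embeddings, e.g. cosine similarity, ranges over [-1, 1], so a negative hit just means the two vectors point in roughly opposite directions and is dropped by the 0.1 threshold. The snippet below is an illustration of that filtering idea with random vectors, not the framework's actual scoring code.)

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
question_vec = rng.normal(size=384)                     # e.g. a sentence-transformer embedding
term_vecs = {
    "Brick Air Handler Unit": rng.normal(size=384),     # hypothetical term embeddings
    "Outside Air Temperature Sensor": rng.normal(size=384),
}

relevance_threshold = 0.1
for label, vec in term_vecs.items():
    score = cosine(question_vec, vec)                   # can be negative
    print(f"{label}: {score:+.3f} -> {'keep' if score >= relevance_threshold else 'drop'}")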