
text2graph_llm (USGS project)

text2graph_llm is an experimental tool that uses Large Language Models (LLMs) to convert text into structured graph representations by identifying and extracting relationship triplets. This repository is still in development and may change frequently.
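For instance (an illustrative sketch, not actual model output; the exact schema the tool emits may differ), a sentence can be reduced to subject-predicate-object triplets:

# Illustrative only: the real output schema of text2graph_llm may differ.
text = "The Goldstrike mine is located in Nevada."

triplets = [
    ("Goldstrike mine", "located_in", "Nevada"),  # (subject, predicate, object)
]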

System overview

[System overview diagram]

Features

Demo

Explore our interactive demo

Quick start for the API endpoint

We use a cached LLM graph for faster processing. However, the hydration step (retrieving entity details) still runs in real time; we are working on caching this step as well.

import requests

API_ENDPOINT = "http://cosmos0002.chtc.wisc.edu:4510/llm_graph"
API_KEY = "your-api-key"  # Email jason.lo@wisc.edu to request an API key if you need access.

headers = {"Content-Type": "application/json", "Api-Key": API_KEY}
data = {
    "query": "Gold mines in Nevada.",
    "top_k": 1,
    "ttl": True,  # Return the graph in TTL (Turtle) format
    "hydrate": False,  # Fetch additional entity data (e.g., GPS) from external services. Rate limits make this slow; do not use with top_k > 3.
}

response = requests.post(API_ENDPOINT, headers=headers, json=data)
response.raise_for_status()
print(response.json())
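
If "ttl" is set to True, the graph comes back in Turtle format. Below is a minimal parsing sketch using rdflib, continuing from the snippet above; the "triplets" key is a hypothetical guess at the response layout, so inspect response.json() to find where the TTL string actually lives.

from rdflib import Graph

# Hypothetical key: check the actual JSON response for the TTL string's location.
ttl_string = response.json()["triplets"]

g = Graph()
g.parse(data=ttl_string, format="turtle")  # Load the Turtle document into an rdflib graph

# Iterate over the parsed (subject, predicate, object) triples
for subject, predicate, obj in g:
    print(subject, predicate, obj)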

For convenience, you can use this notebook

Links

For developers

Instructions to developers

**Steps to set up the environment:**

1. Open the project in VSCode.
2. Press `F1` and select `Reopen in Container` to set up the dev environment using the [dev-container](.devcontainer/devcontainer.json).
3. Copy the `.env` file from the shared Google Drive to the project root.
4. Copy the extracted graph cache data from Google Drive to `app_data/`.
5. Run `docker-compose up` in bash to deploy locally.

**Running batch inference on CHTC** (a hypothetical submit-file sketch follows this list):

1. **Update the `text2graph_llm_chtc` container**: Ensure the base package is updated and pushed to ghcr.io. See the [package script](scripts/package.sh) for details.
2. **Create the ID pickle**: Ensure the data package storing the document IDs (e.g., `./chtc/geoarchive_paragraph_ids.pkl`) is up to date.
3. **Add the `.env` file**: Ensure `./chtc/.env` contains the required credentials. See this [example](chtc/example.env).
4. **Initialize the Turso DB**: Run `hard_reset` in `./chtc/db.py` to initialize the Turso DB.
5. **Log in to a CHTC submit node**: For example, `ap2001.chtc.wisc.edu`.
6. **Update the test job's container name**: Update the container name in the [test job](chtc/debug_job.sub) and run `condor_submit`.
7. **Verify Turso data reception**: Ensure Turso is properly receiving data.
8. **Submit the full job**: Update the Docker container name in the [job file](chtc/job.sub) and submit it.
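
For orientation, a CHTC HTCondor submit file for a Docker-universe job generally looks like the sketch below. This is a hypothetical illustration, not the actual contents of chtc/job.sub or chtc/debug_job.sub; the image tag, entry script, and resource requests are placeholders.

# Hypothetical sketch of an HTCondor Docker-universe submit file;
# the real chtc/job.sub will differ.
universe        = docker
docker_image    = ghcr.io/uw-xdd/text2graph_llm_chtc:latest   # placeholder image tag

executable      = run.sh                                      # placeholder entry script

log             = logs/job_$(Cluster).log
output          = logs/job_$(Cluster)_$(Process).out
error           = logs/job_$(Cluster)_$(Process).err

request_cpus    = 1
request_memory  = 4GB
request_disk    = 4GB

queue

Submit with `condor_submit job.sub` and check progress with `condor_q`.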