UW-xDD / text2graph_llm

An experimental API endpoint to convert text to knowledge graph triplets.
MIT License

Preprocess triplet extraction #32

Closed: JasonLo closed this issue 5 months ago

JasonLo commented 5 months ago

Batch-preprocess triplet extraction with HTCondor:

  1. Retrieve all paragraph IDs from the GeoArchive in Weaviate, approximately 2 million in total.
  2. Assign about 2000 IDs to each HTCondor job for distributed processing (roughly 1000 jobs in total).
  3. Run the triplet extraction pipeline in each job and store the results in a temporary database on /staging.
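The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the paragraph IDs are assumed to have already been fetched from Weaviate into a Python list (that query is not shown), a local SQLite file stands in for the temporary /staging database, and `chunk_ids`, `write_job_inputs`, `run_job`, `extract_triplets`, and the `batch_XXXXX.json` file naming are all hypothetical.

```python
import json
import sqlite3
from pathlib import Path

BATCH_SIZE = 2000  # about 2000 paragraph IDs per HTCondor job


def chunk_ids(ids, batch_size=BATCH_SIZE):
    """Split the full list of paragraph IDs into consecutive batches."""
    return [ids[i : i + batch_size] for i in range(0, len(ids), batch_size)]


def write_job_inputs(ids, out_dir, batch_size=BATCH_SIZE):
    """Write one JSON file per batch (hypothetical naming); each job reads one file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for job_id, batch in enumerate(chunk_ids(ids, batch_size)):
        path = out_dir / f"batch_{job_id:05d}.json"
        path.write_text(json.dumps(batch))
        paths.append(path)
    return paths


def extract_triplets(paragraph_id):
    """Hypothetical placeholder for the real LLM-based extraction pipeline."""
    return [(f"{paragraph_id}:subject", "related_to", f"{paragraph_id}:object")]


def run_job(batch_file, db_path):
    """Process one batch and store its triplets in a per-job SQLite file."""
    ids = json.loads(Path(batch_file).read_text())
    con = sqlite3.connect(db_path)
    with con:  # commits the transaction if no exception is raised
        con.execute(
            "CREATE TABLE IF NOT EXISTS triplets "
            "(paragraph_id TEXT, subject TEXT, predicate TEXT, object TEXT)"
        )
        for pid in ids:
            rows = [(pid, s, p, o) for s, p, o in extract_triplets(pid)]
            con.executemany("INSERT INTO triplets VALUES (?, ?, ?, ?)", rows)
    con.close()
    return len(ids)
```

With roughly 2 million IDs and a batch size of 2000, `write_job_inputs` would produce about 1000 batch files. Each HTCondor job could then be handed one file (for example, by selecting it with the `$(Process)` macro in the submit file) and call `run_job` on it. Giving each job its own output file keeps concurrent jobs from contending for a single SQLite database; the results can be merged afterwards.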

JasonLo commented 5 months ago

Finally running somewhat smoothly after switching the backend to Turso.