Project Name
Columbia Extradition
Project Type
Data Science / Machine Learning
Team Members + Emails
Junhui Cho (jh00@bu.edu)
Detailed List of Resources Needed
OpenAI API Key
Description of Resource Usage
We are working with highly unstructured, hand-typed data from a judicial database, and we need to extract attorney information from more than 4,000 cases. We concluded that an LLM is our best option for handling the inconsistencies and noise in the data. We tried recent open-source models such as llama3.2, but gpt-4o performed much better when we manually tested it on a sample of the cases.
Course Deadlines (if applicable)
Dec 05 2024
@funkyvoong