BU-Spark / bu-spark

The start of Spark! Tech Resources
MIT License
4 stars 16 forks source link

Request for Spark! Tech Resources for Columbia Extradition #86

Open jh000107 opened 5 days ago

jh000107 commented 5 days ago

Project Name

Columbia Extradition

Project Type

Data Science / Machine Learning

Team Members + Emails

Junhui Cho (jh00@bu.edu)

Detailed List of Resources Needed

OpenAI API Key

Description of Resource Usage

We are dealing with extremely unstructured data (hand typed) from judicial database, and we need to extract attorney information from more than 4000 cases. We concluded that utilizing a LLM would be our best choice to deal with inconsistencies and noisiness. We tried utilizing latest open-source models like llama3.2, but gpt-4o performed much better when manually tested on some of the cases.

Course Deadlines (if applicable)

Dec 05 2024

@funkyvoong

jh000107 commented 5 days ago

It costs about $0.01568627451 on average per case using gpt-4o, and we have about 3400 cases left.

funkyvoong commented 4 days ago

@jh000107 I added you to our OpenAI Org, please accept the invite. Let me know if there are any issues.