HeardLibrary / vandycite


Experiment with ChatGPT API #114

Open baskaufs opened 7 months ago

baskaufs commented 7 months ago

The Python quickstart is at https://platform.openai.com/docs/quickstart?context=python
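For reference, the basic pattern from that quickstart looks roughly like the following (a minimal sketch, assuming the openai Python package v1.x is installed and an OPENAI_API_KEY environment variable is set):

```python
# Minimal sketch of a chat completion call, following the Python quickstart.
# Assumes the openai package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
)

print(completion.choices[0].message.content)
```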

baskaufs commented 6 months ago

New information about models (from a 2024-02-09 email):

Last week, we launched gpt-3.5-turbo-0125, our latest GPT-3.5 Turbo model. Along with various improvements, this model also has lower pricing: input prices for the new model are reduced by 50% to $0.0005/1K tokens and output prices are reduced by 25% to $0.0015/1K tokens.

If your code specifies gpt-3.5-turbo or gpt-3.5-turbo-16k (the pinned model alias), your model will be automatically updated to gpt-3.5-turbo-0125 on Friday, February 16 and you’ll receive the new, lower pricing.

If for any reason you'd like to continue using the old versions of GPT-3.5 Turbo, you can do so by updating your code to specify gpt-3.5-turbo-0613 or gpt-3.5-turbo-16k-0613 as the model parameter. Please note that the gpt-3.5-turbo-0613 and gpt-3.5-turbo-16k-0613 models will be shut down on June 13th, 2024 as part of our regular model upgrade cycles.
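For our scripts, the practical takeaway is that which snapshot you get is controlled entirely by the model string passed in the API call. A hedged sketch (model names are taken from the email above; the client setup is assumed):

```python
# Sketch: the model you get is controlled by the `model` argument.
# "gpt-3.5-turbo" is the alias that was auto-upgraded to gpt-3.5-turbo-0125;
# "gpt-3.5-turbo-0613" pins the older snapshot (shut down June 13, 2024).
from openai import OpenAI

client = OpenAI()

MODEL = "gpt-3.5-turbo"          # auto-upgrading alias
# MODEL = "gpt-3.5-turbo-0613"   # pinned older snapshot (now retired)

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Test message"}],
)
print(response.model)  # reports the snapshot that actually handled the request
```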

baskaufs commented 6 months ago

GPT-4 Vision guide: https://platform.openai.com/docs/guides/vision
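Per that guide, a vision request is just a chat completion whose user message content mixes text and image_url parts. A rough sketch (the model name follows the guide as of early 2024; the image URL is a placeholder):

```python
# Sketch of a vision request: the user message content is a list of text and
# image_url parts. Assumes the openai package (v1.x); the image URL is fake.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model named in the guide at the time
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What objects are depicted in this artwork?"},
                {"type": "image_url", "image_url": {"url": "https://example.org/image.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```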

baskaufs commented 4 months ago

Comments from Daniel about prompts:

Once you've accepted the invitation to the slavesocieties GitHub org, you should be able to access the repo here.

The instruction sets that I've constructed are here, examples for NLP are here, and example text for HTR is here.

The functions that compile those instruction sets and examples for specific use cases are in this file. The function names related to each of these tasks are collect_instructions, generate_training_data, and generate_block_training_data, respectively. You can refer to the last of those functions to see the construction of the URLs for images corresponding to the HTR example text (and those URLs should be publicly accessible).

I know that what you're most interested in is how I use those instructions and examples to build a conversation history. You can see examples of this in any of these files. One other thing you'll see there that might be useful for you and Emily as well is constraining the model to produce JSON output by specifying a response format in the API call (note that for this to work you also need to refer explicitly to "JSON" in the first message of the conversation).
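To make those two points concrete, here is a minimal sketch of building a conversation history from an instruction set plus one worked example, and constraining the reply to JSON with response_format. The instruction text, example records, and model choice are illustrative assumptions, not Daniel's actual code:

```python
# Sketch: (1) build a conversation history from instructions and a worked
# example, (2) force JSON output with response_format (the word "JSON" must
# appear in the prompt for this to work). All content here is illustrative.
from openai import OpenAI

client = OpenAI()

instructions = "You extract named entities from archival text. Respond in JSON."
example_input = "Juan de la Cruz, age 30, baptized in Havana."          # hypothetical example
example_output = '{"persons": ["Juan de la Cruz"], "places": ["Havana"]}'
new_text = "Maria Josefa, daughter of Pedro, born in Matanzas."          # text to process

messages = [
    {"role": "system", "content": instructions},        # instruction set; mentions "JSON"
    {"role": "user", "content": example_input},          # worked example: input
    {"role": "assistant", "content": example_output},    # worked example: expected output
    {"role": "user", "content": new_text},               # the new input
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=messages,
    response_format={"type": "json_object"},  # constrain the reply to valid JSON
)
print(response.choices[0].message.content)
```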

baskaufs commented 4 months ago

From Emily:

The most recent version of the code should still be in the openai_ner.ipynb file (in label_analysis/chat_gpt); despite the file name, it contains multiple processes (NER on titles, querying NER output, GPT vision on cropped images, and querying vision output). I added a folder in image_analysis/output_final called test_0324 that has the output from my last round of testing. The ner_wikilabel files deal with titles, and the image_wikilabel files deal with object recognition (performed on a subset of all the objects depicted in the works selected for ner_wikilabel). "Sample" was the result of randomly selecting 10 works from five categories (print, painting, poster, sculpture, ceramic), while "Warhol" was just a sample of Warhol works.

baskaufs commented 4 months ago

Emily's comments on final tests (2024-04-15):

As we discussed last time, it appears we need to be a bit more strategic about what tests we run on different types of artworks. I was able to figure out how to adjust URLs to trim images down to meet size requirements for ChatGPT, so I no longer had cases where it couldn't process an image (the processing time seemed to have improved too).
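One common way to do this when images are served through a IIIF Image API (the collection's IIIF manifests are mentioned below) is to request a bounded size in the URL itself. The sketch below is an assumption about the approach, not necessarily the URL adjustment Emily used:

```python
# Hedged sketch: when an image is served via the IIIF Image API, a smaller
# rendition can be requested by changing the size segment of the URL
# ({server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}).
# This is illustrative and may not match the exact URL-trimming approach used.
def resize_iiif_url(url: str, max_dimension: int = 1024) -> str:
    """Replace the size segment of a IIIF Image API URL with a bounded size."""
    parts = url.rsplit("/", 4)  # [prefix+identifier, region, size, rotation, quality.format]
    if len(parts) != 5:
        raise ValueError("URL does not look like a IIIF Image API request")
    parts[2] = f"!{max_dimension},{max_dimension}"  # fit within a max bounding box
    return "/".join(parts)

# Example with a made-up identifier:
print(resize_iiif_url("https://iiif.example.org/iiif/2/abc123/full/full/0/default.jpg"))
# -> https://iiif.example.org/iiif/2/abc123/full/!1024,1024/0/default.jpg
```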

For object detection, I used the following procedure:

For NER on titles, I know we considered just running it on everything because of the relatively low cost, but as I took a second look at the titles I thought that there were some cases where the results would obviously not be good. I ended up just doing it on paintings:

As far as additional categories go, I think the ones most likely to give significant results are running paintings through vision (I'll probably have to do something like I did with prints, where I filtered out certain styles) and running prints (and perhaps ceramics/sculptures?) through NER. This would allow us to try matching NER labels with detected objects, adding depicts statements in Wikidata, improving IIIF manifest labels, etc. I do think the quality of the output is much better this time around, especially when comparing the full-image vision output with that of Google Vision (though ChatGPT lacks bounding boxes).

Let me know what your thoughts are. All the files should be in image_analysis/output_final/test_0415.