HeardLibrary / vandycite


Experiment with ChatGPT API #114

Open baskaufs opened 7 months ago

baskaufs commented 7 months ago

The Python quickstart is at https://platform.openai.com/docs/quickstart?context=python
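For reference, the basic pattern from that quickstart looks roughly like the following (a minimal sketch, assuming the openai Python package v1.x is installed and an OPENAI_API_KEY environment variable is set):

```python
# Minimal sketch of a chat completion call, following the Python quickstart.
# Assumes the openai package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
)

print(completion.choices[0].message.content)
```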

baskaufs commented 6 months ago

New information about models (from a 2024-02-09 email):

Last week, we launched gpt-3.5-turbo-0125, our latest GPT-3.5 Turbo model. Along with various improvements, this model also has lower pricing: input prices for the new model are reduced by 50% to $0.0005/1K tokens and output prices are reduced by 25% to $0.0015/1K tokens.

If your code specifies gpt-3.5-turbo or gpt-3.5-turbo-16k (the pinned model alias), your model will be automatically updated to gpt-3.5-turbo-0125 on Friday, February 16 and you’ll receive the new, lower pricing.

If for any reason you'd like to continue using the old versions of GPT-3.5 Turbo, you can do so by updating your code to specify gpt-3.5-turbo-0613 or gpt-3.5-turbo-16k-0613 as the model parameter. Please note that the gpt-3.5-turbo-0613 and gpt-3.5-turbo-16k-0613 models will be shut down on June 13th, 2024 as part of our regular model upgrade cycles.
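For our scripts, the practical takeaway is that which snapshot you get is controlled entirely by the model string passed in the API call. A hedged sketch (model names are taken from the email above; the client setup is assumed):

```python
# Sketch: the model you get is controlled by the `model` argument.
# "gpt-3.5-turbo" is the alias that was auto-upgraded to gpt-3.5-turbo-0125;
# "gpt-3.5-turbo-0613" pins the older snapshot (shut down June 13, 2024).
from openai import OpenAI

client = OpenAI()

MODEL = "gpt-3.5-turbo"          # auto-upgrading alias
# MODEL = "gpt-3.5-turbo-0613"   # pinned older snapshot (now retired)

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Test message"}],
)
print(response.model)  # reports the snapshot that actually handled the request
```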

baskaufs commented 6 months ago

GPT-4 Vision guide: https://platform.openai.com/docs/guides/vision
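Per that guide, a vision request is just a chat completion whose user message content mixes text and image_url parts. A rough sketch (the model name follows the guide as of early 2024; the image URL is a placeholder):

```python
# Sketch of a vision request: the user message content is a list of text and
# image_url parts. Assumes the openai package (v1.x); the image URL is fake.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model named in the guide at the time
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What objects are depicted in this artwork?"},
                {"type": "image_url", "image_url": {"url": "https://example.org/image.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```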

baskaufs commented 4 months ago

Comments from Daniel about prompts:

Once you've accepted the invitation to the slavesocieties GitHub org, you should be able to access the repo here.

The instruction sets that I've constructed are here, examples for NLP are here, and example text for HTR is here.

The functions that compile those instruction sets and examples for specific use cases are in this file. The function names related to each of these tasks are collect_instructions, generate_training_data, and generate_block_training_data, respectively. You can refer to the last of those functions to see the construction of the URLs for images corresponding to the HTR example text (and those URLs should be publicly accessible).

I know that what you're most interested in is how I use those instructions and examples to build a conversation history. You can see examples of this in any of these files. One other thing you'll see there that might be useful for you and Emily as well is constraining the model to produce JSON output by specifying a response format in the API call (note that for this to work you also need to refer explicitly to "JSON" in the first message of the conversation).
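To make those two points concrete, here is a minimal sketch of building a conversation history from an instruction set plus one worked example, and constraining the reply to JSON with response_format. The instruction text, example records, and model choice are illustrative assumptions, not Daniel's actual code:

```python
# Sketch: (1) build a conversation history from instructions and a worked
# example, (2) force JSON output with response_format (the word "JSON" must
# appear in the prompt for this to work). All content here is illustrative.
from openai import OpenAI

client = OpenAI()

instructions = "You extract named entities from archival text. Respond in JSON."
example_input = "Juan de la Cruz, age 30, baptized in Havana."          # hypothetical example
example_output = '{"persons": ["Juan de la Cruz"], "places": ["Havana"]}'
new_text = "Maria Josefa, daughter of Pedro, born in Matanzas."          # text to process

messages = [
    {"role": "system", "content": instructions},        # instruction set; mentions "JSON"
    {"role": "user", "content": example_input},          # worked example: input
    {"role": "assistant", "content": example_output},    # worked example: expected output
    {"role": "user", "content": new_text},               # the new input
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=messages,
    response_format={"type": "json_object"},  # constrain the reply to valid JSON
)
print(response.choices[0].message.content)
```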

baskaufs commented 4 months ago

From Emily:

The most recent version of the code should still be in the openai_ner.ipynb file (in label_analysis/chat_gpt); despite the file name, it contains multiple processes (NER on titles, querying NER output, GPT vision on cropped images, and querying vision output). I added a folder in image_analysis/output_final called test_0324 that has the output from my last round of testing. The ner_wikilabel files deal with titles, and the image_wikilabel files deal with object recognition (performed on a subset of all the objects depicted in the works selected for ner_wikilabel). "Sample" was the result of randomly selecting 10 works from five categories (print, painting, poster, sculpture, ceramic), while "Warhol" was just a sample of Warhol works.

baskaufs commented 4 months ago

Emily's comments on final tests (2024-04-15):

As we discussed last time, it appears we need to be a bit more strategic about what tests we run on different types of artworks. I was able to figure out how to adjust URLs to trim images down to meet size requirements for ChatGPT, so I no longer had cases where it couldn't process an image (the processing time seemed to have improved too).
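One common way to do this when images are served through a IIIF Image API (the collection's IIIF manifests are mentioned below) is to request a bounded size in the URL itself. The sketch below is an assumption about the approach, not necessarily the URL adjustment Emily used:

```python
# Hedged sketch: when an image is served via the IIIF Image API, a smaller
# rendition can be requested by changing the size segment of the URL
# ({server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}).
# This is illustrative and may not match the exact URL-trimming approach used.
def resize_iiif_url(url: str, max_dimension: int = 1024) -> str:
    """Replace the size segment of a IIIF Image API URL with a bounded size."""
    parts = url.rsplit("/", 4)  # [prefix+identifier, region, size, rotation, quality.format]
    if len(parts) != 5:
        raise ValueError("URL does not look like a IIIF Image API request")
    parts[2] = f"!{max_dimension},{max_dimension}"  # fit within a max bounding box
    return "/".join(parts)

# Example with a made-up identifier:
print(resize_iiif_url("https://iiif.example.org/iiif/2/abc123/full/full/0/default.jpg"))
# -> https://iiif.example.org/iiif/2/abc123/full/!1024,1024/0/default.jpg
```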

For object detection, I used the following procedure:

For NER on titles, I know we considered just running it on everything because of the relatively low cost, but as I took a second look at the titles I thought that there were some cases where the results would obviously not be good. I ended up just doing it on paintings:

As far as additional categories go, I think the ones most likely to give significant results are running paintings through vision (I'll probably have to do something like I did with prints, where I filtered out certain styles) and running prints (and perhaps ceramics/sculptures?) through NER. This would allow us to try matching NER labels with detected objects, adding depicts statements in Wikidata, improving IIIF manifest labels, etc. I do think the quality of the output is much better this time around, especially when comparing the full-image vision output with that of Google Vision (though ChatGPT lacks bounding boxes).

Let me know what your thoughts are. All the files should be in image_analysis/output_final/test_0415.