Initiated by the University of Michigan Herbarium, VoucherVision harnesses the power of large language models (LLMs) to transform the transcription of natural history specimen labels.
I'm curious whether OCR could be generated separately from the rest of the VoucherVision process, with those OCR results later fed into the second LLM step.
I would like to OCR large batches of specimen images up front and run the LLM step later. Leaving this as a suggestion in case others would also find it useful.
We will add functionality to optionally separate the OCR and LLM steps so that OCR from unsupported sources can be used, such as OCR text that already exists in a specimen's record.
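A minimal sketch of what that decoupled workflow could look like. The function names (run_ocr, transcribe_with_llm) and the one-JSON-file-per-image cache layout are hypothetical illustrations, not VoucherVision's actual API:

```python
import json
from pathlib import Path

def run_ocr(image_path: Path) -> str:
    """Placeholder for whatever OCR engine is used (Tesseract,
    Google Vision, etc.); returns raw label text for one image."""
    raise NotImplementedError("plug in your OCR engine here")

def transcribe_with_llm(ocr_text: str) -> dict:
    """Placeholder for the LLM step that parses raw OCR text
    into structured label fields."""
    raise NotImplementedError("plug in your LLM call here")

def batch_ocr(image_dir: Path, ocr_dir: Path) -> None:
    """Step 1: OCR every specimen image and cache the raw text."""
    ocr_dir.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(image_dir.glob("*.jpg")):
        record = {"image": image_path.name, "ocr": run_ocr(image_path)}
        (ocr_dir / f"{image_path.stem}.json").write_text(json.dumps(record))

def batch_llm(ocr_dir: Path, results_dir: Path) -> None:
    """Step 2, run later: point the LLM at the cached OCR (or at
    OCR that already exists in a specimen's record)."""
    results_dir.mkdir(parents=True, exist_ok=True)
    for ocr_path in sorted(ocr_dir.glob("*.json")):
        record = json.loads(ocr_path.read_text())
        fields = transcribe_with_llm(record["ocr"])
        (results_dir / ocr_path.name).write_text(json.dumps(fields))
```

Caching the OCR output as plain JSON means the LLM step can be re-run at any later time, or skipped entirely in favor of OCR text already attached to a specimen's record, without touching the images again.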