Importing OCR and/or running OCR step separately. (Suggestion/Question)

Gene-Weaver / VoucherVision

Initiated by the University of Michigan Herbarium, VoucherVision harnesses the power of large language models (LLMs) to transform the transcription process of natural history specimen labels.

https://huggingface.co/spaces/phyloforfun/VoucherVision

GNU General Public License v3.0

Importing OCR and/or running OCR step separately. (Suggestion/Question) #3

Open norbo27 opened 1 year ago

norbo27 commented 1 year ago

Curious whether OCR could be run separately from the rest of the VoucherVision process, and whether those saved OCR results could later be used as input to the second (LLM) step.

I would like to OCR large batches of data first, and then do the LLM step later. Leaving this as a suggestion in case others would also find it useful.

Gene-Weaver commented 12 months ago

We will add functionality to optionally separate the OCR and LLM steps so that unsupported OCR can be used, such as OCR text that might already be in the specimen's record.
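
For readers wondering what such a separation might look like, here is a minimal sketch. It is not VoucherVision's actual API: the function names (`run_ocr_stage`, `run_llm_stage`), the one-text-file-per-image layout, and the `ocr_fn` / `llm_fn` callables are all hypothetical and only illustrate the idea. The OCR stage writes plain text to disk and skips any image that already has a text file, so OCR imported from elsewhere (for example from an existing specimen record) is left untouched. The LLM stage reads only those text files.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def run_ocr_stage(image_dir, ocr_dir, ocr_fn: Callable[[Path], str],
                  pattern: str = "*.jpg") -> int:
    """Stage 1 (hypothetical): OCR each image and cache the text on disk.

    Images that already have a matching .txt file are skipped, which is
    how pre-existing or externally produced OCR can be "imported".
    Returns the number of text files newly written.
    """
    image_dir, ocr_dir = Path(image_dir), Path(ocr_dir)
    ocr_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for image_path in sorted(image_dir.glob(pattern)):
        out_path = ocr_dir / f"{image_path.stem}.txt"
        if out_path.exists():
            continue  # keep imported or previously generated OCR as-is
        out_path.write_text(ocr_fn(image_path), encoding="utf-8")
        written += 1
    return written


def run_llm_stage(ocr_dir, out_path, llm_fn: Callable[[str], dict]) -> Dict[str, dict]:
    """Stage 2 (hypothetical): run the LLM step on cached OCR text only.

    No images are needed here, so this can run at any later time.
    Results are keyed by the text file's stem and saved as JSON.
    """
    results = {}
    for txt_path in sorted(Path(ocr_dir).glob("*.txt")):
        results[txt_path.stem] = llm_fn(txt_path.read_text(encoding="utf-8"))
    Path(out_path).write_text(json.dumps(results, indent=2), encoding="utf-8")
    return results
```

In this sketch, `ocr_fn` would wrap whichever OCR engine is in use and `llm_fn` would wrap the prompt and model call; both are placeholders supplied by the caller. To use OCR that already exists, one would place `<image-stem>.txt` files in `ocr_dir` before running the second stage.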