Gene-Weaver / VoucherVision

Initiated by the University of Michigan Herbarium, VoucherVision harnesses the power of large language models (LLMs) to transform the transcription process of natural history specimen labels.
https://huggingface.co/spaces/phyloforfun/VoucherVision
GNU General Public License v3.0

Parsing issue with palm2-textunicorn-001 #21

Closed: mickley closed this issue 3 months ago

mickley commented 4 months ago

This model works fine, but its output is wrapped in a JSON code block (see the example below), and VoucherVision fails to parse it.

```JSON
{
  "catalogNumber": "OSC-V-269654",
  "scientificName": "Echinodorus berteroi (Spreng.) Fassett",
  "genus": "Echinodorus",
  "specificEpithet": "berteroi",
  "infraspecificEpithet": "",
  "collector": "E. MacKinnon",
  "associatedCollectors": "B. Castro, L. Janeway, B. Humphrey, A. Groom",
  "collectorNumber": "113",
  "verbatimDate": "1/VIII/2017",
  "eventDate": "2017-08-01",
  "country": "United States",
  "stateProvince": "California",
  "county": "Tehama",
  "locality": "Channel of Elder Creek, 0.7 mi W (upstream) of the San Benito Bridge in Gerber, 1000ft upstream of low water crossing.",
  "decimalLatitude": "40.05200733333333",
  "decimalLongitude": "-122.16182200000001",
  "verbatimCoordinates": "Lat N 40° 3' 7.0266 Long W 122° 9' 42.5592 NAD83",
  "datum": "NAD83",
  "verbatimElevation": "238 ft",
  "cultivated": "",
  "habitat": "Growing on S side of stream in shade of Salix laevigata on silty soil of dried pool.",
  "plantDescription": "",
  "associatedSpecies": "Heliotropium europaeum, Xanthium strumarium, Mollugo verticillata, Artemisia douglasiana, Crypsis schoenoides."
}
```

Gene-Weaver commented 3 months ago

I can't get the LangChain retry parser to work for Palm2. I think it's losing long-term support. So this fix just handles the simplest case when parsing:

response_text = response.strip('```JSON\n').strip('\n```')

So if Palm-unicorn decides to return the literal dict, or just a string, then this will fail because it is not using the LangChain parser.
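For anyone hitting the same problem, here is a minimal sketch of a slightly more tolerant fallback, assuming the raw response is either a string or an already-parsed dict. The helper name `parse_llm_json` is hypothetical and is not part of VoucherVision or LangChain. It removes an optional Markdown fence with a regex, tries `json.loads`, and then falls back to `ast.literal_eval` for a Python-style literal dict.

```python
import ast
import json
import re

# Matches a response that is entirely wrapped in a Markdown fence,
# e.g. ```JSON\n{...}\n``` (the language tag is optional).
_FENCE_RE = re.compile(r"^```[A-Za-z0-9_-]*\s*\n(.*?)\s*```$", re.DOTALL)


def parse_llm_json(response):
    """Hypothetical helper: turn a raw model response into a dict."""
    # Case 1: the model wrapper already handed back a dict.
    if isinstance(response, dict):
        return response

    text = response.strip()

    # Case 2: the payload is wrapped in a fenced code block.
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Case 3: a Python literal dict (single quotes), which is not valid JSON.
        try:
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError) as exc:
            raise ValueError("response is neither JSON nor a literal dict") from exc

    if not isinstance(parsed, dict):
        raise ValueError("response did not parse to a dict")
    return parsed
```

One reason to prefer a regex here: `str.strip` treats its argument as a set of characters rather than a literal prefix or suffix, so the one-liner above only works because the payload starts with `{` and ends with `}`. A plain string that is not a dict still fails in this sketch, but it does so with an explicit `ValueError`.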