Unstructured-IO / unstructured-api

Apache License 2.0
446 stars, 101 forks

refactor: fix `ocr_languages` parameter type #375

Closed christinestraub closed 4 months ago

christinestraub commented 5 months ago

This PR adds support for both `list[str]` and `str` input formats for the `ocr_languages` parameter (e.g. `["eng", "deu"]` or `"eng+deu"`).

Testing

CI should pass.

christinestraub commented 5 months ago

> This is a breaking change. Is it possible to support either a string or a List for this param?

@cragwolfe Currently, we don't have any parameters in the API that support more than one type, so I'm not sure if this would be ideal. What do you think? @awalker4

In my opinion, users who want to pass a List could use the `languages` param instead of `ocr_languages`, and we already have enough documentation for both params.

awalker4 commented 4 months ago

My understanding was that the new `value_or_first_element` parsers handle this. Is that not the case? That is, if there's one element, we can try to break it up by looking for plus signs; otherwise we join the list items into a string. Since this param is deprecated, any custom logic to support both inputs shouldn't have to stick around for too long.
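A minimal sketch of that idea, for illustration only: the helper name `normalize_ocr_languages` is hypothetical and is not taken from this PR or from the existing parsers. It assumes the value arrives either as a plain string or as a list of strings, and that the downstream code wants a single plus-separated string.

```python
from typing import List, Union


def normalize_ocr_languages(value: Union[str, List[str]]) -> str:
    """Hypothetical helper: return a plus-separated string such as "eng+deu"."""
    # A bare string is treated the same way as a one-element list.
    items = [value] if isinstance(value, str) else list(value)

    codes: List[str] = []
    for item in items:
        # Split any "eng+deu" element on plus signs and drop empty pieces.
        codes.extend(part.strip() for part in item.split("+") if part.strip())

    # Join everything back into the single-string form.
    return "+".join(codes)


print(normalize_ocr_languages(["eng", "deu"]))
print(normalize_ocr_languages(["eng+deu"]))
print(normalize_ocr_languages("eng+deu"))
```

With this shape, `["eng", "deu"]`, `["eng+deu"]`, and `"eng+deu"` should all normalize to the same string, so both callers keep working until the param is removed.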

awalker4 commented 4 months ago

Looks great, thanks!