Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
9.21k stars 764 forks

remove abstract decorator from initialize in BaseEmbeddingEncoder #3730

Open mattseddon opened 1 month ago

mattseddon commented 1 month ago

Closes #3731. Please see the issue for details.

Comments inline.

mattseddon commented 2 weeks ago

@MKhalusova, can you help me get this reviewed, please? 🙏🏻🙏🏻🙏🏻 The DataChain unstructured examples (here and here) are currently stuck on an older version of Unstructured because of #3731, and this change should close that issue.

mattseddon commented 2 weeks ago

@cragwolfe would this 👇🏻 be the correct format for the changelog entry?

0.16.5-dev0

Enhancements

Features

Fixes

MKhalusova commented 2 weeks ago

@scanny Can you please take a look?

scanny commented 2 weeks ago

@rbiseck3 Hi Roman, can you take a look at this one-liner? I think you were closer to the choices on this one. I'm not sure whether this is the right fix or whether HuggingFaceEmbeddingEncoder needs to implement a trivial .initialize() method.
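For anyone weighing the two options, here is a minimal, self-contained sketch of how they differ. It does not use the actual `unstructured` classes: every class name below is a hypothetical stand-in, and `embed_query` is only an assumed example of a method that stays abstract.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class EncoderWithAbstractInit(ABC):
    """Hypothetical base class where initialize() is still abstract."""

    @abstractmethod
    def initialize(self) -> None:
        """Subclasses are forced to define this, even if it does nothing."""

    @abstractmethod
    def embed_query(self, query: str) -> list[float]:
        """Return an embedding for the query."""


class EncoderWithOptionalInit(ABC):
    """Hypothetical base class after dropping @abstractmethod from initialize()."""

    def initialize(self) -> None:
        """Default no-op; subclasses override only when they need setup."""

    @abstractmethod
    def embed_query(self, query: str) -> list[float]:
        """Return an embedding for the query."""


class MissingInit(EncoderWithAbstractInit):
    """Does not define initialize(), so it cannot be instantiated."""

    def embed_query(self, query: str) -> list[float]:
        return [float(len(query))]


class TrivialInit(EncoderWithAbstractInit):
    """Option A: keep the abstract method, add a trivial initialize()."""

    def initialize(self) -> None:
        pass

    def embed_query(self, query: str) -> list[float]:
        return [float(len(query))]


class NoInitNeeded(EncoderWithOptionalInit):
    """Option B: the base class supplies the no-op, nothing extra needed here."""

    def embed_query(self, query: str) -> list[float]:
        return [float(len(query))]


if __name__ == "__main__":
    try:
        MissingInit()
    except TypeError as exc:
        # ABCs refuse to instantiate while any abstract method is unimplemented.
        print(f"MissingInit failed: {exc}")

    print(TrivialInit().embed_query("abc"))
    print(NoInitNeeded().embed_query("abc"))
```

Under these assumptions, both options make the subclass instantiable. The difference is where the no-op lives: in each subclass that has no setup to do (option A), or once in the base class (option B, which is what removing the decorator amounts to).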