Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
9.21k stars 764 forks

Telemetry request has no configured timeout #3791

Open Rojuinex opened 13 hours ago

Rojuinex commented 13 hours ago

The requests in scarf_analytics have no timeout set, which can cause an application using the unstructured client to hang.

https://github.com/Unstructured-IO/unstructured/blob/3b9b01c502cf5f319fd8bb2427232af96af5c637/unstructured/utils.py#L281C1-L309C18

Right now the only workaround is to set SCARF_NO_ANALYTICS or DO_NOT_TRACK, but that opts out of analytics completely. A better approach is to set the timeout parameter to a reasonable value so that a slow or unreachable telemetry endpoint cannot block the application indefinitely.
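
To illustrate the suggestion, here is a minimal sketch of a best-effort telemetry call with a timeout. It is not the actual code in unstructured/utils.py: the function name send_telemetry, the constant TELEMETRY_TIMEOUT, and the timeout values are all hypothetical and chosen only for illustration. The one real API it relies on is the timeout argument of requests.get.

```python
import requests

# Hypothetical values in seconds: (connect timeout, read timeout).
# Nothing in the issue prescribes these numbers.
TELEMETRY_TIMEOUT = (3.05, 10)


def send_telemetry(url: str, timeout=TELEMETRY_TIMEOUT) -> bool:
    """Send a best-effort GET and give up rather than wait forever.

    Returns True if the request completed, False if it timed out or
    failed for any other network-related reason.
    """
    try:
        # Without timeout=..., requests will wait indefinitely for a
        # server that accepts the connection but never answers.
        requests.get(url, timeout=timeout)
        return True
    except requests.exceptions.RequestException:
        # Telemetry is not essential, so swallow the failure and let
        # the caller carry on.
        return False
```

Note that the requests timeout is not a limit on the total duration of the call. It bounds the time to establish the connection and the time between bytes received from the server, which is enough to prevent the indefinite hang described above.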