elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

support adding cert path to eland_import_hub_model script #663

Closed. vkichu closed this issue 2 months ago

vkichu commented 4 months ago

Setup: the eland_import_hub_model script, eland version 8.12.1, Elastic client installed in Docker locally, Elasticsearch 8.12.

I am running the Hugging Face model import notebook from the elasticsearch-labs integration notebooks (https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) in my local notebook environment:

!eland_import_hub_model --url https://localhost:9200 -u elastic -p --hub-model-id sentence-transformers/all-MiniLM-L6-v2 --task-type text_embedding --insecure --start --clear-previous

The above command will not execute without the --insecure flag. With the flag, I got the model to load successfully, but I get the following warning: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings

I have been using the Elasticsearch method to create a new ES client, which accepts ca_certs as a parameter, so I am able to provide the path to the local certificate and create an ES client successfully. Example:

curl -XGET https://127.0.0.1:9200 -u elastic:mypassword --cacert /etc/elasticsearch/certs/ca.crt
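For comparison, a Python client counterpart to the curl call above might look like the following minimal sketch. It assumes the elasticsearch 8.x Python package, where the Elasticsearch constructor accepts ca_certs and basic_auth. The helper names client_kwargs and make_client are hypothetical and only illustrate the idea. The URL, credentials, and certificate path are the ones from the curl example.

```python
# Sketch only: build an Elasticsearch client that verifies TLS against a local CA.
# client_kwargs and make_client are hypothetical helper names, not part of any library.


def client_kwargs(url, user, password, ca_certs):
    """Collect keyword arguments for elasticsearch.Elasticsearch (8.x assumed)."""
    return {
        "hosts": url,
        "basic_auth": (user, password),
        # Path to the CA certificate, playing the role of curl's --cacert.
        "ca_certs": ca_certs,
    }


def make_client(url, user, password, ca_certs):
    """Create the client; requires the elasticsearch package to be installed."""
    # Imported lazily so client_kwargs can be used without the package present.
    from elasticsearch import Elasticsearch

    return Elasticsearch(**client_kwargs(url, user, password, ca_certs))
```

Calling make_client("https://127.0.0.1:9200", "elastic", "mypassword", "/etc/elasticsearch/certs/ca.crt") should then return a client that verifies the server certificate against that CA file, so no --insecure equivalent is needed.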

Would you be able to add an option such as --cacert, or allow passing an existing ES client object to eland_import_hub_model, similar to the eland.DataFrame implementation?

pquentin commented 4 months ago

Hello! And thanks for your feature request. That said, the eland_import_hub_model script already supports a --ca-certs option, which passes ca_certs to the Elasticsearch client. This option is documented in https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_ca_certificates but not in the Eland documentation.
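As a rough illustration, the original command could be rebuilt with --ca-certs in place of --insecure. The sketch below only assembles and prints the command line. The helper name import_command is hypothetical, the "<password>" placeholder stands in for the password that is elided in the original command, and the certificate path is borrowed from the curl example above, so adjust both to your setup.

```python
import shlex

# Sketch only: assemble the eland_import_hub_model command with --ca-certs
# instead of --insecure. import_command is a hypothetical helper name.


def import_command(url, user, password, model_id, task_type, ca_certs):
    """Return the command as an argument list suitable for subprocess.run."""
    return [
        "eland_import_hub_model",
        "--url", url,
        "-u", user,
        "-p", password,
        "--hub-model-id", model_id,
        "--task-type", task_type,
        "--ca-certs", ca_certs,  # CA file used to verify the server certificate
        "--start",
        "--clear-previous",
    ]


cmd = import_command(
    url="https://localhost:9200",
    user="elastic",
    password="<password>",  # placeholder, not a real value
    model_id="sentence-transformers/all-MiniLM-L6-v2",
    task_type="text_embedding",
    ca_certs="/etc/elasticsearch/certs/ca.crt",  # assumed path, from the curl example
)

# Print a shell-quoted version that can be pasted into a terminal or notebook cell.
print(shlex.join(cmd))
```

To run it directly from Python rather than copying the printed line, the list can be passed to subprocess.run(cmd, check=True).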

Are you interested in documenting it for Eland in https://github.com/elastic/eland/blob/main/docs/guide/machine-learning.asciidoc yourself?

ppf2 commented 4 months ago

Our documentation should probably add a "Configuration" section that covers all the available parameters along with short descriptions (can be derived from the help text) of their usage. The missing documentation on ca_certs is just an example. The related --insecure (for non-production use) to disable cert verification is another one.

pquentin commented 4 months ago

Thanks, I opened https://github.com/elastic/eland/pull/667 to fix this issue. And sorry to have asked @vkichu to do it. I often get users asking me about contributing to the clients, so I tend to be on the lookout for such opportunities, since I cannot do everything myself. But that does not make my earlier ask right.

ppf2 commented 4 months ago

Thank you! ❤️