exasol / transformers-extension

An Exasol extension for using state-of-the-art pretrained machine learning models via the Hugging Face Transformers API.
MIT License

Improve documentation with missing info and better readability #133

MarleneKress79789 closed this issue 5 months ago

MarleneKress79789 commented 11 months ago
### Tasks
- [x] overhaul documentation for better readability
- [x] do we want to use pyexasol for the connection in the user guide? looks cleaner imo
- [x] also info about adding SSL certificate?
- [x] mention each model is 2 part model and transformer (in dev guide at least)
- [ ] in dev docu: add info about how the model is changed if the parameters change (happens in base_model_udf)
- [ ] in user docu: mention udf can change udf/model if the input set changes the model parameters
- [ ] add example usages in user docu (will be done in #164)
- [x] mention triggering the SLC download test (test_language_container_deployer_cli_by_downloading_container) after a release to make sure it works with the new version (maybe trigger it automatically in the nox session which does a release)
- [x] add information about the certificates as explained [here](https://github.com/exasol/transformers-extension/pull/150/files/99549de939e855f1698a05a4145d3373f51ffab9#r1392322048)
- [x] add explanation about model download and upload, details [here](https://github.com/exasol/transformers-extension/pull/150#discussion_r1392443311)
- [x] mention after release check .dist and .dist in language container for multiple wheels
- [x] mention windows not supported
- [x] change description of project to mention Huggingface
- [x] see comments in https://github.com/exasol/transformers-extension/pull/150
- [x] link to udf docu
- [x] link to bucketfs docu
- [ ] add screenshot from exa operations where to find parameters like bucketfs name (see screenshot) #164
- [x] mention pypi release and pip install
- [x] get correct link to pypi and install command
ahsimb commented 11 months ago

Please add some explanation for the Model Uploader Script. In particular we need to mention that the --local-model-path needs to point to a cache directory. For example, one may download a model using this code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained(<MODEL-NAME>, cache_dir="whatever")
model = AutoModelForSequenceClassification.from_pretrained(<MODEL-NAME>, cache_dir="whatever")

Then it can be uploaded to the BucketFS with this command:

python -m exasol_transformers_extension.upload_model \
    ...
    --local-model-path whatever
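
To make the cache-directory requirement concrete, here is a minimal sketch of a pre-flight check. `build_upload_command` is a hypothetical helper, not part of the extension. It only confirms that the directory exists and is non-empty, then assembles the module invocation shown above. The options elided as `...` are not filled in here and must be supplied by the caller through `other_args`.

```python
import sys
from pathlib import Path
from typing import List, Sequence


def build_upload_command(local_model_path: str,
                         other_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: check the cache directory, then build the upload command.

    `other_args` stands for the options elided as `...` in the command above;
    they are passed through unchanged.
    """
    cache_dir = Path(local_model_path)
    # The uploader expects the directory that was used as `cache_dir` during
    # download, so refuse paths that are missing or empty.
    if not cache_dir.is_dir() or not any(cache_dir.iterdir()):
        raise ValueError(f"{local_model_path!r} is not a populated cache directory")
    return [
        sys.executable, "-m", "exasol_transformers_extension.upload_model",
        *other_args,
        "--local-model-path", str(cache_dir),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the remaining options have been added.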
ahsimb commented 11 months ago

Maybe it's worth mentioning that to use TE_MODEL_DOWNLOADER_UDF the DB must have access to the internet. Currently this is not the case with the Docker DB by default. One needs to specify a name server. For example --nameserver 8.8.8.8 will set it to use Google DNS.
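
A quick way to see whether name resolution works is a small check like the one below. This is only a sketch: `can_resolve` is a hypothetical helper, and `huggingface.co` is assumed to be the host the downloader needs to reach. It tests DNS on the machine where it runs, so it says something about the database only if it is executed in the database's own environment.

```python
import socket
from typing import Callable


def can_resolve(host: str,
                resolver: Callable[..., object] = socket.getaddrinfo) -> bool:
    """Return True if `host` resolves to at least one address.

    `resolver` is injectable so the check can be exercised without a network.
    """
    try:
        return bool(resolver(host, 443))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Assumed host; prints False when no usable name server is configured.
    print(can_resolve("huggingface.co"))
```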

ahsimb commented 11 months ago

Currently there is a bug in bucketfs-python which forces a user to have at least one level of sub-directories in the bucketfs root. Consequently, the --path-in-bucket parameter cannot be empty.
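
Until that is fixed, a caller could guard against an empty value up front. The following is a minimal sketch with a hypothetical helper name, `check_path_in_bucket`. It assumes the only requirement is the one stated above: at least one directory level.

```python
def check_path_in_bucket(path_in_bucket: str) -> str:
    """Hypothetical guard: reject values that name no sub-directory.

    Returns the path with surrounding whitespace and slashes, empty segments
    and "." segments removed.
    """
    parts = [p for p in path_in_bucket.strip().split("/") if p and p != "."]
    if not parts:
        raise ValueError(
            "--path-in-bucket must contain at least one directory level"
        )
    return "/".join(parts)
```

For example, `check_path_in_bucket("/models/")` returns `"models"`, while `""` or `"/"` raises `ValueError`.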