R1j1t / contextualSpellCheck

✔️Contextual word checker for better suggestions
MIT License

Add proxy setting when initializing "contextual spellchecker" #68

Closed Xiaoping777 closed 2 years ago

Xiaoping777 commented 2 years ago

Is your feature request related to a problem? Please describe. Our work is behind a firewall, so we need a proxy setting to fetch the transformer model "bert-base-cased". I tried to download it via `AutoTokenizer.from_pretrained('bert-base-cased', proxies={'http': PROXY, 'https': PROXY})`; however, it looks like the package cannot find the path, so I had to change line 107 by adding the proxy: `self.BertTokenizer = AutoTokenizer.from_pretrained(self.model_name, proxies={'http': PROXY, 'https': PROXY})`

Describe the solution you'd like I would like the package to accept a proxy setting when the class is initialized.

Describe alternatives you've considered Alternatively, I could download the model via the transformers `from_pretrained` function myself, provided "contextual spellchecker" can then find it properly.

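The alternative described above can be sketched as a one-time download through the proxy, saving the model to disk so the pipeline can load it locally afterwards. This is a minimal sketch: the proxy URL and target directory are placeholders, not values from this issue.

```python
# Sketch: fetch the model once through a corporate proxy, then save it
# locally so contextualSpellCheck can load it from disk afterwards.

PROXY = "http://proxy.example.com:8080"  # placeholder; use your proxy URL
proxies = {"http": PROXY, "https": PROXY}


def download_model(name: str, target_dir: str, proxies: dict) -> None:
    """Download a Hugging Face tokenizer + masked-LM model via a proxy
    and save both to target_dir for offline use."""
    # Heavy import kept inside the helper so the module loads without it.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name, proxies=proxies)
    model = AutoModelForMaskedLM.from_pretrained(name, proxies=proxies)
    tokenizer.save_pretrained(target_dir)
    model.save_pretrained(target_dir)


# Usage (requires network access through the proxy):
# download_model("bert-base-cased", "./transform_model", proxies)
```

The saved directory can then be passed as `model_name` in the pipe config, as shown in the maintainer's example below.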

R1j1t commented 2 years ago

I think this is a good enhancement to have; I will take a look and try to commit the same. If you would like to contribute, please feel free to open a PR. I would be more than happy to merge it!

Edit: I relooked at the code, and it currently allows passing a transformer model from your local filesystem. Below is an example:

>>> from os import listdir
>>> import spacy
>>> import contextualSpellCheck
>>> nlp = spacy.load("en_core_web_sm") 
>>> nlp.add_pipe("contextual spellchecker", config={'model_name':'<COMPLETE_PATH_TO_TRANFORMER_MODEL_FOLDER>/transform_model/'})
<contextualSpellCheck.contextualSpellCheck.ContextualSpellCheck object at 0x7fe6fb8451f0>
>>> doc = nlp("Income was $9.4 milion compared to the prior year of $2.7 milion.")
>>> print(doc._.performed_spellCheck)
True
>>> print(doc._.outcome_spellCheck)
Income was $9.4 million compared to the prior year of $2.7 million.
>>> listdir()
['tokenizer_config.json', 'config.json', 'tokenizer.json', 'vocab.txt', 'pytorch_model.bin']
>>> 

This will act as a workaround for the proxy issue, but I still plan to update the code to accept a proxy. It might require passing **kwargs through to the tokenizer and the model.
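The **kwargs passthrough mentioned above could look roughly like the sketch below. The class name, signature, and attribute names here are hypothetical illustrations of the idea, not the library's actual implementation.

```python
# Hypothetical sketch of the proposed change: extra keyword arguments
# (e.g. proxies) given at pipe initialization are stored and forwarded
# to the Hugging Face from_pretrained loaders.
from typing import Any


class ContextualSpellCheckSketch:
    def __init__(self, model_name: str = "bert-base-cased", **model_kwargs: Any):
        self.model_name = model_name
        # e.g. {"proxies": {"http": ..., "https": ...}}
        self.model_kwargs = model_kwargs

    def _load(self):
        # Heavy import kept local; called only when the model is needed.
        from transformers import AutoModelForMaskedLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained(
            self.model_name, **self.model_kwargs
        )
        model = AutoModelForMaskedLM.from_pretrained(
            self.model_name, **self.model_kwargs
        )
        return tokenizer, model


# The caller would then pass the proxy once, at initialization:
sc = ContextualSpellCheckSketch(proxies={"http": "http://proxy.example.com:8080"})
```

Forwarding the whole kwargs dict keeps the pipe agnostic to which loader options (proxies, cache_dir, revision, etc.) the user needs.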

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.