KyleeValencia closed this issue 2 years ago
Hi @KyleeValencia. Basically, on initialization you set mapping_path to a JSON file (it doesn't need to exist yet, just like when you use to_csv() in pandas, for example). For your use case, set pretrained=False: EE will run the encoding process as normal and save the encoding lookup table to mapping_path.
If pretrained=True, EE will not train anything; it will look for a JSON file at the path you specified and load it for transformations. If there's nothing there, it'll throw an error. This is what you do after you've trained at least once and saved the JSON.
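The two modes described above can be sketched with a toy stand-in. This is not the real EmbeddingEncoder: the class name ToyLookupEncoder is hypothetical, the "training" step just assigns an integer to each category, and loading the JSON inside fit() is an assumption made for illustration. Only the roles of mapping_path and pretrained follow the description above.

```python
import json
import os
import tempfile


class ToyLookupEncoder:
    """Toy stand-in (hypothetical), not the real EmbeddingEncoder."""

    def __init__(self, mapping_path, pretrained=False):
        self.mapping_path = mapping_path
        self.pretrained = pretrained
        self.mapping_ = None

    def fit(self, values):
        if self.pretrained:
            # Nothing is trained: load the saved table, or fail if it is missing.
            if not os.path.exists(self.mapping_path):
                raise FileNotFoundError(self.mapping_path)
            with open(self.mapping_path) as fh:
                self.mapping_ = json.load(fh)
        else:
            # "Train" (here: one integer per category) and save the lookup table.
            self.mapping_ = {v: i for i, v in enumerate(sorted(set(values)))}
            with open(self.mapping_path, "w") as fh:
                json.dump(self.mapping_, fh)
        return self

    def transform(self, values):
        return [self.mapping_[v] for v in values]


# The JSON file does not need to exist before the first (non-pretrained) fit.
path = os.path.join(tempfile.mkdtemp(), "mapping.json")
data = ["b", "a", "c", "a"]

first = ToyLookupEncoder(path, pretrained=False).fit(data)  # trains, writes JSON
second = ToyLookupEncoder(path, pretrained=True).fit(data)  # only reads JSON
same_output = first.transform(data) == second.transform(data)
```

The second encoder never computes a mapping of its own; it reproduces the first encoder's output purely from the saved file, which is the point of the pretrained=True mode.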
@rxavier I tried to transform the data with a pretrained EE and it gives me an error.
This is my code:
# List of categorical_column_name that I need to embed
cat_cols = list(X_train[(X_train.dtypes == 'object').index].columns)

# Embedder Initialization
Embedding_Categorical = EmbeddingEncoder(
    task='classification',
    keep_model=True,
    mapping_path="./Embed_TF_Mushromm_Categorical_Data.json",
)

# Fitting EE
Embedding_Categorical.fit(X_train[cat_cols], Y_train)

# Test to transform
testTransform = Embedding_Categorical.transform(X_train[cat_cols])

# Test to load model from json file
testLoad = EmbeddingEncoder(
    task='classification',
    pretrained=True,
    mapping_path='./Embed_TF_Mushromm_Categorical_Data.json',
)

# Test to transform data from loaded EE model
testTransform_tf = testLoad.transform(X_train[cat_cols])
You need to call fit() first, even when pretrained=True. This is because scikit-learn always tries to fit, so it needs to be called or it wouldn't work in Pipelines, for example.
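A rough sketch of why fit() has to be called even when nothing is trained. PretrainedStep and run_pipeline are hypothetical stand-ins written for this illustration, not scikit-learn or EE code. The assumption is that a pretrained step loads its lookup table inside fit(), and that a pipeline fits every step before applying it.

```python
import json
import os
import tempfile


class PretrainedStep:
    """Hypothetical stand-in for an encoder created with pretrained=True."""

    def __init__(self, mapping_path):
        self.mapping_path = mapping_path
        self.mapping_ = None  # populated by fit()

    def fit(self, X, y=None):
        # No training happens here: fit() only loads the saved lookup table.
        with open(self.mapping_path) as fh:
            self.mapping_ = json.load(fh)
        return self

    def transform(self, X):
        if self.mapping_ is None:
            raise RuntimeError("call fit() before transform()")
        return [self.mapping_[value] for value in X]


def run_pipeline(steps, X, y=None):
    """Mimic a pipeline: every step is fitted first, then applied."""
    for step in steps:
        X = step.fit(X, y).transform(X)
    return X


# Write a small lookup table to a temporary JSON file for the demo.
mapping_path = os.path.join(tempfile.mkdtemp(), "mapping.json")
with open(mapping_path, "w") as fh:
    json.dump({"red": 0, "green": 1}, fh)

# Calling transform() straight after construction fails, as in the code above.
step = PretrainedStep(mapping_path)
try:
    step.transform(["red"])
    transform_before_fit_failed = False
except RuntimeError:
    transform_before_fit_failed = True

# Going through fit() first works, and is what a pipeline would do anyway.
encoded = run_pipeline([PretrainedStep(mapping_path)], ["green", "red", "green"])
```

For the code posted above, the equivalent change would be to call testLoad.fit(X_train[cat_cols], Y_train) before testLoad.transform(...).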
@rxavier It works! Thank you for the guidance 👍
Hello, I want to ask how to save the model to disk and then load it later to use it again. Do I use mapping_to_json for that, or do I need to save it like any other Keras model, where the model is stored in ee._model? And how do I reload it?
Beginners like me have a hard time understanding the documentation. I hope instructions and a code example for saving and reloading the model can be added to the documentation, since this is pretty crucial for users like me.
Thank you and keep up the good work :>