FrankWanger / ML_Peptide

Deployed Model for Article: Machine learning predicts peptide stability in simulated gastrointestinal fluids

Problem running model #2

Closed: ackbar03 closed this issue 2 weeks ago

ackbar03 commented 2 weeks ago

Hi,

I am trying to run the code but I keep getting this error:

  File "/home/ackbar03/proteins/GIStability/ML_Peptide/predict.py", line 10, in <module>
    SIF_Stability = model_predict(feat=peptide_features,Env='SIF')
  File "/home/ackbar03/proteins/GIStability/ML_Peptide/lib/pred_util.py", line 30, in model_predict
    pred_Env = model['GI_encoder'].transform(np.array(df_pred['Env']).reshape(-1, 1))
  File "/home/ackbar03/miniconda3/envs/dbnd/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/ackbar03/miniconda3/envs/dbnd/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 1582, in transform
    ignore_category_indices=self._missing_indices,
AttributeError: 'OrdinalEncoder' object has no attribute '_missing_indices'

The error is raised when the encoder inside the loaded model is applied on this line:

pred_Env = model['GI_encoder'].transform(np.array(df_pred['Env']).reshape(-1, 1))

In the model_predict function:

import pickle

import numpy as np


def model_predict(feat, Env='SIF'):
    SIF_FEATURE_LIST = ['MinAbsEStateIndex','qed','MinPartialCharge','Chi1v','PEOE_VSA8','SMR_VSA10','SMR_VSA4','SMR_VSA6','SlogP_VSA3','EState_VSA10','EState_VSA2','EState_VSA6','EState_VSA8','EState_VSA9','VSA_EState1','VSA_EState4','VSA_EState8']
    SGF_FEATURE_LIST = ['ExactMolWt','NumHAcceptors','NumHDonors','MolLogP','TPSA','NumRotatableBonds']

    if Env == 'SIF':
        print('Predicting SIF Stability...')
        feat = feat[SIF_FEATURE_LIST]
        df_pred = feat.assign(Env='Intestinal')
        model_path = 'model/SIF_model'
    elif Env == 'SGF':
        feat = feat[SGF_FEATURE_LIST]
        print('Predicting SGF Stability...')
        df_pred = feat.assign(Env='Gastric')
        model_path = 'model/SGF_model'
    else:
        raise KeyError("Wrong Env set; should be either 'SIF' or 'SGF'")

    # Load the pickled model dict (encoders, feature scaler and classifier)
    with open(model_path, 'rb') as f:
        model = pickle.load(f)

    # Encode the GI environment label; this call raises the AttributeError
    pred_Env = model['GI_encoder'].transform(np.array(df_pred['Env']).reshape(-1, 1))

    # Transform the descriptor features with the stored 'feature_scaler'
    pred_features = model['feature_scaler'].transform(np.array(feat))
    pred_Features = np.concatenate([pred_Env, pred_features], axis=1)

    y_pred = model['clf'].predict(pred_Features)
    y_pred = model['Label_encoder'].inverse_transform(y_pred.reshape(-1, 1))
    return y_pred

Do you have any advice on how to solve this?
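The AttributeError in this traceback fits a general pattern with pickled objects: pickle saves an instance's attributes but not its class's code, so unpickling a model under a newer library version pairs the new methods with the old attributes. The toy sketch below reproduces that mechanism with a hypothetical `ToyEncoder` class; it illustrates the assumed cause and is not scikit-learn's actual `OrdinalEncoder` code.

```python
import pickle


class ToyEncoder:
    """Hypothetical stand-in for a fitted encoder (not scikit-learn code)."""

    def fit(self, values):
        # "Old" library version: fit only stores the categories.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        return [self.categories_.index(v) for v in values]


# Save a fitted object, as the training code did with the model files.
blob = pickle.dumps(ToyEncoder().fit(["Gastric", "Intestinal"]))


def upgraded_fit(self, values):
    self.categories_ = sorted(set(values))
    self._missing_indices = {}  # new internal attribute added by the "upgrade"
    return self


def upgraded_transform(self, values):
    # The "new" transform reads an attribute that only the new fit() sets.
    if self._missing_indices:
        raise NotImplementedError("missing-value handling is out of scope here")
    return [self.categories_.index(v) for v in values]


# Simulate upgrading the library: the class code changes, the pickle does not.
ToyEncoder.fit = upgraded_fit
ToyEncoder.transform = upgraded_transform

fresh = ToyEncoder().fit(["Gastric", "Intestinal"])
print(fresh.transform(["Intestinal"]))  # [1]

restored = pickle.loads(blob)  # restores only the old attribute dict
try:
    restored.transform(["Intestinal"])
except AttributeError as err:
    print(err)  # 'ToyEncoder' object has no attribute '_missing_indices'
```

A freshly fitted object works, while the unpickled one fails exactly like the model in the traceback. scikit-learn's documentation cautions that pickled estimators are not guaranteed to load across versions, and releases from 1.3 onward emit an `InconsistentVersionWarning` when an estimator is unpickled under a different version than the one it was saved with.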

FrankWanger commented 2 weeks ago

Hi there,

Thanks for your interest in this project. Could you tell me which versions of Python, sklearn and py-xgboost you are using? I have tested the code, and it runs fine with the dependency versions specified in the README. When I tried running it with Python 3.9.20 and sklearn 1.2.0 or 1.5.1, some problems did emerge.
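The version numbers requested here can be collected with a short script. This is a minimal sketch that assumes each package exposes a `__version__` attribute; `report_versions` is a hypothetical helper, not part of ML_Peptide, and the module list (adding rdkit and numpy, which the thread and the code use) is an assumption. The py-xgboost conda package is imported as `xgboost`.

```python
import importlib
import platform

# Import names to check (assumed list): "sklearn" is scikit-learn and
# "xgboost" is the module provided by the py-xgboost conda package.
MODULES = ("sklearn", "xgboost", "rdkit", "numpy")


def report_versions(modules=MODULES):
    """Return {name: version} for Python and each module; None if missing."""
    versions = {"python": platform.python_version()}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running it inside the same conda environment as `predict.py` gives output that can be pasted straight into the issue.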

Unfortunately, I have left the company where I published this work, so I no longer have access to the training code; otherwise I could re-export the pre-trained model. Reverting to Python 3.7.7 and sklearn 0.24 should at least help.
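Since the pickled models cannot be re-exported, one hedged option is to check the running environment against the known working versions before unpickling. The sketch below assumes the versions named in this thread (Python 3.7.7, sklearn 0.24); `PINNED`, `version_matches` and `warn_on_version_mismatch` are hypothetical names, and the README may pin further packages.

```python
import platform
import warnings

# Versions reported as working in this thread (assumed; the README may list more).
PINNED = {"python": "3.7.7", "sklearn": "0.24"}


def version_matches(installed, pinned):
    """True if the dotted components of `pinned` equal the start of `installed`.

    For example "0.24.2" matches "0.24", but "1.5.1" does not.
    """
    pin_parts = pinned.split(".")
    return installed.split(".")[: len(pin_parts)] == pin_parts


def warn_on_version_mismatch(pinned=PINNED):
    """Warn if the environment differs from `pinned`; return the mismatches."""
    installed = {"python": platform.python_version()}
    if "sklearn" in pinned:
        try:
            import sklearn
        except ImportError:
            installed["sklearn"] = None
        else:
            installed["sklearn"] = sklearn.__version__

    mismatches = []
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None or not version_matches(have, want):
            mismatches.append(f"{name} {have} (expected {want})")
    if mismatches:
        warnings.warn(
            "Pickled model may not load correctly: " + ", ".join(mismatches),
            RuntimeWarning,
            stacklevel=2,
        )
    return mismatches
```

Calling `warn_on_version_mismatch()` before the `pickle.load` call in `model_predict` would turn the obscure `AttributeError` into an explicit warning. The check compares dotted components only, so it treats `0.24.2` as matching `0.24` and does no full version parsing.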

I also tested the newest version of RDKit (2024.09.1), and it worked as well.

Let me know how it goes, and I can close the issue once you have it working.

Best, Fanjin

ackbar03 commented 2 weeks ago

Hi,

Indeed, it seems to be a versioning issue. Reverting to Python 3.7.7 and sklearn 0.24 solved the problem.

Thanks so much for your prompt reply!