YichaoOU / easy_prime

prime editor gRNA design tool based on ensemble learning
MIT License

PE2 Jupyter notebook issues #2

Closed jamescrawf closed 3 years ago

jamescrawf commented 3 years ago

Hi Yichao

I was having a look at your repo and had some issues with PE_data_collection/PE2_model_training_evaluation_and_feature_importance.ipynb

  1. First, there is no "liyc_utils" in this repo, so the workflow cannot easily be reproduced.
  2. My computer seems to get stuck at the line best_parameter_list = CV_tune_parameter(X_train,y_train,5): it runs indefinitely even though the processes use no CPU at all. This could be related to point 1, since I'm not sure whether the packages I imported myself are the same ones you used on your side:
```python
import pandas as pd
from sklearn.model_selection import KFold
from xgboost import XGBRegressor
from sklearn.model_selection import ParameterGrid
from joblib import Parallel, delayed
```
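One way to narrow down a stall like this is to run the same kind of search serially. The imports above suggest a parameter grid evaluated over K folds in parallel, but the actual body of CV_tune_parameter is not shown in this thread, so the following is only a minimal standard-library sketch under that assumption. All names here (parameter_grid, k_fold_indices, fit_and_score, cv_tune_serial) are hypothetical, the "model" is a toy shrunken-mean predictor standing in for XGBRegressor, and only y is used for brevity.

```python
import itertools
import statistics


def parameter_grid(grid):
    """Yield one dict per parameter combination (similar in spirit to ParameterGrid)."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_and_score(params, y_train, y_test):
    """Toy stand-in for fitting a regressor: predict a shrunken training mean, return MSE."""
    prediction = params["shrinkage"] * statistics.fmean(y_train)
    return statistics.fmean((y - prediction) ** 2 for y in y_test)


def cv_tune_serial(y, grid, n_splits):
    """Return parameter dicts sorted by mean cross-validated error, best first."""
    results = []
    for params in parameter_grid(grid):
        fold_errors = [
            fit_and_score(params, [y[i] for i in train_idx], [y[i] for i in test_idx])
            for train_idx, test_idx in k_fold_indices(len(y), n_splits)
        ]
        results.append((statistics.fmean(fold_errors), params))
    results.sort(key=lambda item: item[0])  # lowest error first
    return [params for _, params in results]


# Hypothetical toy data and grid, purely for illustration.
toy_y = [10.0] * 10
best_parameter_list = cv_tune_serial(toy_y, {"shrinkage": [0.5, 1.0, 1.5]}, 5)
```

If a serial loop like this finishes but the parallel version idles, the stall may lie in the parallel backend rather than in the model fitting. With joblib, passing n_jobs=1 to Parallel is one way to run the same calls without worker processes.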

Do you have a file where you stored your best_parameter_list, or even the final xgboost model? This would solve my issues, since I could just import these and skip the training part. :)

Thanks in advance & best wishes, James

YichaoOU commented 3 years ago

Hi James,

Sorry for the inconvenience. I have uploaded liyc_utils.py.

I don't have best_parameter_list. But the final models for PE2 and PE3 are here: https://github.com/YichaoOU/easy_prime/blob/master/model/

For example, the PE2 model is PE2_model_final.py. These are .pkl files, but I renamed them to .py so that they are installed automatically by setup.py install or conda install.
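Since the file extension does not change the contents, a pickle stored under a .py name can still be read in binary mode. The sketch below shows this with a stand-in object. It assumes the models were written with Python's pickle module, which this thread does not state (if they were saved with joblib, joblib.load would be the counterpart). The file name toy_model_final.py and the helper load_pickled_model are hypothetical.

```python
import os
import pickle
import tempfile


def load_pickled_model(path):
    """Load a pickled object from `path`, whatever its file extension is."""
    with open(path, "rb") as handle:  # binary mode: the file is not Python source
        return pickle.load(handle)


# Stand-in for a trained model, so the example runs without xgboost installed.
stand_in_model = {"name": "toy_model", "weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save the pickle under a .py name, mirroring the renaming described above.
    model_path = os.path.join(tmp_dir, "toy_model_final.py")
    with open(model_path, "wb") as handle:
        pickle.dump(stand_in_model, handle)

    restored = load_pickled_model(model_path)
```

Unpickling a real model additionally requires the library that defines its class (here presumably xgboost) to be importable, and pickle files should only be loaded from sources you trust.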

Feel free to ask any other questions.

Best, Yichao

jamescrawf commented 3 years ago

Thank you so much for your quick answer, I really appreciate it! :) :muscle: Is PE2_model_final.py trained on the whole dataset (training & testing together) or only on the training dataset? The following part of the notebook suggests that it is trained on the whole dataset, correct?

```python
X = df[f1]
y = df[Target]
final_model,_ = xgb_reg(best_parameter_list[0])
final_model.fit(X,y)
```

Best, James

YichaoOU commented 3 years ago

Yes

jamescrawf commented 3 years ago

Thanks :raised_hands: