FunctionLab / ExPecto

predicting expression effects of human genome variants ab initio from sequence

How to load the model shown in this repo #30

Open houruiyan opened 1 year ago

houruiyan commented 1 year ago

Hello, I have a very basic question. Can anyone tell me how to open the saved models with the suffixes .dump, .save, and .model?

Thank you very much!

jzthree commented 1 year ago

These are saved XGBoost models, and you can load them like this (.save and .dump are just two different formats; you don't need both):

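import pandas as pd
import xgboost as xgb

# args.modelList: tab-separated file with a 'ModelName' column of model file paths
modelList = pd.read_csv(args.modelList, sep='\t', header=0)
models = []
for file in modelList['ModelName']:
    bst = xgb.Booster({'nthread': args.threads})  # empty booster; the model is loaded next
    bst.load_model(file.strip())
    models.append(bst)
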
houruiyan commented 1 year ago

Thank you very much for your reply! I originally thought these files were the epigenomic effect model, but I now see that they are not. I am very interested in the epigenomic effect model (e.g. the CNN model), so I still have some questions.

  1. I do not understand why conv2d is used in chromatin.py (https://github.com/FunctionLab/ExPecto/blob/master/chromatin.py#L55), whereas the selene repository uses conv1d (https://github.com/FunctionLab/selene/blob/master/models/deepsea.py#L21).

  2. I do not understand why both the forward and reverse strands are used. Do you use both sequences during training?

  3. Could you point me to the code for training the epigenomic effect model? I am new to PyTorch. The Nature Methods paper (Predicting effects of noncoding variants with deep learning–based sequence model) describes the objective as negative log likelihood plus some regularization terms. However, when I checked Selene, I found that it uses binary cross entropy, which confuses me. Could you help me?


jzthree commented 1 year ago
  1. Conv2D is equivalent to Conv1D when used in this way (the kernel height is 1, so the filter only slides along the sequence axis).
  2. Forward- and reverse-strand predictions are usually averaged to give better predictions; in training we use both strands too.
  3. Yes, I recommend checking out Selene. Negative log likelihood + L2 regularization is equivalent to binary cross entropy with weight decay in this case; they are just different names for the same thing.
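
For readers who want to check points 2 and 3 numerically, here is a minimal pure-Python sketch. It is not code from ExPecto or Selene: `toy_predict` is a hypothetical stand-in for a trained chromatin model, and the helper names (`reverse_complement`, `strand_averaged`, `bernoulli_nll`, `bce_sum`) are made up for illustration. It shows that averaging over strands only needs the reverse complement of the input, and that the negative log likelihood of independent Bernoulli labels is the same number as the summed binary cross entropy.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def strand_averaged(predict, seq):
    """Average a model's per-feature predictions over both strands."""
    fwd = predict(seq)
    rev = predict(reverse_complement(seq))
    return [(f + r) / 2.0 for f, r in zip(fwd, rev)]


def bernoulli_nll(probs, labels):
    """Negative log likelihood of 0/1 labels under independent Bernoulli(p)."""
    likelihood = 1.0
    for p, y in zip(probs, labels):
        likelihood *= p if y == 1 else (1.0 - p)
    return -math.log(likelihood)


def bce_sum(probs, labels):
    """Binary cross entropy, summed (not averaged) over labels."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for p, y in zip(probs, labels)
    )


def toy_predict(seq):
    # Hypothetical stand-in for a trained model: returns two "features",
    # the GC fraction and whether the sequence starts with A.
    gc = sum(base in "GC" for base in seq) / len(seq)
    return [gc, 1.0 if seq[0] == "A" else 0.0]


if __name__ == "__main__":
    print(strand_averaged(toy_predict, "AACG"))
    probs, labels = [0.9, 0.2, 0.7], [1, 0, 1]
    print(bernoulli_nll(probs, labels), bce_sum(probs, labels))
```

The regularization half of point 3 works the same way: with plain SGD, an L2 penalty of (λ/2)·‖w‖² contributes a gradient of λ·w, which is the term that a weight-decay setting adds to each update.
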
houruiyan commented 1 year ago

Dear Dr Zhou,

Thank you very much for your reply.