FunctionLab / ExPecto

predicting expression effects of human genome variants ab initio from sequence

How to reproduce the file Xreducedall.2002.npy for other organisms #23

Open Jasonxu0109 opened 2 years ago

Jasonxu0109 commented 2 years ago

Hi, could you please provide the Python script that produces the Xreducedall.2002.npy file? We would like to use the pipeline to analyze other organisms, such as fly. Thanks in advance!

Best Wishes,

jzthree commented 2 years ago

To generate the equivalent of Xreducedall.2002.npy for a new organism, you will first need to train a sequence model to predict chromatin profiles in the organism of interest, or use an existing model such as DeepArk (https://www.ncbi.nlm.nih.gov/pubmed/33888512/). Note that Xreducedall.2002.npy is not computed from NarrowPeak files (as you mentioned in your other post); it is computed from sequence model predictions for sequences centered at the major TSS of each gene.

Once you have the sequence model predictions, the code in this discussion should be helpful for generating the equivalent of Xreducedall.2002.npy: https://github.com/FunctionLab/ExPecto/issues/9
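
For orientation, that last step could look roughly like the sketch below. This is not the code from issue #9. It assumes the sequence model predictions are already stored as an array of shape (genes, windows, chromatin features), with windows tiled at regular offsets around each TSS, and that they are summarized with exponentially decaying weights upstream and downstream of the TSS. The function name `spatial_reduce`, the window offsets, the decay rates, and the feature ordering are illustrative assumptions, so check them against the ExPecto code before relying on the output.

```python
import numpy as np

def spatial_reduce(preds, shifts, decay_rates=(0.01, 0.02, 0.05, 0.1, 0.2), bin_size=200):
    """Collapse per-window chromatin predictions into one feature vector per gene.

    preds  : array (n_genes, n_windows, n_features), sequence model outputs
    shifts : array (n_windows,), window offsets in bp relative to the TSS
    All default values here are assumptions for illustration.
    """
    preds = np.asarray(preds, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    dist = np.abs(shifts) / bin_size

    # One weight vector per decay rate, separately for upstream and downstream windows.
    upstream = [np.exp(-r * dist) * (shifts <= 0) for r in decay_rates]
    downstream = [np.exp(-r * dist) * (shifts >= 0) for r in decay_rates]
    weights = np.vstack(upstream + downstream)  # (n_weights, n_windows)

    # Weighted sum over windows for every gene and chromatin feature.
    reduced = np.einsum("kw,gwf->gkf", weights, preds)  # (n_genes, n_weights, n_features)
    return reduced.reshape(preds.shape[0], -1)

# Toy usage with random numbers standing in for real model predictions.
rng = np.random.default_rng(0)
toy_shifts = np.arange(-20000, 20000, 200)             # 200 windows around the TSS
toy_preds = rng.random((3, toy_shifts.size, 4))        # 3 genes, 4 chromatin features
toy_reduced = spatial_reduce(toy_preds, toy_shifts)    # shape (3, 40)
np.save("Xreduced_toy.npy", toy_reduced)
```

With a real model, the last axis would hold that model's chromatin features, and the saved array would have one row per gene in the same order as the gene annotation used for training.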

Jasonxu0109 commented 2 years ago

Hi, thank you for your reply. We have generated our own epigenomic data, such as ATAC-seq, so we want to train our own model rather than use the precomputed model you provided. However, we can't find a Python script at the link you gave for preprocessing our ATAC-seq data. Could you provide code and a test file to reproduce the Xreducedall file? Thanks in advance!

jzthree commented 2 years ago

If you want to train new sequence models on epigenetic data, feel free to check out https://github.com/FunctionLab/selene (tutorials and manuscript examples are provided). Note that the ExPecto model requires two training steps. First, you train the chromatin profile sequence model, which will allow you to generate the equivalent of Xreducedall.2002.npy; then you can modify the ExPecto script to do the second training step for expression prediction.
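
To make the second step concrete, here is a minimal stand-in, not the ExPecto training script: it fits a regularized linear model from a reduced feature matrix to an expression vector using closed-form ridge regression in NumPy. The helper names `fit_ridge` and `predict`, the synthetic data, and the regularization strength are all assumptions for illustration; for real use, adapt the training script in the ExPecto repository.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T (y - mean(y)).
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    return np.asarray(X, dtype=float) @ w + b

# Synthetic data standing in for the reduced features and log-transformed expression.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.5

# Train on the first 150 "genes" and predict the held-out 50.
w, b = fit_ridge(X[:150], y[:150], alpha=1e-3)
held_out_pred = predict(X[150:], w, b)
```

In practice the rows of `X` would come from the equivalent of Xreducedall.2002.npy and `y` from expression measurements for the same genes, with some genes held out for evaluation.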

Jasonxu0109 commented 2 years ago

Hi, Thank you so much. It's useful!!!