Open Jasonxu0109 opened 2 years ago
To generate the equivalent of Xreducedall.2002.npy for a new organism, you will first need to train a sequence model that predicts chromatin profiles in your organism of interest, or use an existing model such as DeepArk (https://www.ncbi.nlm.nih.gov/pubmed/33888512/). Importantly, Xreducedall.2002.npy is not computed from NarrowPeak files (as you mentioned in your other post); it is computed from sequence model predictions on sequences centered at the major TSS of each gene.
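As a rough illustration of what "sequences centered at the major TSS" could look like in practice, here is a minimal sketch that lists the genomic windows to feed to a sequence model for one gene. The function name `tss_windows` and the defaults (offsets every 200 bp across 20 kb on either side of the TSS, 2000 bp of input sequence per prediction) are assumptions made for illustration; adjust them to whatever your own model expects.

```python
from typing import List, Tuple


def tss_windows(
    tss: int,
    strand: str,
    span: int = 20000,   # assumed: cover 20 kb on each side of the TSS
    step: int = 200,     # assumed: one prediction every 200 bp
    window: int = 2000,  # assumed: input sequence length of the model
) -> List[Tuple[int, int, int]]:
    """Return (offset, start, end) for each window around a TSS.

    `offset` is the position relative to the TSS in the direction of
    transcription (negative = upstream). `start` and `end` are 0-based,
    half-open genomic coordinates on the forward strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    sign = 1 if strand == "+" else -1
    windows = []
    for offset in range(-span, span, step):
        # On the minus strand, upstream lies at higher genomic coordinates.
        center = tss + sign * offset
        start = center - window // 2
        windows.append((offset, start, start + window))
    return windows


# Hypothetical gene: TSS at position 50,000 on the plus strand.
example = tss_windows(50000, "+")
```

Note that this sketch does not clip windows at chromosome ends, and sequences for minus-strand genes would still need to be reverse-complemented before prediction.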
Once you have the sequence model predictions, the code in this discussion should be helpful for generating the equivalent of Xreducedall.2002.npy: https://github.com/FunctionLab/ExPecto/issues/9
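For orientation only, here is a sketch of how per-window predictions might be collapsed into one feature row per gene using exponentially decaying weights upstream and downstream of the TSS. This is not the code from the linked discussion: the function name `spatial_reduce`, the decay rates, and the output column ordering are all assumptions, so check them against that discussion before relying on the result.

```python
import numpy as np


def spatial_reduce(preds, offsets, decay_rates=(0.01, 0.02, 0.05, 0.1, 0.2), step=200):
    """Collapse per-window predictions into one row of features per gene.

    preds:   array of shape (n_genes, n_windows, n_features)
    offsets: array of shape (n_windows,), window offsets relative to the TSS
    Returns an array of shape (n_genes, 2 * len(decay_rates) * n_features).
    The decay rates and column layout are assumed for illustration.
    """
    preds = np.asarray(preds, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    if preds.ndim != 3 or preds.shape[1] != offsets.shape[0]:
        raise ValueError("preds must be (n_genes, n_windows, n_features)")

    dist = np.abs(offsets) / step  # distance from the TSS in units of `step`
    weights = []
    # One set of kernels for upstream windows, one for downstream windows.
    for side in (offsets <= 0, offsets >= 0):
        for rate in decay_rates:
            weights.append(np.exp(-rate * dist) * side)
    weights = np.stack(weights)  # (n_kernels, n_windows)

    # Weighted sum over windows for every gene, kernel, and feature.
    reduced = np.einsum("kw,gwf->gkf", weights, preds)
    return reduced.reshape(preds.shape[0], -1)


# Toy input: 2 genes, 200 windows, 3 chromatin features.
offsets = np.arange(-20000, 20000, 200)
X = spatial_reduce(np.ones((2, 200, 3)), offsets)
```

The resulting array could then be written with `np.save` under a file name of your choice. With 2002 chromatin features and the ten kernels assumed here, each gene would get 20020 columns.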
Hi, thank you for your reply. We have generated our own epigenetic data, such as ATAC-seq, so we want to train our own model rather than use the pre-computed model you provided. However, we can't find a Python script at the link you gave for preprocessing our ATAC-seq data. Could you provide code and a test file to reproduce the Xreducedall file? Thanks in advance!
If you want to train new sequence models for epigenetics data, feel free to check out https://github.com/FunctionLab/selene (there are tutorials and manuscript examples provided). Note that the ExPecto model needs two training steps. First, you train the chromatin-profile sequence model, which lets you generate the equivalent of Xreducedall.2002.npy; then you modify the ExPecto script to do the second training step for expression prediction.
Hi, Thank you so much. It's useful!!!
Hi, could you please provide the Python script that produces the Xreducedall.2002.npy file? Maybe we can use the pipeline to analyze other organisms, such as fly. Thanks in advance!
Best Wishes,