dpeerlab / SEACells

SEACells algorithm for inference of transcriptional and epigenomic cellular states from single-cell genomics data
GNU General Public License v2.0

PWM.h5ad in SEACells/notebooks/SEACell_tf_activity.ipynb #42

Closed: roshan9128 closed this issue 1 year ago

roshan9128 commented 1 year ago

How was the pwm.h5ad file used in the SEACell_tf_activity.ipynb notebook constructed?


christinedien commented 1 year ago

Hello,

Thank you for using SEACells! The PWM in the TF activity inference notebook was constructed using FIMO from the MEME Suite. FIMO was run with default parameters using the cisBP human v2 motif set, which can be downloaded here.
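A minimal Python sketch of that FIMO call, assuming the `fimo` binary from the MEME Suite is on PATH and the cisBP motifs are already in MEME motif format. The file names `cisbp_human_v2.meme` and `peaks.fa` and the helper names are hypothetical placeholders. Passing no scoring options keeps FIMO's defaults, including its default p-value threshold of 1e-4.

```python
import subprocess
from pathlib import Path


def build_fimo_cmd(motif_file, sequence_file, out_dir="fimo_out"):
    """Return a FIMO command line that keeps all default scoring parameters.

    FIMO's positional arguments are <motif file> <sequence file>; --oc sets the
    output directory (overwriting it if it exists), where fimo.tsv is written.
    """
    return ["fimo", "--oc", str(out_dir), str(motif_file), str(sequence_file)]


def run_fimo(motif_file, sequence_file, out_dir="fimo_out"):
    """Run FIMO and return the path to its tab-separated output file."""
    cmd = build_fimo_cmd(motif_file, sequence_file, out_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if FIMO fails
    return Path(out_dir) / "fimo.tsv"


# Hypothetical inputs: a MEME-format motif file and a FASTA of peak sequences.
print(" ".join(build_fimo_cmd("cisbp_human_v2.meme", "peaks.fa")))
```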

Sequences for FIMO's "sequence file" argument were retrieved with SeqGL's get_seqs() function, using the hg38 reference genome and the peaks from the ATAC-seq data; get_seqs() takes as input a .bed file containing those peaks. Other tools can also retrieve peak sequences for motif identification; this is just the one we used (:
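For anyone not using SeqGL, here is a dependency-free Python sketch of the same step, assuming a BED file of peaks and the reference genome as FASTA. The helpers `read_fasta` and `peaks_to_fasta` are hypothetical and are not part of SeqGL or SEACells. Holding all of hg38 in memory this way is slow, so for real data a dedicated tool such as `bedtools getfasta` is more practical.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {record_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header (e.g. "chr1").
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def peaks_to_fasta(bed_lines, genome):
    """Yield (name, sequence) for each BED peak.

    BED coordinates are 0-based and half-open, so genome[chrom][start:end]
    is exactly the peak sequence. Peaks are named "chrom:start-end".
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        yield f"{chrom}:{start}-{end}", genome[chrom][start:end]


# Toy example: an 18 bp "chromosome" and two peaks.
genome = read_fasta([">chr1", "ACGTACGTAC", "GGTTAACC"])
bed = ["chr1\t2\t8\n", "chr1\t10\t14\n"]
for name, seq in peaks_to_fasta(bed, genome):
    print(f">{name}\n{seq}")
```

The printed records can be written to a FASTA file and passed to FIMO as its sequence file.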

The fimo.tsv output file is then parsed and saved as an AnnData object with peaks as observations and identified motifs as variables.
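The parsing code is not shown in this thread, so the following is only a sketch under stated assumptions. It reads the column names used by recent MEME Suite releases (`motif_id`, `sequence_name`), and it stores 1 when FIMO reports at least one hit of a motif in a peak and 0 otherwise; the original file may hold counts or scores instead. `fimo_hits_matrix` and `to_anndata` are hypothetical helper names.

```python
import csv


def fimo_hits_matrix(tsv_lines, peak_names=None, motif_col="motif_id"):
    """Build a binary peaks x motifs matrix from the lines of fimo.tsv.

    Returns (peaks, motifs, matrix) where matrix[i][j] == 1 if motif j has at
    least one reported hit in peak i. Peaks listed in peak_names that have no
    hits are kept as all-zero rows.
    """
    # FIMO appends '#' comment lines (after a blank line) below the table.
    rows = [l for l in tsv_lines if l.strip() and not l.startswith("#")]
    hits = {(rec["sequence_name"], rec[motif_col])
            for rec in csv.DictReader(rows, delimiter="\t")}

    peaks = list(peak_names) if peak_names is not None else sorted({p for p, _ in hits})
    motifs = sorted({m for _, m in hits})
    peak_idx = {p: i for i, p in enumerate(peaks)}
    motif_idx = {m: j for j, m in enumerate(motifs)}

    matrix = [[0] * len(motifs) for _ in peaks]
    for peak, motif in hits:
        if peak in peak_idx:  # ignore hits in peaks outside peak_names
            matrix[peak_idx[peak]][motif_idx[motif]] = 1
    return peaks, motifs, matrix


def to_anndata(peaks, motifs, matrix):
    """Wrap the matrix as AnnData: peaks as observations, motifs as variables."""
    import anndata  # imported lazily so the parser itself needs only the stdlib
    import numpy as np
    import pandas as pd

    return anndata.AnnData(
        X=np.asarray(matrix, dtype=np.float32),
        obs=pd.DataFrame(index=peaks),
        var=pd.DataFrame(index=motifs),
    )
```

The result can then be saved with `to_anndata(...).write_h5ad(...)`. Passing the full ATAC-seq peak list as `peak_names` keeps peaks without any motif hits, so the observations line up with the peak set; this only works if the FASTA headers given to FIMO match those peak names. Using `motif_col="motif_alt_id"` would label motifs by their alternate (often TF) names instead.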

We acknowledge that this "PWM" is not in the traditional format of a PWM. We will be changing the name soon to reflect this and improve clarity.
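For contrast: a traditional PWM describes a single motif of length $L$, with one entry per nucleotide $b$ and position $i$, commonly a log-odds score of the observed base frequency $f_{b,i}$ against a background frequency $q_b$. The stored matrix is instead indexed by peak $p$ and motif $m$; assuming it holds binary hit indicators (the thread does not say whether hits were binarized, counted, or scored), the two objects are:

```latex
W_{b,i} = \log_2 \frac{f_{b,i}}{q_b}, \qquad b \in \{A, C, G, T\},\ i = 1, \dots, L

M_{p,m} =
\begin{cases}
1 & \text{if FIMO reports at least one hit of motif } m \text{ in peak } p,\\
0 & \text{otherwise.}
\end{cases}
```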

Hope this helps!

roshan9128 commented 1 year ago

Thanks for the response! Closing this ticket.