aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

[results] the gmt output from the pyscenic ctx (0.10.3) is not a standard gmt format. #221

Closed Pentayouth closed 3 years ago

Pentayouth commented 4 years ago

I want to use the pyscenic CLI to generate the AUC matrix and the regulons, and then use R for downstream analysis. I know how to generate the AUC matrix (just run 'pyscenic aucell'), but I only have a vague idea of how to get the 'tf-target list' (regulons).

From reading the issue https://github.com/aertslab/pySCENIC/issues/126 and the tutorial https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/importing_pySCENIC.html I gathered that I should generate a 'gmt' file and read it into R.

I used the following code:

docker run -it --rm -v $PWD:$PWD aertslab/pyscenic:0.10.3 \
pyscenic ctx \
$PWD/adj_20201006.tsv \
$PWD/resource/hg19-500bp-upstream-7species.mc9nr.feather \
$PWD/resource/hg19-tss-centered-10kb-7species.mc9nr.feather \
--annotations_fname $PWD/resource/motifs-v9-nr.hgnc-m0.001-o0.0.tbl \
--expression_mtx_fname $PWD/exprMat_filtered_for_GRNBoost2.csv \
--output $PWD/regulon_20201007.gmt \
--num_workers 16 \
--mask_dropouts

This generates the regulon gmt file. In that file, the 1st field is the TF name and the 2nd field is 'tf=XXXX', but the 3rd field is 'score=X.XXXX'. This goes against the gmt file format definition at: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29 which says that only the 2nd column can be used for comments and annotations.

This caused problems with the R function GSEABase::getGmt(): it couldn't read the file in properly.

Is this a bug in the gmt file output, or is there something I've missed?

Thanks in advance for any comments.

cflerin commented 3 years ago

Thanks for spotting this. There's a fix in the dev branch.
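
For gmt files already written by 0.10.3, one possible workaround is to fold the extra metadata field into the description column before reading the file in R. The following is only a minimal sketch, not part of pySCENIC: it assumes each line is tab-separated and laid out as described in the report above (TF name, then 'tf=...', then 'score=...', then the target genes). The function names, the "key=value" detection rule, and the ";" separator are illustrative choices.

```python
import re

# Assumption: the metadata fields look like "key=value", e.g. "tf=XXXX" or
# "score=X.XXXX", and no target gene name has that shape.
_METADATA = re.compile(r"^[A-Za-z_]+=")


def normalize_gmt_line(line):
    """Merge leading key=value fields into one description column.

    Standard GMT layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines without any key=value field are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    name, rest = fields[0], fields[1:]

    # Collect every consecutive key=value field that follows the set name.
    meta = []
    while rest and _METADATA.match(rest[0]):
        meta.append(rest.pop(0))

    if not meta:
        return "\t".join(fields)
    return "\t".join([name, ";".join(meta)] + rest)


def normalize_gmt(src_path, dst_path):
    """Write a copy of src_path with one description column per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(normalize_gmt_line(line) + "\n")
```

Calling normalize_gmt("regulon_20201007.gmt", "regulon_20201007.fixed.gmt") (the second file name is hypothetical) should produce a file in which the TF and score annotations share the 2nd column and the genes start in the 3rd, which is the layout the GMT definition describes.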