Closed Goultard59 closed 2 years ago
Hi @Goultard59, the motif collection used for the pre-calculated cisTarget databases includes several publicly available data sets, e.g. from JASPAR (http://jaspar.genereg.net/downloads/), FlyFactorSurvey (https://pgfe.umassmed.edu/ffs/), HOMER (http://homer.ucsd.edu/homer/motif/HomerMotifDB/homerResults.html), ... (check the 'source_name' column in the motif annotation tables: https://resources.aertslab.org/cistarget/motif2tf).
For scoring with cbust, the motifs need to be converted to a specific format, described here: https://orca.bu.edu/page/ClusterBuster_download
"The motif file should contain matrices in the following format:
>element1
0 4 2 14
12 0 0 8
8 0 1 11
20 0 0 0
>element2
13 1 1 5
The rows of each matrix correspond to successive positions of the motif, from 5' to 3', and the columns indicate the frequencies of A, C, G, and T, respectively, in each position. These frequencies are usually obtained from alignments of protein-binding sites."
I'll add this information to the documentation.
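For anyone who wants to sanity-check a motif file against the format quoted above, here is a minimal Python sketch of a parser. The function name `parse_cluster_buster` is hypothetical (it is not part of Cluster-Buster or pySCENIC), and the sketch only assumes '>' header lines followed by rows of four whitespace-separated numbers.

```python
from typing import Dict, List


def parse_cluster_buster(text: str) -> Dict[str, List[List[float]]]:
    """Parse Cluster-Buster motif text into {motif_name: rows of [A, C, G, T]}."""
    motifs: Dict[str, List[List[float]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()  # everything after '>' is kept as the name
            motifs[current] = []
        else:
            if current is None:
                raise ValueError("matrix row found before any '>' header")
            row = [float(value) for value in line.split()]
            if len(row) != 4:
                raise ValueError(f"expected 4 columns (A C G T), got {len(row)}: {line!r}")
            motifs[current].append(row)
    return motifs


# The example matrices from the format description.
EXAMPLE = """>element1
0 4 2 14
12 0 0 8
8 0 1 11
20 0 0 0
>element2
13 1 1 5
"""

motifs = parse_cluster_buster(EXAMPLE)
for name, rows in motifs.items():
    print(name, "has", len(rows), "position(s)")
```

Each row is one motif position, so the number of rows per motif is the motif length.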
The assignment of a motif to a specific TF can be direct (e.g. when a motif has been characterized by ChIP-Seq), or indirect (inferred by motif or TF homology). The Motif2TF procedure is described by Janky et al. (https://doi.org/10.1371/journal.pcbi.1003731).
If you are working on an organism other than human, mouse or Drosophila, for which little is known about TF binding specificities, you can feed SCENIC a custom Motif2TF database. To build one, I'd look up the orthologous TFs and replace the human/mouse/fly gene names in the Motif2TF table (https://resources.aertslab.org/cistarget/motif2tf/) with the gene names of your species of interest.
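A rough sketch of that substitution in plain Python, assuming the table is TAB-separated and has a `gene_name` column (as in the annotation tables shown in this thread). The function name `substitute_gene_names` and the ortholog mapping are hypothetical; the mapping values are made-up placeholders, not real pig gene names. Rows without a known ortholog are dropped here, which is one possible choice rather than an established procedure.

```python
from typing import Dict, List


def substitute_gene_names(lines: List[str], orthologs: Dict[str, str]) -> List[str]:
    """Replace gene_name by its ortholog; drop rows that have no ortholog."""
    header = lines[0].rstrip("\r\n").split("\t")
    gene_col = header.index("gene_name")  # raises ValueError if the column is absent
    converted = [lines[0].rstrip("\r\n")]
    for line in lines[1:]:
        fields = line.rstrip("\r\n").split("\t")
        target = orthologs.get(fields[gene_col])
        if target is None:
            continue  # no ortholog known for this TF
        fields[gene_col] = target
        converted.append("\t".join(fields))
    return converted


# Hypothetical mapping from human TF names to names in the species of interest.
orthologs = {"CTCF": "CTCF_target_species", "JUN": "JUN_target_species"}

table = [
    "#motif_id\tgene_name\tmotif_similarity_qvalue\torthologous_identity\tdescription",
    "ENCFF767XQV\tCTCF\t0.000\t1.000\tCTCF (vagina female adult (53 years))",
    "ENCFF677KBX\tZNF584\t0.000\t1.000\teGFP-ZNF584 (K562 genetically modified using CRISPR)",
    "ENCFF662UNB\tJUN\t0.000\t1.000\tJUN (MCF-7)",
]

converted = substitute_gene_names(table, orthologs)
print("\n".join(converted))
```

Splitting on TABs by hand (instead of using a CSV parser with quoting) leaves description fields that contain double quotes untouched.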
The Biopython Bio.motifs module contains support for reading quite a lot of motif formats and writing them to Cluster-Buster format.
@tropfenameimer For creating databases, each motif should be in a different motif file as Cluster-Buster will not score each motif independently when you put them in the same file.
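If your motifs currently live in one combined Cluster-Buster file, a small script can split them into one file per motif, as recommended above. This is only a sketch: `split_motif_file` is a hypothetical helper, the `.cb` suffix is an assumption, and the first token after '>' is used as the file name, so motif names containing characters such as '/' would need to be sanitized first.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def split_motif_file(source: Path, out_dir: Path, suffix: str = ".cb") -> List[Path]:
    """Write every '>'-delimited motif in `source` to its own file in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    blocks: Dict[str, List[str]] = {}  # motif name -> lines of that motif, header included
    current = None
    for line in source.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:].split()[0]  # first token after '>' names the file
            blocks[current] = [line]
        elif current is not None:
            blocks[current].append(line)
    written = []
    for name, lines in blocks.items():
        path = out_dir / f"{name}{suffix}"
        path.write_text("\n".join(lines) + "\n")
        written.append(path)
    return written


# Demonstration on a temporary directory with a two-motif file.
with tempfile.TemporaryDirectory() as tmp:
    combined = Path(tmp) / "all_motifs.cb"
    combined.write_text(">element1\n0 4 2 14\n12 0 0 8\n>element2\n13 1 1 5\n")
    files = split_motif_file(combined, Path(tmp) / "singles")
    names = [path.name for path in files]
    first = files[0].read_text()

print(names)
```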
Hi @tropfenameimer, I need to create the mouse database with an additional motif. According to the procedure, running create_cisTarget_databases requires the motif collection and a FASTA file. I already have the motif collection including the new motif (in cb format), but I don't understand where to get the right FASTA file. Which FASTA file should I use to run create_cisTarget_databases? On the GENCODE website (https://www.gencodegenes.org/mouse/release_M25.html) there are several files, and the "Genome sequence (GRCm38.p6)" file doesn't have gene annotation (it contains just the chromosome annotation). Could you provide the FASTA file you used, so that I can run it on the same reference?
Another question is about how to add this "direct motif" into the TF annotation file (in my case motifs-v9-nr.mgi-m0.001-o0.0.tbl).
Thank you in advance.
Hi, and thanks for your answers,
I used OrthoFinder to find gene orthologues between human and pig (unfortunately only 60% of my TFs overlap the human motif2tf annotation file). But I'm getting an error while running nextflow run aertslab/SCENICprotocol: KeyError: 'MotifSimilarityQvalue'
Thanks for the help.
Hi @Goultard59 ,
Can you try renaming that column in your annotation file? pySCENIC in particular is looking for specific column names (sorry, it's a current limitation). It should be motif_similarity_qvalue, something like:
#motif_id gene_name motif_similarity_qvalue orthologous_identity description
ENCFF767XQV CTCF 0.000 1.000 CTCF (vagina female adult (53 years))
ENCFF677KBX ZNF584 0.000 1.000 eGFP-ZNF584 (K562 genetically modified using CRISPR)
ENCFF727RFJ ZNF584 0.000 1.000 eGFP-ZNF584 (K562 genetically modified using CRISPR)
ENCFF986ATH ZNF584 0.000 1.000 eGFP-ZNF584 (K562 genetically modified using CRISPR)
ENCFF662UNB JUN 0.000 1.000 JUN (MCF-7)
ENCFF473URL JUN 0.000 1.000 JUN (MCF-7)
ENCFF370DGU JUN 0.000 1.000 JUN (MCF-7)
ENCFF528ENR CTCF 0.000 1.000 CTCF (tibial artery male adult (37 years))
ENCFF380FCK IRF3 0.000 1.000 IRF3 (GM12878)
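A quick way to check an annotation file for these column names before launching the pipeline, as a sketch. The list of expected names is copied from the example header above; whether pySCENIC needs exactly this set is an assumption, and `missing_columns` is a hypothetical helper, not a pySCENIC function.

```python
from typing import List

# Column names taken from the example header above (assumed to be what is expected).
EXPECTED_COLUMNS = (
    "#motif_id",
    "gene_name",
    "motif_similarity_qvalue",
    "orthologous_identity",
    "description",
)


def missing_columns(header_line: str) -> List[str]:
    """Return the expected column names that are absent from a TAB-separated header."""
    present = header_line.rstrip("\r\n").split("\t")
    return [name for name in EXPECTED_COLUMNS if name not in present]


good_header = "#motif_id\tgene_name\tmotif_similarity_qvalue\torthologous_identity\tdescription"
bad_header = "#motif_id\tgene_name\tMotifSimilarityQvalue\torthologous_identity\tdescription"

print(missing_columns(good_header))
print(missing_columns(bad_header))
```

An empty list means every expected name was found; it says nothing about column order or the values in the rows.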
Hello,
Here is the head of the file :
#motif_id motif_name motif_description source_name source_version motif_similarity_qvalue similar_motif_id similar_motif_description orthologous_identity orthologous_gene_name orthologous_species description gene_name
bergman__Abd-B Abd-B Abd-B bergman 1.1 6e-04 cisbp__M1008 HOXA6[gene ID: "ENSG00000106006" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXB9[gene ID: "ENSG00000170689" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXC9[gene ID: "ENSG00000180806" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxa9[gene ID: "ENSMUSG00000038227" species: "Mus musculus" TF status: "direct" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxb9[gene ID: "ENSMUSG00000020875" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; NP_032296.2[gene ID: "NP_032296.2" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"] 0.981618 ENSMUSG00000038227 M. musculus gene is orthologous to ENSMUSG00000038227 in M. musculus (identity = 98%) which is annotated for similar motif cisbp__M1008 ('HOXA6[gene ID: "ENSG00000106006" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXB9[gene ID: "ENSG00000170689" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXC9[gene ID: "ENSG00000180806" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxa9[gene ID: "ENSMUSG00000038227" species: "Mus musculus" TF status: "direct" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxb9[gene ID: "ENSMUSG00000020875" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; NP_032296.2[gene ID: "NP_032296.2" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]'; q-value = 0.0006) ENSSSCG00000028997
bergman__Aef1 Aef1 Aef1 bergman 1.1 0 None None 0.213656 FBgn0005694 D. melanogaster motif is annotated for orthologous gene FBgn0005694 in D. melanogaster (identity = 21%) ZNF8
bergman__Cf2 Cf2 Cf2 bergman 1.1 0 None None 0.15098 FBgn0000286 D. melanogaster motif is annotated for orthologous gene FBgn0000286 in D. melanogaster (identity = 15%) ZNF853
bergman__EcR_usp EcR_usp EcR/usp bergman 1.1 0 None None 0.378261 FBgn0000546 D. melanogaster gene is orthologous to FBgn0000546 in D. melanogaster (identity = 37%) which is directly annotated for motif NR1H2
Hi @Goultard59 , I see you already have the proper column name, in this case it's a different issue with pySCENIC than I first thought. I will have to look further at this. Can you also include the full command that you ran, and the error message that you got?
Hello @cflerin,
I did a little bit of cleaning in the motif annotation file (removing the motifs which are not included in my feather files, removing motif similarity) but I still get the same issue.
I'm using the VSN Nextflow pipeline with the following command:
module load bioinfo/Nextflow-v20.11.0-edge
module load system/singularity-3.5.3
nextflow -C /home/adufour/work/scenic_pig/pigs.vsn-pipelines.complete.config run vib-singlecell-nf/vsn-pipelines -entry scenic
N E X T F L O W ~ version 20.11.0-edge
Launching `vib-singlecell-nf/vsn-pipelines` [cranky_leavitt] - revision: 721c42f889 [master]
executor > local (3)
[fc/1cf8f4] process > scenic:SCENIC:ARBORETO_WITH_MULTIPROCESSING (1) [100%] 1 of 1 ✔
[e9/f55d91] process > scenic:SCENIC:ADD_PEARSON_CORRELATION (1) [100%] 1 of 1 ✔
[bf/fe6141] process > scenic:SCENIC:CISTARGET__MOTIF (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > scenic:SCENIC:AUCELL__MOTIF -
[- ] process > scenic:SCENIC:VISUALIZE -
[- ] process > scenic:SCENIC:PUBLISH_LOOM -
[- ] process > scenic:PUBLISH_SCENIC:COMPRESS_HDF5 -
[- ] process > scenic:PUBLISH_SCENIC:SC__PUBLISH -
Error executing process > 'scenic:SCENIC:CISTARGET__MOTIF (1)'
Caused by:
Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)
Command executed:
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
pyscenic ctx test_pigs__adj.tsv Sus_scrofa.feather.motifs_vs_regions.scores.feather Sus_scrofa.feather.regions_vs_motifs.scores.feather Sus_scrofa.feather.regions_vs_motifs.rankings.feather Sus_scrofa.feather.motifs_vs_regions.rankings.feather --annotations_fname motif2tf_orthologuous.tbl --expression_mtx_fname human.loom --cell_id_attribute CellID --gene_attribute Gene --mode "dask_multiprocessing" --output test_pigs__reg_mtf.csv.gz --num_workers 4
Command exit status:
1
Command output:
(empty)
Command error:
/opt/venv/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
data = yaml.load(f.read()) or {}
2021-06-01 14:03:29,162 - pyscenic.cli.pyscenic - INFO - Creating modules.
2021-06-01 14:03:31,561 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
2021-06-01 14:03:31,798 - pyscenic.utils - INFO - Using existing Pearson correlations from the adjacencies file.
2021-06-01 14:03:31,965 - pyscenic.utils - INFO - Creating modules.
2021-06-01 14:06:54,593 - pyscenic.cli.pyscenic - INFO - Loading databases.
2021-06-01 14:06:54,593 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
Traceback (most recent call last):
File "/opt/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'MotifSimilarityQvalue'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/venv/bin/pyscenic", line 8, in <module>
sys.exit(main())
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 470, in main
args.func(args)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 194, in prune_targets_command
num_workers=args.num_workers)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
num_workers, module_chunksize)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 221, in _distributed_calc
orthologous_identity_threshold=orthologuous_identity_threshold)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/utils.py", line 51, in load_motif_annotations
df = df[(df[COLUMN_NAME_MOTIF_SIMILARITY_QVALUE] <= motif_similarity_fdr) &
File "/opt/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'MotifSimilarityQvalue'
Work dir:
/work/adufour/scenic_pig/work/bf/fe6141c697d5e74b9cc264e821d159
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
This is the motif annotation file I used: motif2tf.xlsx
And here is the head of one of the feather output files:
| c2h2zf-M0369 | c2h2zf-M0373 | c2h2zf-M0385 | c2h2zf-M0393 | c2h2zf-M0400 | c2h2zf-M0401 | c2h2zf-M0404 | c2h2zf-M0406 | c2h2zf-M0415 | c2h2zf-M0462 | ⋯ | yetfasco-798 | yetfasco-8 | yetfasco-815 | yetfasco-830 | yetfasco-831 | yetfasco-864 | yetfasco-870 | yetfasco-879 | yetfasco-962 | regions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18053 | 1300 | 12540 | 8606 | 17912 | 5161 | 6874 | 14082 | 656 | 18141 | ⋯ | 18894 | 7464 | 12420 | 1151 | 16874 | 7898 | 2785 | 3525 | 2062 | A1CF |
| 18363 | 8154 | 2009 | 9263 | 17882 | 2304 | 6617 | 8956 | 18363 | 2949 | ⋯ | 11566 | 13700 | 13655 | 14519 | 12606 | 958 | 5562 | 12328 | 9300 | A2ML1 |
| 897 | 3947 | 20218 | 2830 | 14819 | 1962 | 6344 | 9476 | 1618 | 13230 | ⋯ | 7577 | 17988 | 7592 | 14544 | 12894 | 2366 | 4094 | 400 | 1174 | A3GALT2 |
| 3394 | 14057 | 14969 | 4354 | 10218 | 5015 | 20044 | 4652 | 6266 | 5063 | ⋯ | 5444 | 11594 | 7530 | 18590 | 14391 | 7067 | 6691 | 11897 | 13953 | A4GALT |
| 8291 | 3475 | 11018 | 2995 | 14180 | 1563 | 1435 | 19162 | 15015 | 19187 | ⋯ | 13802 | 20057 | 5589 | 11866 | 5672 | 1039 | 11391 | 7269 | 12654 | A4GNT |
| 19337 | 2392 | 19853 | 11336 | 9885 | 12297 | 1993 | 18413 | 7645 | 20437 | ⋯ | 11580 | 14298 | 20610 | 10711 | 4766 | 19464 | 15643 | 18952 | 4423 | AAAS |
@Goultard59 Can you run:
head motif2tf_orthologuous.tbl
file motif2tf_orthologuous.tbl
on your motif annotation file? It has to be a TSV (TAB-separated) file. Also make sure that Excel is not corrupting your file (e.g. changing gene names to dates). If you want to use a graphical editor, it might be better to use LibreOffice, which provides the option "Save as... ==> CSV ==> Edit filter settings".
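The same two checks (TAB separation and spreadsheet date corruption) can also be scripted. The sketch below is a heuristic, not part of pySCENIC: `check_tsv` is a hypothetical helper, and the date pattern only catches values shaped like "2-Sep" or "Mar-1", which is the form spreadsheet programs typically give to gene names such as SEPT2 or MARCH1.

```python
import re
from typing import List

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Matches e.g. "2-Sep" or "Mar-1"; plain gene names such as "JUN" do not match.
DATE_LIKE = re.compile(rf"(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})", re.IGNORECASE)


def check_tsv(lines: List[str]) -> List[str]:
    """Return human-readable problems found in a TAB-separated table."""
    problems = []
    header = lines[0].rstrip("\r\n").split("\t")
    if len(header) < 2:
        problems.append("line 1: header contains no TAB separators")
    for number, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(header):
            problems.append(f"line {number}: {len(fields)} fields, expected {len(header)}")
        for field in fields:
            if DATE_LIKE.fullmatch(field):
                problems.append(f"line {number}: {field!r} looks like a date")
    return problems


sample = [
    "#motif_id\tgene_name\n",
    "m1\tCTCF\n",
    "m2\t2-Sep\n",   # gene name mangled into a date
    "m3 JUN\n",      # separated by a space instead of a TAB
]

problems = check_tsv(sample)
print("\n".join(problems))
```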
Thanks for your responses; the error was caused by the column order.
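For anyone hitting the same problem, reordering the columns can be scripted along these lines. The target order is the one from the example header posted earlier in this thread; that this exact order is what pySCENIC requires is an assumption, and `reorder_columns` is a hypothetical helper. Extra columns are kept after the five leading ones.

```python
from typing import List, Sequence

# Order copied from the example header shown earlier in the thread.
LEADING_COLUMNS = (
    "#motif_id",
    "gene_name",
    "motif_similarity_qvalue",
    "orthologous_identity",
    "description",
)


def reorder_columns(lines: List[str], leading: Sequence[str] = LEADING_COLUMNS) -> List[str]:
    """Move the `leading` columns to the front of a TAB-separated table, in that order."""
    header = lines[0].rstrip("\r\n").split("\t")
    missing = [name for name in leading if name not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    order = [header.index(name) for name in leading]
    order += [i for i in range(len(header)) if i not in order]  # remaining columns
    reordered = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        reordered.append("\t".join(fields[i] for i in order))
    return reordered


table = [
    "#motif_id\tmotif_similarity_qvalue\torthologous_identity\tdescription\tgene_name",
    "bergman__Aef1\t0\t0.213656\tmotif is annotated for orthologous gene FBgn0005694\tZNF8",
]

reordered = reorder_columns(table)
print("\n".join(reordered))
```

Rows with fewer fields than the header raise an IndexError here, which is a useful signal that the file is not cleanly TAB-separated.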
Unfortunately, I'm still blocked, now by the same error as in #6. My config file is the following: https://pastebin.com/PSW2wwuU
Thanks for your help.
Launching `vib-singlecell-nf/vsn-pipelines` [sleepy_jepsen] - revision: 721c42f889 [master]
------------------------------------------------------------------
No seed detected in the config
To ensure reproducibility the seed has been set to 250
------------------------------------------------------------------
executor > slurm (3)
[84/38573e] process > scenic:SCENIC:ARBORETO_WITH... [100%] 1 of 1 ✔
[02/fe8e56] process > scenic:SCENIC:ADD_PEARSON_C... [100%] 1 of 1 ✔
[d5/80085a] process > scenic:SCENIC:CISTARGET__MO... [100%] 1 of 1, failed: 1 ✘
[- ] process > scenic:SCENIC:AUCELL__MOTIF -
[- ] process > scenic:SCENIC:VISUALIZE -
[- ] process > scenic:SCENIC:PUBLISH_LOOM -
[- ] process > scenic:PUBLISH_SCENIC:COMPR... -
[- ] process > scenic:PUBLISH_SCENIC:SC__P... -
Error executing process > 'scenic:SCENIC:CISTARGET__MOTIF (1)'
Caused by:
Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)
Command executed:
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
pyscenic ctx 10x_PBMC__adj.tsv Sus_scrofa.motifs_vs_regions.rankings.feather Sus_scrofa.motifs_vs_regions.scores.feather --annotations_fname motif2tf_orthologuous_bis.tbl --expression_mtx_fname pigs.loom --cell_id_attribute CellID --gene_attribute Gene --mode "dask_multiprocessing" --output 10x_PBMC__reg_mtf.csv.gz --num_workers 4
Command exit status:
1
Command output:
[ ] | 0% Completed | 21.0s
[ ] | 0% Completed | 21.1s
[ ] | 0% Completed | 21.2s
[ ] | 0% Completed | 21.3s
[ ] | 0% Completed | 21.4s
[ ] | 0% Completed | 21.5s
[ ] | 0% Completed | 21.6s
[ ] | 0% Completed | 21.7s
[ ] | 0% Completed | 21.9s
[ ] | 0% Completed | 22.0s
[ ] | 0% Completed | 22.1s
[ ] | 0% Completed | 22.2s
[ ] | 0% Completed | 22.3s
[ ] | 0% Completed | 22.4s
[ ] | 0% Completed | 22.5s
[ ] | 0% Completed | 22.6s
[ ] | 0% Completed | 22.8s
[ ] | 0% Completed | 22.9s
[ ] | 0% Completed | 23.0s
[ ] | 0% Completed | 23.1s
[ ] | 0% Completed | 23.2s
[ ] | 0% Completed | 23.3s
[ ] | 0% Completed | 23.4s
[ ] | 0% Completed | 23.5s
[ ] | 0% Completed | 23.6s
[ ] | 0% Completed | 23.7s
[ ] | 0% Completed | 23.9s
[ ] | 0% Completed | 24.0s
[ ] | 0% Completed | 24.1s
[ ] | 0% Completed | 24.2s
[ ] | 0% Completed | 24.3s
[ ] | 0% Completed | 24.4s
[ ] | 0% Completed | 24.5s
[ ] | 0% Completed | 24.6s
[ ] | 0% Completed | 24.7s
[ ] | 0% Completed | 24.9s
[ ] | 0% Completed | 25.0s
[ ] | 0% Completed | 25.1s
[ ] | 0% Completed | 25.2s
[ ] | 0% Completed | 25.3s
[ ] | 0% Completed | 25.4s
[ ] | 0% Completed | 25.5s
[ ] | 0% Completed | 25.6s
[ ] | 0% Completed | 25.7s
[ ] | 0% Completed | 25.8s
[ ] | 0% Completed | 26.0s
[ ] | 0% Completed | 26.1s
[ ] | 0% Completed | 26.2s
[ ] | 0% Completed | 26.3s
[ ] | 0% Completed | 26.4s
Command error:
2021-06-10 20:09:39,736 - pyscenic.utils - INFO - Using existing Pearson correlations from the adjacencies file.
2021-06-10 20:09:40,036 - pyscenic.utils - INFO - Creating modules.
2021-06-10 20:15:21,369 - pyscenic.cli.pyscenic - INFO - Loading databases.
2021-06-10 20:15:21,370 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
Traceback (most recent call last):
File "/opt/venv/bin/pyscenic", line 8, in <module>
sys.exit(main())
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
args.func(args)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 230, in prune_targets_command
num_workers=args.num_workers,
File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 410, in prune2df
module_chunksize,
File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 334, in _distributed_calc
scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 281, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 563, in compute
results = schedule(dsk, keys, **kwargs)
File "/opt/venv/lib/python3.7/site-packages/dask/multiprocessing.py", line 228, in get
**kwargs
File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 487, in get_async
raise_exception(exc, tb)
File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 317, in reraise
raise exc
File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
result = _execute_task(task, data)
File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in <genexpr>
return func(*(_execute_task(a, cache) for a in args))
File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in modules2df
[module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in <listcomp>
[module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in module2df
db, module, motif_annotations, weighted_recovery=weighted_recovery
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 152, in module2features_auc1st_impl
df = db.load(module)
File "/opt/venv/lib/python3.7/site-packages/ctxcore/rnkdb.py", line 325, in load
df.set_index(self._index_name, inplace=True)
File "/opt/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 4724, in set_index
raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['features'] are in the columns"
Work dir:
/work/adufour/scenic_pig/direct_annot/work/d5/80085a433e79cc65af68a03811e6c2
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
@Goultard59 It looks to me like you are using the wrong database:
'/home/adufour/work/cistargetdb/feather/Sus_scrofa.motifs_vs_regions.*.feather'
Can you list all files in /home/adufour/work/cistargetdb/feather/ ?
The database file you should use is:
'/home/adufour/work/cistargetdb/feather/Sus_scrofa.regions_vs_motifs.rankings.feather'
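To avoid passing the wrong feather files to `pyscenic ctx`, the selection can be made explicit. A minimal sketch, assuming the naming scheme of the files discussed here; `pick_ranking_databases` is a hypothetical helper.

```python
from typing import Iterable, List

RANKINGS_SUFFIX = ".regions_vs_motifs.rankings.feather"


def pick_ranking_databases(filenames: Iterable[str]) -> List[str]:
    """Keep only the regions_vs_motifs rankings databases from a file listing."""
    return sorted(name for name in filenames if name.endswith(RANKINGS_SUFFIX))


listing = [
    "Sus_scrofa.motifs_vs_regions.rankings.feather",
    "Sus_scrofa.motifs_vs_regions.scores.feather",
    "Sus_scrofa.regions_vs_motifs.rankings.feather",
    "Sus_scrofa.regions_vs_motifs.scores.feather",
]

selected = pick_ranking_databases(listing)
print(selected)
```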
Hello,
The output files are: Sus_scrofa.motifs_vs_regions.rankings.feather Sus_scrofa.motifs_vs_regions.scores.feather Sus_scrofa.regions_vs_motifs.rankings.feather Sus_scrofa.regions_vs_motifs.scores.feather
I now get a new error. It's probably caused by a mismatch in the gene names, but I don't know which file I need to investigate.
Thanks for your help.
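One way to narrow down a gene-name mismatch is to compare the gene identifiers of a module with the region names in the ranking database. The warnings in the log of this run say that a module is skipped when less than 80% of its genes can be mapped, so the sketch below computes that fraction. How pySCENIC computes it internally is an assumption here; `mapped_fraction` is a hypothetical helper, and the gene lists are illustrative values taken from this thread.

```python
from typing import Iterable


def mapped_fraction(module_genes: Iterable[str], database_genes: Iterable[str]) -> float:
    """Fraction of the module's genes that also occur in the database."""
    module = set(module_genes)
    if not module:
        return 0.0
    return len(module & set(database_genes)) / len(module)


# Region names as they appear in the head of the feather file shown earlier.
database_genes = ["A1CF", "A2ML1", "A3GALT2", "A4GALT", "A4GNT", "AAAS"]

# A module mixing gene symbols with Ensembl identifiers (illustrative).
module_genes = ["A1CF", "A2ML1", "ENSSSCG00000001202", "ENSSSCG00000001945"]

fraction = mapped_fraction(module_genes, database_genes)
skipped = fraction < 0.8  # threshold quoted in the pySCENIC warnings
print(f"{fraction:.0%} of the module genes are in the database; skipped: {skipped}")
```

If the expression matrix uses Ensembl identifiers while the database uses gene symbols (or the other way round), the fraction drops for most modules, which points to the naming scheme rather than to a single file being corrupt.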
Caused by:
Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)
Command executed:
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
pyscenic ctx 10x_PBMC__adj.tsv Sus_scrofa.regions_vs_motifs.rankings.feather --annotations_fname motif2tf_orthologuous_bis.tbl --expression_mtx_fname pigs.loom --cell_id_attribute CellID --gene_attribute Gene --mode "dask_multiprocessing" --output 10x_PBMC__reg_mtf.csv.gz --num_workers 4
Command exit status:
1
Command output:
[ ] | 0% Completed | 1min 45.1s
[ ] | 0% Completed | 1min 45.2s
[ ] | 0% Completed | 1min 45.3s
[ ] | 0% Completed | 1min 45.4s
[ ] | 0% Completed | 1min 45.5s
[ ] | 0% Completed | 1min 45.6s
[ ] | 0% Completed | 1min 45.7s
[ ] | 0% Completed | 1min 45.8s
[ ] | 0% Completed | 1min 45.9s
[ ] | 0% Completed | 1min 46.0s
[ ] | 0% Completed | 1min 46.1s
[ ] | 0% Completed | 1min 46.2s
[ ] | 0% Completed | 1min 46.3s
[ ] | 0% Completed | 1min 46.4s
[ ] | 0% Completed | 1min 46.5s
[ ] | 0% Completed | 1min 46.6s
[ ] | 0% Completed | 1min 46.7s
[ ] | 0% Completed | 1min 46.8s
[ ] | 0% Completed | 1min 46.9s
[ ] | 0% Completed | 1min 47.0s
[ ] | 0% Completed | 1min 47.1s
[ ] | 0% Completed | 1min 47.2s
[ ] | 0% Completed | 1min 47.3s
[ ] | 0% Completed | 1min 47.4s
[ ] | 0% Completed | 1min 47.5s
[ ] | 0% Completed | 1min 47.6s
[ ] | 0% Completed | 1min 47.7s
[ ] | 0% Completed | 1min 47.8s
[ ] | 0% Completed | 1min 47.9s
[ ] | 0% Completed | 1min 48.0s
[ ] | 0% Completed | 1min 48.1s
[ ] | 0% Completed | 1min 48.2s
[ ] | 0% Completed | 1min 48.3s
[ ] | 0% Completed | 1min 48.4s
[ ] | 0% Completed | 1min 48.5s
[ ] | 0% Completed | 1min 48.6s
[ ] | 0% Completed | 1min 48.7s
[ ] | 0% Completed | 1min 48.8s
[ ] | 0% Completed | 1min 48.9s
[ ] | 0% Completed | 1min 49.0s
[ ] | 0% Completed | 1min 49.1s
[ ] | 0% Completed | 1min 49.2s
[ ] | 0% Completed | 1min 49.3s
[ ] | 0% Completed | 1min 49.4s
[ ] | 0% Completed | 1min 49.5s
[ ] | 0% Completed | 1min 49.6s
[ ] | 0% Completed | 1min 49.7s
[ ] | 0% Completed | 1min 49.8s
[ ] | 0% Completed | 1min 49.9s
[ ] | 0% Completed | 1min 50.0s
Command error:
2021-06-11 23:49:03,128 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ENSSSCG00000001202 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
2021-06-11 23:49:03,373 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for BBX could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
2021-06-11 23:49:03,719 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for GPBP1 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
2021-06-11 23:49:04,061 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ENSSSCG00000001945 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
2021-06-11 23:49:05,151 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for GRHL1 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
2021-06-11 23:49:05,181 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ENSSSCG00000002937 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
Traceback (most recent call last):
File "/opt/venv/bin/pyscenic", line 8, in <module>
sys.exit(main())
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
args.func(args)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 230, in prune_targets_command
num_workers=args.num_workers,
File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 410, in prune2df
module_chunksize,
File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 334, in _distributed_calc
scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 281, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 563, in compute
results = schedule(dsk, keys, **kwargs)
File "/opt/venv/lib/python3.7/site-packages/dask/multiprocessing.py", line 228, in get
**kwargs
File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 487, in get_async
raise_exception(exc, tb)
File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 317, in reraise
raise exc
File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
result = _execute_task(task, data)
File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in <genexpr>
return func(*(_execute_task(a, cache) for a in args))
File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in modules2df
[module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in <listcomp>
[module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in module2df
db, module, motif_annotations, weighted_recovery=weighted_recovery
File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 203, in module2features_auc1st_impl
rccs = rccs[enriched_features_idx, :][annotated_features_idx, :]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 23 but corresponding boolean dimension is 26
Work dir:
/work/adufour/scenic_pig/direct_annot/work/ea/10365d55145ca380aa7b73917cd7f9
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Do the gene names in your 10x_PBMC__adj.tsv matrix match the ones in your database?
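A quick way to check this is to compute, for a given set of genes, the fraction that is also present in the ranking database. The sketch below is only an illustration: `mapped_fraction` is a hypothetical helper, not a pySCENIC function, and it assumes you have already loaded both lists of names yourself (for example the target column of the adjacencies file and the gene or region names of the feather file).

```python
def mapped_fraction(module_genes, database_genes):
    """Return the fraction of module_genes that also occur in database_genes.

    Hypothetical helper for illustration; names are compared as exact,
    case-sensitive strings.
    """
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(database_genes)) / len(module_genes)


# Toy example: two of the four module genes are found in the database.
fraction = mapped_fraction(["BBX", "GPBP1", "GRHL1", "U6"], ["BBX", "GPBP1", "SOX2"])
print(f"{fraction:.0%} of the genes could be mapped")
```

A low fraction for most modules would be consistent with the "Less than 80% of the genes ... could be mapped" warnings in the log above.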
Hello,
Some genes (U6 genes, for example) are duplicated in my loom expression matrix, and I don't know how to merge them in the loom file format. If I use .var_names_make_unique() in anndata, the resulting name U6-1 may not match the U6 gene in my feather database. If I do not make them unique, I get a memory error.
The error message:
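One possible way to merge duplicated gene names before writing the loom file is to sum the counts of columns that share a name. This is a minimal sketch with pandas, assuming a cells x genes DataFrame and assuming that summing is acceptable for your data; `merge_duplicate_genes` is a hypothetical helper, not part of pySCENIC or anndata.

```python
import pandas as pd


def merge_duplicate_genes(ex_mtx: pd.DataFrame) -> pd.DataFrame:
    """Sum the columns of a cells x genes matrix that share the same gene name."""
    # Transpose so genes become the index, group identical names, sum, transpose back.
    return ex_mtx.T.groupby(level=0).sum().T


# Toy matrix with the gene name "U6" occurring twice.
ex_mtx = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["cell1", "cell2"],
    columns=["U6", "GAPDH", "U6"],
)
merged = merge_duplicate_genes(ex_mtx)
print(merged)
```

After merging, each gene name occurs once, so no suffix such as U6-1 is needed and the names stay comparable to those in the feather database.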
executor > slurm (1)
[a9/edc839] process > scenic:SCENIC:ARBORETO_WITH... [100%] 1 of 1, cached: 1 ✔
[3d/be9715] process > scenic:SCENIC:ADD_PEARSON_C... [100%] 1 of 1, failed: 1 ✘
[- ] process > scenic:SCENIC:CISTARGET__MOTIF -
[- ] process > scenic:SCENIC:AUCELL__MOTIF -
[- ] process > scenic:SCENIC:VISUALIZE -
[- ] process > scenic:SCENIC:PUBLISH_LOOM -
[- ] process > scenic:PUBLISH_SCENIC:COMPR... -
[- ] process > scenic:PUBLISH_SCENIC:SC__P... -
Error executing process > 'scenic:SCENIC:ADD_PEARSON_CORRELATION (1)'
Caused by:
Process `scenic:SCENIC:ADD_PEARSON_CORRELATION (1)` terminated with an error exit status (1)
Command executed:
pyscenic add_cor 10x_PBMC__adj.tsv.gz pigs_join.loom --output 10x_PBMC__adj.tsv --cell_id_attribute obs_names --gene_attribute var_names
Command exit status:
1
Command output:
(empty)
Command error:
2021-06-22 14:39:39,289 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
2021-06-22 14:39:49,951 - pyscenic.cli.pyscenic - INFO - Calculating correlations.
Traceback (most recent call last):
File "/opt/venv/bin/pyscenic", line 8, in <module>
sys.exit(main())
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
args.func(args)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 164, in addCorrelations
adjacencies_wCor = add_correlation(adjacencies, ex_mtx, rho_threshold=0.03, mask_dropouts=args.mask_dropouts)
File "/opt/venv/lib/python3.7/site-packages/pyscenic/utils.py", line 144, in add_correlation
corr_mtx = pd.DataFrame(index=ex_mtx.columns, columns=ex_mtx.columns, data=np.corrcoef(ex_mtx.values.T))
File "<__array_function__ internals>", line 6, in corrcoef
File "/opt/venv/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2634, in corrcoef
c = cov(x, y, rowvar, dtype=dtype)
File "<__array_function__ internals>", line 6, in cov
File "/opt/venv/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2492, in cov
c = dot(X, X_T.conj())
File "<__array_function__ internals>", line 6, in dot
numpy.core._exceptions.MemoryError: Unable to allocate 4.74 TiB for an array with shape (806922, 806922) and data type float64
Work dir:
/work/adufour/scenic_pig/direct_annot/work/3d/be9715468d4ff50ca95fcb5ee68bfc
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
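For reference, the size in the MemoryError follows directly from the shape in the message: a square float64 matrix with n rows needs n * n * 8 bytes. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def dense_square_matrix_tib(n: int, bytes_per_value: int = 8) -> float:
    """Memory needed for an n x n dense matrix, in TiB (float64 uses 8 bytes per value)."""
    return n * n * bytes_per_value / 2**40


tib = dense_square_matrix_tib(806922)
print(f"{tib:.2f} TiB")
```

Since the matrix is built from `ex_mtx.columns`, n here is the number of columns of the expression matrix as pySCENIC loaded it.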
Hi,
I have switched to Ensembl gene IDs to prevent the memory error. I have checked the correspondence between:
motif names in the feather file and the annotation file (and conversely)
gene names in the feather file and the expression matrix
gene names in the annotation file and the feather file
But I still get the error message IndexError: boolean index did not match indexed array along dimension 0; dimension is 235 but corresponding boolean dimension is 240
as described above. I also printed the variables rccs, enriched_features_idx and annotated_features_idx, which all seem to be in the right format.
Thanks for your help.
Hi @Goultard59,
I am a novice at regulon analysis.
I am wondering how to create the motif.tbl file for a species of interest. In my case, I want to build the rat cisTarget database with this repo's scripts, but they do not generate the motif.tbl file.
Could you guide me in generating a motif.tbl file in the format below for Rattus norvegicus?
Any advice would be appreciated.
Best, Hanhuihong
@honghh2018 You can use the mouse or human motif table and replace the gene_name column with the homologous gene for rat.
You can get them for example with Ensembl biomart: http://www.ensembl.org/biomart/martview/52da5ebcda00ffbf86e66e48d107b54a?VIRTUALSCHEMANAME=default&ATTRIBUTES=mmusculus_gene_ensembl.default.homologs.external_gene_name|mmusculus_gene_ensembl.default.homologs.rnorvegicus_homolog_associated_gene_name&FILTERS=mmusculus_gene_ensembl.default.filters.with_rnorvegicus_homolog.only&VISIBLEPANEL=resultspanel
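As a rough sketch of that substitution with pandas: the column names `source_gene` and `target_gene` for the ortholog table are made up for this example (a BioMart export will have its own headers), and `gene_name` is assumed to be the gene column of the motif table. Rows without an ortholog are dropped, and a one-to-many ortholog produces one row per target gene.

```python
import pandas as pd


def substitute_orthologs(motif2tf, orthologs, gene_col="gene_name"):
    """Replace gene names in a motif table by their orthologs.

    orthologs is expected to have the (hypothetical) columns
    'source_gene' and 'target_gene'.
    """
    merged = motif2tf.merge(
        orthologs, left_on=gene_col, right_on="source_gene", how="inner"
    )
    merged[gene_col] = merged["target_gene"]
    return merged.drop(columns=["source_gene", "target_gene"])


# Toy data: only Sox2 has a (made-up) ortholog entry.
motif2tf = pd.DataFrame(
    {"#motif_id": ["motif_a", "motif_b"], "gene_name": ["Sox2", "Gm1234"]}
)
orthologs = pd.DataFrame({"source_gene": ["Sox2"], "target_gene": ["Sox2_rat"]})
rat_table = substitute_orthologs(motif2tf, orthologs)
print(rat_table)
```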
As complementary advice, I highly recommend cross-checking your expression matrix against your motif annotation table.
Gene names can be duplicated in your genome (e.g. U6 genes), in which case the 10x pipeline will append a suffix (-2) to each duplicate, so it may be better to use Ensembl gene IDs.
You also need to be careful with duplicate motif_id + gene_name pairs. They lead to the IndexError reported above.
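A minimal pandas sketch for spotting and removing such duplicate pairs is shown below. The column names `#motif_id` and `gene_name` are assumptions about the annotation table header, so adjust them to your file; `drop_duplicate_annotations` is a hypothetical helper, and keeping the first occurrence is just one possible choice.

```python
import pandas as pd


def drop_duplicate_annotations(motif2tf, motif_col="#motif_id", gene_col="gene_name"):
    """Return (cleaned table, all rows involved in a duplicated motif/gene pair)."""
    dup_mask = motif2tf.duplicated(subset=[motif_col, gene_col], keep=False)
    duplicates = motif2tf[dup_mask]
    cleaned = motif2tf.drop_duplicates(subset=[motif_col, gene_col], keep="first")
    return cleaned, duplicates


# Toy table in which the pair (motif_a, SOX2) occurs twice.
table = pd.DataFrame(
    {
        "#motif_id": ["motif_a", "motif_a", "motif_b"],
        "gene_name": ["SOX2", "SOX2", "SOX2"],
    }
)
cleaned, duplicates = drop_duplicate_annotations(table)
print(f"{len(duplicates)} duplicated rows, {len(cleaned)} rows kept")
```

Inspecting `duplicates` before dropping anything helps to see whether the repeated pairs came from the ortholog substitution or were already in the source table.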
Hello,
I successfully ran the scripts on my genome. It might be helpful to add a link to your collection of position weight matrices in the README file, along with a description of the Cluster-Buster motif format.
I would also like to know how the direct motif-TF annotations were constructed for the TF annotation file.
Thanks for your help.