aertslab / create_cisTarget_databases

Create cisTarget databases

motif2TF database #3

Closed Goultard59 closed 2 years ago

Goultard59 commented 3 years ago

Hello,

I successfully ran the scripts on my genomes. It might be helpful to add a link to your collection of position weight matrices to the README file, along with a description of the Cluster-Buster motif format.

I would like to know how the direct motif-TF annotations in the TF annotation file were constructed.

Thanks for the help.

tropfenameimer commented 3 years ago

Hi @Goultard59, the motif collection used for the pre-calculated cisTarget databases includes several publicly available data sets, e.g. from JASPAR (http://jaspar.genereg.net/downloads/), FlyFactorSurvey (https://pgfe.umassmed.edu/ffs/), HOMER (http://homer.ucsd.edu/homer/motif/HomerMotifDB/homerResults.html), ... (check the 'source_name' column in the motif annotation tables: https://resources.aertslab.org/cistarget/motif2tf)

For scoring with Cluster-Buster (cbust), the motifs need to be converted to a specific format, described here: https://orca.bu.edu/page/ClusterBuster_download

"The motif file should contain matrices in the following format:
>element1
0  4 2 14
12 0 0 8
8  0 1 11
20 0 0 0
>element2
13 1 1 5
The rows of each matrix correspond to successive positions of the motif, from 5' to 3', and the columns indicate the frequencies of A, C, G, and T, respectively, in each position. These frequencies are usually obtained from alignments of protein-binding sites."
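
As a minimal sketch of the format quoted above, a count matrix could be serialized as follows. The helper name `to_cluster_buster` is hypothetical and is not part of Cluster-Buster or of this repository; the example matrix is `element1` from the quoted description.

```python
def to_cluster_buster(name, counts):
    """Serialize one motif as Cluster-Buster text.

    counts: one (A, C, G, T) tuple per motif position, ordered 5' to 3'.
    """
    lines = [f">{name}"]
    for row in counts:
        if len(row) != 4:
            raise ValueError("each position needs four values: A, C, G, T")
        # Columns are whitespace-separated in the quoted example.
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# The 'element1' matrix from the format description above.
element1 = [(0, 4, 2, 14), (12, 0, 0, 8), (8, 0, 1, 11), (20, 0, 0, 0)]
print(to_cluster_buster("element1", element1), end="")
```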

I'll add this information to the documentation.

The assignment of a motif to a specific TF can be direct (e.g. when a motif has been characterized by ChIP-Seq) or indirect (inferred by motif or TF homology). The Motif2TF procedure is described by Janky et al. (https://doi.org/10.1371/journal.pcbi.1003731).

If you are working on an organism other than human, mouse or Drosophila, for which little is known about TF binding specificities, you can feed SCENIC a custom Motif2TF database. To build one, I'd identify orthologous TFs and replace the human/mouse/fly gene names in the Motif2TF table (https://resources.aertslab.org/cistarget/motif2tf/) with the gene names of your species of interest.
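
A rough sketch of that substitution, assuming the table is TAB-separated with a `gene_name` column and that you already have a mapping from source gene names to your species' gene names. The function name `substitute_gene_names` and the target name `pig_CTCF` are hypothetical placeholders. Rows without a known ortholog are dropped here, which is one possible choice rather than a requirement.

```python
def substitute_gene_names(tbl_text, orthologs, gene_column="gene_name"):
    """Replace gene names in a TAB-separated Motif2TF table.

    orthologs: dict mapping a source gene name to the target species name.
    Rows whose gene has no entry in `orthologs` are skipped.
    """
    lines = tbl_text.splitlines()
    header = lines[0].split("\t")
    gene_idx = header.index(gene_column)
    kept = [lines[0]]
    for line in lines[1:]:
        fields = line.split("\t")
        target = orthologs.get(fields[gene_idx])
        if target is None:
            continue  # no ortholog known for this TF
        fields[gene_idx] = target
        kept.append("\t".join(fields))
    return "\n".join(kept) + "\n"


table = (
    "#motif_id\tgene_name\tmotif_similarity_qvalue\torthologous_identity\tdescription\n"
    "ENCFF767XQV\tCTCF\t0.000\t1.000\tCTCF (vagina female adult (53 years))\n"
    "ENCFF662UNB\tJUN\t0.000\t1.000\tJUN (MCF-7)\n"
)
# 'pig_CTCF' is a made-up target name used only for illustration.
print(substitute_gene_names(table, {"CTCF": "pig_CTCF"}), end="")
```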

ghuls commented 3 years ago

BioPython's Bio.motifs module supports reading quite a lot of motif formats and writing them to Cluster-Buster format.

@tropfenameimer When creating databases, each motif should be in its own motif file, as Cluster-Buster will not score each motif independently when you put them in the same file.
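
If your motifs currently live in one combined Cluster-Buster file, something along these lines could split them into one file per motif. This is only a sketch: the function names are hypothetical, and the `.cb` file extension is an assumption you should check against what create_cisTarget_databases expects.

```python
from pathlib import Path


def split_cluster_buster(text):
    """Return {motif_name: single-motif Cluster-Buster text}."""
    motifs = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            if name in motifs:
                raise ValueError(f"duplicate motif name: {name}")
            motifs[name] = [line]
        elif line.strip():
            if name is None:
                raise ValueError("matrix row found before the first '>' header")
            motifs[name].append(line)
    return {n: "\n".join(rows) + "\n" for n, rows in motifs.items()}


def write_motif_files(text, out_dir, extension=".cb"):
    """Write each motif to its own file (extension is an assumption)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, motif_text in split_cluster_buster(text).items():
        (out_dir / f"{name}{extension}").write_text(motif_text)
```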

macsalvin commented 3 years ago

Hi @tropfenameimer, I need to create the mouse database with an additional motif. According to the procedure, running create_cisTarget_databases requires the motif collection and a FASTA file. I already have the motif collection including the new motif (in cb format), but I don't understand where to get the right FASTA file. Which FASTA file should I use to run create_cisTarget_databases? On the GENCODE website (https://www.gencodegenes.org/mouse/release_M25.html) there are several files, and the "Genome sequence (GRCm38.p6)" file doesn't have gene annotation (it contains just the chromosome annotation). Could you provide the FASTA file you used, so that I can run it against the same reference?

Another question is how to add this "direct motif" to the TF annotation file (in my case motifs-v9-nr.mgi-m0.001-o0.0.tbl).

Thank you in advance.

Goultard59 commented 3 years ago

Hi, and thanks for your answers.

I used OrthoFinder to identify gene orthologues between human and pig (unfortunately only 60% of my TFs overlap the human motif2tf annotation file). But I'm getting an error while running nextflow run aertslab/SCENICprotocol: KeyError: 'MotifSimilarityQvalue'

Thanks for the help.

cflerin commented 3 years ago

Hi @Goultard59 ,

Can you try renaming that column in your annotation file? pySCENIC in particular looks for specific column names (sorry, it's a current limitation). The column should be named motif_similarity_qvalue, something like:

#motif_id       gene_name       motif_similarity_qvalue orthologous_identity    description
ENCFF767XQV     CTCF    0.000   1.000   CTCF (vagina female adult (53 years))
ENCFF677KBX     ZNF584  0.000   1.000   eGFP-ZNF584 (K562 genetically modified using CRISPR)
ENCFF727RFJ     ZNF584  0.000   1.000   eGFP-ZNF584 (K562 genetically modified using CRISPR)
ENCFF986ATH     ZNF584  0.000   1.000   eGFP-ZNF584 (K562 genetically modified using CRISPR)
ENCFF662UNB     JUN     0.000   1.000   JUN (MCF-7)
ENCFF473URL     JUN     0.000   1.000   JUN (MCF-7)
ENCFF370DGU     JUN     0.000   1.000   JUN (MCF-7)
ENCFF528ENR     CTCF    0.000   1.000   CTCF (tibial artery male adult (37 years))
ENCFF380FCK     IRF3    0.000   1.000   IRF3 (GM12878)
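
A quick way to compare your own header against the one shown above might look like this. It is a sketch under stated assumptions: the column list is copied from the example table in this comment, and whether pySCENIC needs exactly these names, or a particular column order, is not confirmed here. The function name `missing_columns` is hypothetical.

```python
# Column names copied from the example table above (assumed, not verified
# against the pySCENIC source).
EXPECTED_COLUMNS = [
    "#motif_id",
    "gene_name",
    "motif_similarity_qvalue",
    "orthologous_identity",
    "description",
]


def missing_columns(header_line, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from a TAB-separated header."""
    present = header_line.rstrip("\n").split("\t")
    return [column for column in expected if column not in present]


header = "#motif_id\tgene_name\tMotifSimilarityQvalue\torthologous_identity\tdescription"
print(missing_columns(header))
```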

Goultard59 commented 3 years ago

Hello,

Here is the head of the file :

#motif_id   motif_name  motif_description   source_name source_version  motif_similarity_qvalue similar_motif_id    similar_motif_description   orthologous_identity    orthologous_gene_name   orthologous_species description gene_name
bergman__Abd-B  Abd-B   Abd-B   bergman 1.1 6e-04   cisbp__M1008    HOXA6[gene ID: "ENSG00000106006" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXB9[gene ID: "ENSG00000170689" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXC9[gene ID: "ENSG00000180806" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxa9[gene ID: "ENSMUSG00000038227" species: "Mus musculus" TF status: "direct" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxb9[gene ID: "ENSMUSG00000020875" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; NP_032296.2[gene ID: "NP_032296.2" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]  0.981618    ENSMUSG00000038227  M. musculus gene is orthologous to ENSMUSG00000038227 in M. musculus (identity = 98%) which is annotated for similar motif cisbp__M1008 ('HOXA6[gene ID: "ENSG00000106006" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXB9[gene ID: "ENSG00000170689" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; HOXC9[gene ID: "ENSG00000180806" species: "Homo sapiens" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxa9[gene ID: "ENSMUSG00000038227" species: "Mus musculus" TF status: "direct" TF family: "Homeodomain" DBDs: "Homeobox"]; Hoxb9[gene ID: "ENSMUSG00000020875" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]; NP_032296.2[gene ID: "NP_032296.2" species: "Mus musculus" TF status: "inferred" TF family: "Homeodomain" DBDs: "Homeobox"]'; q-value = 0.0006)    ENSSSCG00000028997
bergman__Aef1   Aef1    Aef1    bergman 1.1 0   None    None    0.213656    FBgn0005694 D. melanogaster motif is annotated for orthologous gene FBgn0005694 in D. melanogaster (identity = 21%) ZNF8
bergman__Cf2    Cf2 Cf2 bergman 1.1 0   None    None    0.15098 FBgn0000286 D. melanogaster motif is annotated for orthologous gene FBgn0000286 in D. melanogaster (identity = 15%) ZNF853
bergman__EcR_usp    EcR_usp EcR/usp bergman 1.1 0   None    None    0.378261    FBgn0000546 D. melanogaster gene is orthologous to FBgn0000546 in D. melanogaster (identity = 37%) which is directly annotated for motif    NR1H2

cflerin commented 3 years ago

Hi @Goultard59, I see you already have the proper column name, so in this case it's a different issue with pySCENIC than I first thought. I will have to look further into this. Can you also include the full command that you ran, and the error message that you got?

Goultard59 commented 3 years ago

Hello @cflerin,

I did a little bit of cleaning in the motif annotation file (removing the motifs that are not included in my feather files, removing motif similarity), but I still get the same issue.

I'm using the VSN Nextflow pipeline with the following command:

module load bioinfo/Nextflow-v20.11.0-edge
module load system/singularity-3.5.3
nextflow -C /home/adufour/work/scenic_pig/pigs.vsn-pipelines.complete.config run vib-singlecell-nf/vsn-pipelines -entry scenic

N E X T F L O W  ~  version 20.11.0-edge
Launching `vib-singlecell-nf/vsn-pipelines` [cranky_leavitt] - revision: 721c42f889 [master]
executor >  local (3)
[fc/1cf8f4] process > scenic:SCENIC:ARBORETO_WITH_MULTIPROCESSING (1) [100%] 1 of 1 ✔
[e9/f55d91] process > scenic:SCENIC:ADD_PEARSON_CORRELATION (1)       [100%] 1 of 1 ✔
[bf/fe6141] process > scenic:SCENIC:CISTARGET__MOTIF (1)              [  0%] 0 of 1
[-        ] process > scenic:SCENIC:AUCELL__MOTIF                     -
[-        ] process > scenic:SCENIC:VISUALIZE                         -
[-        ] process > scenic:SCENIC:PUBLISH_LOOM                      -
[-        ] process > scenic:PUBLISH_SCENIC:COMPRESS_HDF5             -
[-        ] process > scenic:PUBLISH_SCENIC:SC__PUBLISH               -
Error executing process > 'scenic:SCENIC:CISTARGET__MOTIF (1)'

Caused by:
  Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)

Command executed:

  export MKL_NUM_THREADS=1
  export NUMEXPR_NUM_THREADS=1
  export OMP_NUM_THREADS=1
        pyscenic ctx             test_pigs__adj.tsv             Sus_scrofa.feather.motifs_vs_regions.scores.feather Sus_scrofa.feather.regions_vs_motifs.scores.feather Sus_scrofa.feather.regions_vs_motifs.rankings.feather Sus_scrofa.feather.motifs_vs_regions.rankings.feather                          --annotations_fname motif2tf_orthologuous.tbl             --expression_mtx_fname human.loom             --cell_id_attribute CellID             --gene_attribute Gene             --mode "dask_multiprocessing"             --output test_pigs__reg_mtf.csv.gz             --num_workers 4

Command exit status:
  1

Command output:
  (empty)

Command error:
  /opt/venv/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
    data = yaml.load(f.read()) or {}

  2021-06-01 14:03:29,162 - pyscenic.cli.pyscenic - INFO - Creating modules.

  2021-06-01 14:03:31,561 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

  2021-06-01 14:03:31,798 - pyscenic.utils - INFO - Using existing Pearson correlations from the adjacencies file.

  2021-06-01 14:03:31,965 - pyscenic.utils - INFO - Creating modules.

  2021-06-01 14:06:54,593 - pyscenic.cli.pyscenic - INFO - Loading databases.

  2021-06-01 14:06:54,593 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
  Traceback (most recent call last):
    File "/opt/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
      return self._engine.get_loc(key)
    File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'MotifSimilarityQvalue'

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/opt/venv/bin/pyscenic", line 8, in <module>
      sys.exit(main())
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 470, in main
      args.func(args)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 194, in prune_targets_command
      num_workers=args.num_workers)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
      num_workers, module_chunksize)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 221, in _distributed_calc
      orthologous_identity_threshold=orthologuous_identity_threshold)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/utils.py", line 51, in load_motif_annotations
      df = df[(df[COLUMN_NAME_MOTIF_SIMILARITY_QVALUE] <= motif_similarity_fdr) &
    File "/opt/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
      indexer = self.columns.get_loc(key)
    File "/opt/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
      return self._engine.get_loc(self._maybe_cast_indexer(key))
    File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'MotifSimilarityQvalue'

Work dir:
  /work/adufour/scenic_pig/work/bf/fe6141c697d5e74b9cc264e821d159

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

This is the motif annotation file I used: motif2tf.xlsx

And here is the head of one of the feather output files:

c2h2zf-M0369 | c2h2zf-M0373 | c2h2zf-M0385 | c2h2zf-M0393 | c2h2zf-M0400 | c2h2zf-M0401 | c2h2zf-M0404 | c2h2zf-M0406 | c2h2zf-M0415 | c2h2zf-M0462 | ⋯ | yetfasco-798 | yetfasco-8 | yetfasco-815 | yetfasco-830 | yetfasco-831 | yetfasco-864 | yetfasco-870 | yetfasco-879 | yetfasco-962 | regions
18053 | 1300 | 12540 | 8606 | 17912 | 5161 | 6874 | 14082 | 656 | 18141 | ⋯ | 18894 | 7464 | 12420 | 1151 | 16874 | 7898 | 2785 | 3525 | 2062 | A1CF
18363 | 8154 | 2009 | 9263 | 17882 | 2304 | 6617 | 8956 | 18363 | 2949 | ⋯ | 11566 | 13700 | 13655 | 14519 | 12606 | 958 | 5562 | 12328 | 9300 | A2ML1
897 | 3947 | 20218 | 2830 | 14819 | 1962 | 6344 | 9476 | 1618 | 13230 | ⋯ | 7577 | 17988 | 7592 | 14544 | 12894 | 2366 | 4094 | 400 | 1174 | A3GALT2
3394 | 14057 | 14969 | 4354 | 10218 | 5015 | 20044 | 4652 | 6266 | 5063 | ⋯ | 5444 | 11594 | 7530 | 18590 | 14391 | 7067 | 6691 | 11897 | 13953 | A4GALT
8291 | 3475 | 11018 | 2995 | 14180 | 1563 | 1435 | 19162 | 15015 | 19187 | ⋯ | 13802 | 20057 | 5589 | 11866 | 5672 | 1039 | 11391 | 7269 | 12654 | A4GNT
19337 | 2392 | 19853 | 11336 | 9885 | 12297 | 1993 | 18413 | 7645 | 20437 | ⋯ | 11580 | 14298 | 20610 | 10711 | 4766 | 19464 | 15643 | 18952 | 4423 | AAAS

ghuls commented 3 years ago

@Goultard59 Can you run:

head  motif2tf_orthologuous.tbl

file motif2tf_orthologuous.tbl

on your motif annotation file? It has to be a TSV (TAB-separated) file. Also make sure that Excel is not corrupting your file (e.g. changing gene names to dates). If you want to use a graphical editor, it might be better to use LibreOffice, which provides the option "Save as... ==> CSV ==> Edit filter settings".
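
Both points can also be checked with a small script. The sketch below assumes a `gene_name` column and uses a simple pattern for date-like values such as `1-Mar` or `Sep-2`; the function name `check_annotation_table` is hypothetical, and the pattern will not catch every way a spreadsheet can mangle a gene symbol.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Matches values such as '1-Mar' or 'Sep-2', which spreadsheets tend to
# produce from gene symbols like MARCH1 or SEPT2.
DATE_LIKE = re.compile(
    rf"(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})", re.IGNORECASE
)


def check_annotation_table(text, gene_column="gene_name"):
    """Return a list of human-readable problems found in the table text."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    if len(header) < 2:
        return ["header is not TAB separated"]
    if gene_column not in header:
        return [f"column {gene_column!r} not found in header"]
    gene_idx = header.index(gene_column)
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, found {len(fields)}"
            )
        elif DATE_LIKE.fullmatch(fields[gene_idx]):
            problems.append(
                f"line {number}: gene name {fields[gene_idx]!r} looks like a date"
            )
    return problems
```

Calling `check_annotation_table(open("motif2tf_orthologuous.tbl").read())` would then return an empty list when neither problem is detected.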

Goultard59 commented 3 years ago

Thanks for your responses; the error was caused by the column order.

Unfortunately, I'm still blocked by the same error as #6. My config file is the following: https://pastebin.com/PSW2wwuU

Thanks for your help.


Launching `vib-singlecell-nf/vsn-pipelines` [sleepy_jepsen] - revision: 721c42f889 [master]
------------------------------------------------------------------
 No seed detected in the config 
 To ensure reproducibility the seed has been set to 250 
------------------------------------------------------------------

executor >  slurm (3)
[84/38573e] process > scenic:SCENIC:ARBORETO_WITH... [100%] 1 of 1 ✔
[02/fe8e56] process > scenic:SCENIC:ADD_PEARSON_C... [100%] 1 of 1 ✔
[d5/80085a] process > scenic:SCENIC:CISTARGET__MO... [  0%] 0 of 1
[-        ] process > scenic:SCENIC:AUCELL__MOTIF    -
[-        ] process > scenic:SCENIC:VISUALIZE        -
[-        ] process > scenic:SCENIC:PUBLISH_LOOM     -
[-        ] process > scenic:PUBLISH_SCENIC:COMPR... -
[-        ] process > scenic:PUBLISH_SCENIC:SC__P... -

Caused by:
  Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)

Command executed:

  export MKL_NUM_THREADS=1
  export NUMEXPR_NUM_THREADS=1
  export OMP_NUM_THREADS=1
        pyscenic ctx             10x_PBMC__adj.tsv             Sus_scrofa.motifs_vs_regions.rankings.feather Sus_scrofa.motifs_vs_regions.scores.feather                          --annotations_fname motif2tf_orthologuous_bis.tbl             --expression_mtx_fname pigs.loom             --cell_id_attribute CellID             --gene_attribute Gene             --mode "dask_multiprocessing"             --output 10x_PBMC__reg_mtf.csv.gz             --num_workers 4

Command exit status:
  1

Command output:
  [                                        ] | 0% Completed | 21.0s
  [                                        ] | 0% Completed | 26.4s

Command error:

  2021-06-10 20:09:39,736 - pyscenic.utils - INFO - Using existing Pearson correlations from the adjacencies file.

  2021-06-10 20:09:40,036 - pyscenic.utils - INFO - Creating modules.

  2021-06-10 20:15:21,369 - pyscenic.cli.pyscenic - INFO - Loading databases.

  2021-06-10 20:15:21,370 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
  Traceback (most recent call last):
    File "/opt/venv/bin/pyscenic", line 8, in <module>
      sys.exit(main())
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
      args.func(args)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 230, in prune_targets_command
      num_workers=args.num_workers,
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 410, in prune2df
      module_chunksize,
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 334, in _distributed_calc
      scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
    File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 281, in compute
      (result,) = compute(self, traverse=False, **kwargs)
    File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 563, in compute
      results = schedule(dsk, keys, **kwargs)
    File "/opt/venv/lib/python3.7/site-packages/dask/multiprocessing.py", line 228, in get
      **kwargs
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 487, in get_async
      raise_exception(exc, tb)
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 317, in reraise
      raise exc
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
      result = _execute_task(task, data)
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in <genexpr>
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in modules2df
      [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in <listcomp>
      [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in module2df
      db, module, motif_annotations, weighted_recovery=weighted_recovery
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 152, in module2features_auc1st_impl
      df = db.load(module)
    File "/opt/venv/lib/python3.7/site-packages/ctxcore/rnkdb.py", line 325, in load
      df.set_index(self._index_name, inplace=True)
    File "/opt/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 4724, in set_index
      raise KeyError(f"None of {missing} are in the columns")
  KeyError: "None of ['features'] are in the columns"

Work dir:
  /work/adufour/scenic_pig/direct_annot/work/d5/80085a433e79cc65af68a03811e6c2

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

executor >  slurm (3)
[84/38573e] process > scenic:SCENIC:ARBORETO_WITH... [100%] 1 of 1 ✔
[02/fe8e56] process > scenic:SCENIC:ADD_PEARSON_C... [100%] 1 of 1 ✔
[d5/80085a] process > scenic:SCENIC:CISTARGET__MO... [100%] 1 of 1, failed: 1 ✘
[-        ] process > scenic:SCENIC:AUCELL__MOTIF    -
[-        ] process > scenic:SCENIC:VISUALIZE        -
[-        ] process > scenic:SCENIC:PUBLISH_LOOM     -
[-        ] process > scenic:PUBLISH_SCENIC:COMPR... -
[-        ] process > scenic:PUBLISH_SCENIC:SC__P... -
Error executing process > 'scenic:SCENIC:CISTARGET__MOTIF (1)'

Caused by:
  Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)

Command executed:

  export MKL_NUM_THREADS=1
  export NUMEXPR_NUM_THREADS=1
  export OMP_NUM_THREADS=1
        pyscenic ctx             10x_PBMC__adj.tsv             Sus_scrofa.motifs_vs_regions.rankings.feather Sus_scrofa.motifs_vs_regions.scores.feather                          --annotations_fname motif2tf_orthologuous_bis.tbl             --expression_mtx_fname pigs.loom             --cell_id_attribute CellID             --gene_attribute Gene             --mode "dask_multiprocessing"             --output 10x_PBMC__reg_mtf.csv.gz             --num_workers 4

Command exit status:
  1

Command output:
  [                                        ] | 0% Completed | 21.0s
  [                                        ] | 0% Completed | 26.4s

Command error:

  2021-06-10 20:09:39,736 - pyscenic.utils - INFO - Using existing Pearson correlations from the adjacencies file.

  2021-06-10 20:09:40,036 - pyscenic.utils - INFO - Creating modules.

  2021-06-10 20:15:21,369 - pyscenic.cli.pyscenic - INFO - Loading databases.

  2021-06-10 20:15:21,370 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
  Traceback (most recent call last):
    File "/opt/venv/bin/pyscenic", line 8, in <module>
      sys.exit(main())
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
      args.func(args)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 230, in prune_targets_command
      num_workers=args.num_workers,
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 410, in prune2df
      module_chunksize,
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 334, in _distributed_calc
      scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
    File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 281, in compute
      (result,) = compute(self, traverse=False, **kwargs)
    File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 563, in compute
      results = schedule(dsk, keys, **kwargs)
    File "/opt/venv/lib/python3.7/site-packages/dask/multiprocessing.py", line 228, in get
      **kwargs
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 487, in get_async
      raise_exception(exc, tb)
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 317, in reraise
      raise exc
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
      result = _execute_task(task, data)
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in <genexpr>
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in modules2df
      [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in <listcomp>
      [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in module2df
      db, module, motif_annotations, weighted_recovery=weighted_recovery
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 152, in module2features_auc1st_impl
      df = db.load(module)
    File "/opt/venv/lib/python3.7/site-packages/ctxcore/rnkdb.py", line 325, in load
      df.set_index(self._index_name, inplace=True)
    File "/opt/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 4724, in set_index
      raise KeyError(f"None of {missing} are in the columns")
  KeyError: "None of ['features'] are in the columns"

Work dir:
  /work/adufour/scenic_pig/direct_annot/work/d5/80085a433e79cc65af68a03811e6c2

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
```

ghuls commented 3 years ago

@Goultard59 It looks like you are using the wrong database: '/home/adufour/work/cistargetdb/feather/Sus_scrofa.motifs_vs_regions.*.feather'

Can you list all files in /home/adufour/work/cistargetdb/feather/?

The database file you should use is: '/home/adufour/work/cistargetdb/feather/Sus_scrofa.regions_vs_motifs.rankings.feather'
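
As a rough sketch, the right file can be picked out of a directory listing by its name pattern. This assumes the naming scheme seen in this thread (`<prefix>.regions_vs_motifs.rankings.feather`); the helper `pick_rankings_db` is hypothetical and not part of pySCENIC.

```python
from fnmatch import fnmatch


def pick_rankings_db(filenames):
    """Return the names that look like a regions_vs_motifs rankings database."""
    # Assumed naming scheme, taken from the file names mentioned in this thread.
    pattern = "*.regions_vs_motifs.rankings.feather"
    return [name for name in filenames if fnmatch(name, pattern)]


files = [
    "Sus_scrofa.motifs_vs_regions.rankings.feather",
    "Sus_scrofa.motifs_vs_regions.scores.feather",
    "Sus_scrofa.regions_vs_motifs.rankings.feather",
    "Sus_scrofa.regions_vs_motifs.scores.feather",
]
selected = pick_rankings_db(files)
print(selected)
```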

Goultard59 commented 3 years ago

Hello,

The output files are: Sus_scrofa.motifs_vs_regions.rankings.feather, Sus_scrofa.motifs_vs_regions.scores.feather, Sus_scrofa.regions_vs_motifs.rankings.feather and Sus_scrofa.regions_vs_motifs.scores.feather.

I now get a new error. It is probably caused by a problem with the gene names, but I don't know which file I need to investigate.

Thanks for your help.

Caused by:
  Process `scenic:SCENIC:CISTARGET__MOTIF (1)` terminated with an error exit status (1)

Command executed:

  export MKL_NUM_THREADS=1
  export NUMEXPR_NUM_THREADS=1
  export OMP_NUM_THREADS=1
        pyscenic ctx             10x_PBMC__adj.tsv             Sus_scrofa.regions_vs_motifs.rankings.feather                          --annotations_fname motif2tf_orthologuous_bis.tbl             --expression_mtx_fname pigs.loom             --cell_id_attribute CellID             --gene_attribute Gene             --mode "dask_multiprocessing"             --output 10x_PBMC__reg_mtf.csv.gz             --num_workers 4

Command exit status:
  1

Command output:
  [                                        ] | 0% Completed |  1min 45.1s
  [                                        ] | 0% Completed |  1min 50.0s

Command error:

  2021-06-11 23:49:03,128 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ENSSSCG00000001202 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.

  2021-06-11 23:49:03,373 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for BBX could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.

  2021-06-11 23:49:03,719 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for GPBP1 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.

  2021-06-11 23:49:04,061 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ENSSSCG00000001945 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.

  2021-06-11 23:49:05,151 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for GRHL1 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.

  2021-06-11 23:49:05,181 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ENSSSCG00000002937 could be mapped to Sus_scrofa.regions_vs_motifs.rankings. Skipping this module.
  Traceback (most recent call last):
    File "/opt/venv/bin/pyscenic", line 8, in <module>
      sys.exit(main())
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
      args.func(args)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 230, in prune_targets_command
      num_workers=args.num_workers,
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 410, in prune2df
      module_chunksize,
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 334, in _distributed_calc
      scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
    File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 281, in compute
      (result,) = compute(self, traverse=False, **kwargs)
    File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 563, in compute
      results = schedule(dsk, keys, **kwargs)
    File "/opt/venv/lib/python3.7/site-packages/dask/multiprocessing.py", line 228, in get
      **kwargs
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 487, in get_async
      raise_exception(exc, tb)
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 317, in reraise
      raise exc
    File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 222, in execute_task
      result = _execute_task(task, data)
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in <genexpr>
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
      return func(*(_execute_task(a, cache) for a in args))
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in modules2df
      [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 301, in <listcomp>
      [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in module2df
      db, module, motif_annotations, weighted_recovery=weighted_recovery
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 203, in module2features_auc1st_impl
      rccs = rccs[enriched_features_idx, :][annotated_features_idx, :]
  IndexError: boolean index did not match indexed array along dimension 0; dimension is 23 but corresponding boolean dimension is 26

Work dir:
  /work/adufour/scenic_pig/direct_annot/work/ea/10365d55145ca380aa7b73917cd7f9

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
ghuls commented 3 years ago

Do the names in your 10x_PBMC__adj.tsv matrix match the ones in your database?
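
One way to check this is to compute, per module, the fraction of gene names that also occur in the database. The sketch below uses toy data; `mapped_fraction` is a hypothetical helper, and the 80% threshold is simply the figure quoted in the warning messages above, not a statement about how pySCENIC computes it internally.

```python
def mapped_fraction(module_genes, db_genes):
    """Fraction of a module's genes that also occur in the database."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(db_genes)) / len(module_genes)


# Toy data: hypothetical gene identifiers, not taken from a real database.
db_genes = ["ENSSSCG00000000001", "ENSSSCG00000000002", "ENSSSCG00000000003"]
module = ["ENSSSCG00000000001", "ENSSSCG00000000002", "U6", "U6-1"]

fraction = mapped_fraction(module, db_genes)
print(f"{fraction:.0%} of the module genes are in the database")
if fraction < 0.8:
    # Names missing from the database are the ones to investigate.
    print(sorted(set(module) - set(db_genes)))
```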

Goultard59 commented 3 years ago

Hello,

Some genes (U6 genes, for example) are duplicated in my loom expression matrix, and I don't know how to merge them in the loom file format. If I use .var_names_make_unique() in anndata, the name U6-1 may not match the U6 gene in my feather database. If I do not make them unique, I get a memory error.

The error message:

executor >  slurm (1)
[a9/edc839] process > scenic:SCENIC:ARBORETO_WITH... [100%] 1 of 1, cached: 1 ✔
[3d/be9715] process > scenic:SCENIC:ADD_PEARSON_C... [100%] 1 of 1, failed: 1 ✘
[-        ] process > scenic:SCENIC:CISTARGET__MOTIF -
[-        ] process > scenic:SCENIC:AUCELL__MOTIF    -
[-        ] process > scenic:SCENIC:VISUALIZE        -
[-        ] process > scenic:SCENIC:PUBLISH_LOOM     -
[-        ] process > scenic:PUBLISH_SCENIC:COMPR... -
[-        ] process > scenic:PUBLISH_SCENIC:SC__P... -
Error executing process > 'scenic:SCENIC:ADD_PEARSON_CORRELATION (1)'

Caused by:
  Process `scenic:SCENIC:ADD_PEARSON_CORRELATION (1)` terminated with an error exit status (1)

Command executed:

  pyscenic add_cor             10x_PBMC__adj.tsv.gz             pigs_join.loom             --output 10x_PBMC__adj.tsv             --cell_id_attribute obs_names             --gene_attribute var_names

Command exit status:
  1

Command output:
  (empty)

Command error:

  2021-06-22 14:39:39,289 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

  2021-06-22 14:39:49,951 - pyscenic.cli.pyscenic - INFO - Calculating correlations.
  Traceback (most recent call last):
    File "/opt/venv/bin/pyscenic", line 8, in <module>
      sys.exit(main())
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
      args.func(args)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 164, in addCorrelations
      adjacencies_wCor = add_correlation(adjacencies, ex_mtx, rho_threshold=0.03, mask_dropouts=args.mask_dropouts)
    File "/opt/venv/lib/python3.7/site-packages/pyscenic/utils.py", line 144, in add_correlation
      corr_mtx = pd.DataFrame(index=ex_mtx.columns, columns=ex_mtx.columns, data=np.corrcoef(ex_mtx.values.T))
    File "<__array_function__ internals>", line 6, in corrcoef
    File "/opt/venv/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2634, in corrcoef
      c = cov(x, y, rowvar, dtype=dtype)
    File "<__array_function__ internals>", line 6, in cov
    File "/opt/venv/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2492, in cov
      c = dot(X, X_T.conj())
    File "<__array_function__ internals>", line 6, in dot
  numpy.core._exceptions.MemoryError: Unable to allocate 4.74 TiB for an array with shape (806922, 806922) and data type float64

Work dir:
  /work/adufour/scenic_pig/direct_annot/work/3d/be9715468d4ff50ca95fcb5ee68bfc

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
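
The size reported in the MemoryError follows directly from the array shape. According to the traceback, the correlation matrix is indexed by `ex_mtx.columns` on both axes, so its side length is the number of columns of the expression matrix (806922 here). A quick check of the arithmetic:

```python
n = 806922            # side length of the square matrix in the error message
bytes_per_value = 8   # float64 uses 8 bytes per value

size_bytes = n * n * bytes_per_value
size_tib = size_bytes / 2**40  # 1 TiB = 2**40 bytes
print(f"{size_tib:.2f} TiB")
```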
Goultard59 commented 2 years ago

Hi,

I have switched to Ensembl gene IDs to prevent the memory error. I have checked the correspondence between:

- motif names in the feather file and in the annotation file (and conversely)
- gene names in the feather file and in the expression matrix
- gene names in the annotation file and in the feather file

But I still get the error message `IndexError: boolean index did not match indexed array along dimension 0; dimension is 235 but corresponding boolean dimension is 240`, as described above. I also printed the variables rcc, enriched_features_idx and annotated_features_idx, which all seem to be in the right format.

Thanks for your help.

honghh2018 commented 2 years ago

Hi @Goultard59, I am new to regulon analysis and I am wondering how to create the motif.tbl file for a species of interest. In my case, I want to build the rat cisTarget database with this repo's scripts, but the motif.tbl file is not generated along with it. Could you guide me on how to generate a motif.tbl file for Rattus norvegicus in the format shown below? [image]

Any advice would be appreciated.

Best, Hanhuihong

ghuls commented 2 years ago

@honghh2018 You can use the mouse or human motif table and replace the gene_name column with the homologous gene for rat.

You can get them for example with Ensembl biomart: http://www.ensembl.org/biomart/martview/52da5ebcda00ffbf86e66e48d107b54a?VIRTUALSCHEMANAME=default&ATTRIBUTES=mmusculus_gene_ensembl.default.homologs.external_gene_name|mmusculus_gene_ensembl.default.homologs.rnorvegicus_homolog_associated_gene_name&FILTERS=mmusculus_gene_ensembl.default.filters.with_rnorvegicus_homolog.only&VISIBLEPANEL=resultspanel
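
A minimal sketch of that substitution, assuming a tab-separated table with a `gene_name` column and an ortholog mapping exported separately (for example from BioMart). The function name, the two-column toy table and the mapping are illustrative only; a real motif table has more columns, and one-to-many orthologs would need extra handling.

```python
import csv
import io


def substitute_gene_names(table_text, orthologs, column="gene_name"):
    """Rewrite `column` of a tab-separated table using an ortholog mapping.

    Rows whose gene has no ortholog in the mapping are dropped.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        target = orthologs.get(row[column])
        if target is None:
            continue  # no known ortholog: skip this annotation
        row[column] = target
        writer.writerow(row)
    return out.getvalue()


# Toy input: hypothetical motif IDs, mouse gene names in the gene_name column.
mouse_table = "motif_id\tgene_name\nmotifA\tTrp53\nmotifB\tGm12345\n"
mouse_to_rat = {"Trp53": "Tp53"}

rat_table = substitute_gene_names(mouse_table, mouse_to_rat)
print(rat_table)
```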

Goultard59 commented 2 years ago

As additional advice, I highly recommend cross-checking your expression matrix against your motif annotation table.

Gene names can be duplicated in your genome (e.g. U6 genes), so it may be better to use Ensembl gene IDs. With duplicated names, the 10x pipeline appends a -2 suffix to each duplicate.

You also need to watch out for duplicate motif_id + gene_name pairs. They lead to the previous error message.
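
A small sketch for spotting such duplicates, assuming the annotation table has already been read into a list of dicts with `motif_id` and `gene_name` keys (for example with `csv.DictReader`). The helper `duplicated_pairs` and the toy rows are hypothetical.

```python
from collections import Counter


def duplicated_pairs(rows):
    """Return the (motif_id, gene_name) pairs that occur more than once."""
    counts = Counter((row["motif_id"], row["gene_name"]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Toy rows with hypothetical identifiers.
rows = [
    {"motif_id": "m1", "gene_name": "ENSSSCG00000000001"},
    {"motif_id": "m1", "gene_name": "ENSSSCG00000000001"},  # duplicate pair
    {"motif_id": "m1", "gene_name": "ENSSSCG00000000002"},
    {"motif_id": "m2", "gene_name": "ENSSSCG00000000001"},
]

dups = duplicated_pairs(rows)
print(dups)
```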