aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description'] #533

Closed jnmaciuch closed 7 months ago

jnmaciuch commented 7 months ago

Hello,

I keep running into the following error when I run the ctx command after completing the GRNBoost step.

Most recently I encountered this issue using the following two files as f_db_names: '/projects/p31982/Reference_files/pySCENIC/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather /projects/p31982/Reference_files/pySCENIC/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'

and 'mm_mgi_tfs.txt' as MM_TFS_FNAME created from https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.mgi-m0.001-o0.0.tbl. However, I have encountered the same error no matter which versions of the mouse databases and motif names I use. I have also tried the .feather files located at /mm10/refseq_r80/mc9nr/ and /mm9/refseq_r45/mc9nr/ on your resources page.

Steps to reproduce the behavior

Command run when the error occurred:

!pyscenic ctx adj.csv \
    {f_db_names} \
    --annotations_fname {MM_TFS_FNAME} \
    --expression_mtx_fname {converted_seurat_object.h5ad} \
    --output reg.csv \
    --mask_dropouts \
    --num_workers 6

Error encountered:

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

2024-03-18 18:06:57,153 - pyscenic.cli.pyscenic - INFO - Creating modules.

2024-03-18 18:06:58,572 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2024-03-18 18:07:00,138 - pyscenic.utils - INFO - Calculating Pearson correlations.

2024-03-18 18:07:00,608 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
    Dropout masking is currently set to [True].

2024-03-18 18:07:29,476 - pyscenic.utils - INFO - Creating modules.

2024-03-18 18:08:27,860 - pyscenic.cli.pyscenic - INFO - Loading databases.

2024-03-18 18:08:28,295 - pyscenic.cli.pyscenic - INFO - Calculating regulons.

2024-03-18 18:08:28,295 - pyscenic.prune - INFO - Using 6 workers.

2024-03-18 18:08:28,295 - pyscenic.prune - INFO - Using 6 workers.

2024-03-18 18:08:29,684 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.

2024-03-18 18:08:29,684 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.
Process mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2):

2024-03-18 18:08:29,762 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.

2024-03-18 18:08:29,762 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.
Traceback (most recent call last):
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
    self.run()
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
    orthologous_identity_threshold=self.orthologuous_identity_threshold,
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
    df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
    self._validate_usecols_names(usecols, self.orig_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
Process mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3):
Traceback (most recent call last):
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
    self.run()
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
    orthologous_identity_threshold=self.orthologuous_identity_threshold,
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
    df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
    self._validate_usecols_names(usecols, self.orig_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']

2024-03-18 18:08:29,803 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.

2024-03-18 18:08:29,803 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.
Process mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1):
Traceback (most recent call last):
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
    self.run()
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
    orthologous_identity_threshold=self.orthologuous_identity_threshold,
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
    df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
    self._validate_usecols_names(usecols, self.orig_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']

2024-03-18 18:08:30,093 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.

2024-03-18 18:08:30,093 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.
Process mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2):
Traceback (most recent call last):
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
    self.run()
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
    orthologous_identity_threshold=self.orthologuous_identity_threshold,
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
    df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
    self._validate_usecols_names(usecols, self.orig_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']

2024-03-18 18:08:30,276 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.

2024-03-18 18:08:30,276 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.
Process mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1):
Traceback (most recent call last):
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
    self.run()
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
    orthologous_identity_threshold=self.orthologuous_identity_threshold,
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
    df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
    self._validate_usecols_names(usecols, self.orig_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']

2024-03-18 18:08:30,385 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.

2024-03-18 18:08:30,385 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.
Process mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3):
Traceback (most recent call last):
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
    self.run()
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
    orthologous_identity_threshold=self.orthologuous_identity_threshold,
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
    df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
    self._validate_usecols_names(usecols, self.orig_names)
  File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']

The output above is always repeated for every filepath in f_db_names. Any help would be greatly appreciated!
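For readers hitting the same traceback: the failing call is `pd.read_csv(..., sep="\t", usecols=column_names)`, and pandas raises this ValueError when the first line of the file does not contain the requested column names. The sketch below reproduces that behaviour on in-memory text only. The column list is copied from the error message; the "bad" payload (an HTML page saved in place of the table) is a hypothetical example of a broken download, not something confirmed in this issue.

```python
import io

import pandas as pd

# Column names taken from the error message in the traceback above.
EXPECTED = [
    "#motif_id",
    "gene_name",
    "motif_similarity_qvalue",
    "orthologous_identity",
    "description",
]


def try_load(text):
    """Return None if pandas can select EXPECTED, else the error message."""
    try:
        pd.read_csv(io.StringIO(text), sep="\t", usecols=EXPECTED)
        return None
    except ValueError as err:
        return str(err)


# A well-formed table: tab-separated header line followed by one made-up row.
good = "\t".join(EXPECTED) + "\nm1\tSpi1\t0.0\t1.0\texample annotation\n"

# Hypothetical broken download: the file holds HTML instead of a table.
bad = "<!DOCTYPE html>\n<html><body>Not Found</body></html>\n"

good_result = try_load(good)
bad_result = try_load(bad)
print(good_result)
print(bad_result)
```

Because the check happens while pandas reads the header, the error says nothing about the .feather ranking databases; it points at the annotations file passed via --annotations_fname.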


EDIT: Changed the issue title to show the correct error message; the previous message was generated using motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl instead of motifs-v9-nr.mgi-m0.001-o0.0.tbl. Variations on this error still occur regardless of which motifs file is used.

EDIT: Issue was due to corrupted .tbl file, fixed by downloading with wget instead.

jnmaciuch commented 7 months ago

The issue was caused by a corrupted .tbl file; downloading the file with wget instead fixed it.
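Since the root cause was a corrupted .tbl file, a quick header check before running ctx can catch the problem early. The following is a minimal sketch, not part of pySCENIC: the helper name `missing_annotation_columns` is hypothetical, the required column set is copied from the error message above, and the two temporary files are made-up stand-ins for a good and a broken download.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the ValueError in this issue.
REQUIRED = {
    "#motif_id",
    "gene_name",
    "motif_similarity_qvalue",
    "orthologous_identity",
    "description",
}


def missing_annotation_columns(path):
    """Return the required column names absent from the first line of `path`."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        # Only the header line is read; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return REQUIRED - set(header)


# Demonstration on two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    ok_file = Path(tmp) / "ok.tbl"
    ok_file.write_text("\t".join(sorted(REQUIRED)) + "\n", encoding="utf-8")

    broken_file = Path(tmp) / "broken.tbl"
    broken_file.write_text("<!DOCTYPE html>\n<html></html>\n", encoding="utf-8")

    ok_missing = missing_annotation_columns(ok_file)
    broken_missing = missing_annotation_columns(broken_file)

print(ok_missing)
print(broken_missing)
```

An empty result means the header carries every column the loader asks for; a non-empty result suggests downloading the file again before rerunning the ctx step.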