pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description'] #533
I keep running into the following error when I try to run the ctx command after completing the GRNboost step.
Most recently I encountered this issue using the following two files as f_db_names:
'/projects/p31982/Reference_files/pySCENIC/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather /projects/p31982/Reference_files/pySCENIC/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
and 'mm_mgi_tfs.txt' as MM_TFS_FNAME created from https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.mgi-m0.001-o0.0.tbl. However, I have encountered the same error no matter which versions of the mouse databases and motif names I use. I have tried the .feather files located at /mm10/refseq_r80/mc9nr/ and /mm9/refseq_r45/mc9nr/ on your resources page.
Steps to reproduce the behavior
Command run when the error occurred:
Error encountered:
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
2024-03-18 18:06:57,153 - pyscenic.cli.pyscenic - INFO - Creating modules.
2024-03-18 18:06:58,572 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
2024-03-18 18:07:00,138 - pyscenic.utils - INFO - Calculating Pearson correlations.
2024-03-18 18:07:00,608 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
Dropout masking is currently set to [True].
2024-03-18 18:07:29,476 - pyscenic.utils - INFO - Creating modules.
2024-03-18 18:08:27,860 - pyscenic.cli.pyscenic - INFO - Loading databases.
2024-03-18 18:08:28,295 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
2024-03-18 18:08:28,295 - pyscenic.prune - INFO - Using 6 workers.
2024-03-18 18:08:28,295 - pyscenic.prune - INFO - Using 6 workers.
2024-03-18 18:08:29,684 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.
2024-03-18 18:08:29,684 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.
Process mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2):
2024-03-18 18:08:29,762 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.
2024-03-18 18:08:29,762 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.
Traceback (most recent call last):
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
self.run()
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
orthologous_identity_threshold=self.orthologuous_identity_threshold,
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
self._validate_usecols_names(usecols, self.orig_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
Process mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3):
Traceback (most recent call last):
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
self.run()
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
orthologous_identity_threshold=self.orthologuous_identity_threshold,
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
self._validate_usecols_names(usecols, self.orig_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
2024-03-18 18:08:29,803 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.
2024-03-18 18:08:29,803 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.
Process mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1):
Traceback (most recent call last):
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
self.run()
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
orthologous_identity_threshold=self.orthologuous_identity_threshold,
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
self._validate_usecols_names(usecols, self.orig_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
2024-03-18 18:08:30,093 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.
2024-03-18 18:08:30,093 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2): database loaded in memory.
Process mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(2):
Traceback (most recent call last):
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
self.run()
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
orthologous_identity_threshold=self.orthologuous_identity_threshold,
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
self._validate_usecols_names(usecols, self.orig_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
2024-03-18 18:08:30,276 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.
2024-03-18 18:08:30,276 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): database loaded in memory.
Process mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1):
Traceback (most recent call last):
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
self.run()
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
orthologous_identity_threshold=self.orthologuous_identity_threshold,
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
self._validate_usecols_names(usecols, self.orig_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
2024-03-18 18:08:30,385 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.
2024-03-18 18:08:30,385 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): database loaded in memory.
Process mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3):
Traceback (most recent call last):
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/multiprocessing_on_dill/process.py", line 254, in _bootstrap
self.run()
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 131, in run
orthologous_identity_threshold=self.orthologuous_identity_threshold,
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 57, in load_motif_annotations
df = pd.read_csv(fname, sep="\t", index_col=[1, 0], usecols=column_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 135, in __init__
self._validate_usecols_names(usecols, self.orig_names)
File "/projects/p31982/Mac_paper/CD11b+_scRNA/pyscenic/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 867, in _validate_usecols_names
f"Usecols do not match columns, columns expected but not found: "
ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', '#motif_id', 'orthologous_identity', 'gene_name', 'description']
The output above is always repeated for every filepath in f_db_names. Any help would be greatly appreciated!
EDIT: Changed the issue title to fix an incorrect error message: the previous message was generated using motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl instead of motifs-v9-nr.mgi-m0.001-o0.0.tbl. Variations on this error still occur regardless of which motifs file is used.
EDIT: The issue was due to a corrupted .tbl file; downloading the file again with wget fixed it.
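For anyone hitting the same ValueError: since the cause here was a corrupted .tbl download, checking the file's header before running ctx may save some time. Below is a minimal sketch, assuming the motif annotation file is an uncompressed, tab-separated text file whose first line is the header. The function name `missing_annotation_columns` is hypothetical and not part of pySCENIC; the column names are copied from the error message above.

```python
# Columns listed in the ValueError above; the traceback shows pySCENIC passing
# its column names to pandas.read_csv(..., usecols=column_names).
EXPECTED_COLUMNS = [
    "#motif_id",
    "gene_name",
    "motif_similarity_qvalue",
    "orthologous_identity",
    "description",
]


def missing_annotation_columns(tbl_path):
    """Return the expected columns that are absent from the file's header line.

    Hypothetical helper (not a pySCENIC function). Assumes a plain-text,
    tab-separated file with the header on the first line.
    """
    with open(tbl_path, "r", encoding="utf-8", errors="replace") as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    return [name for name in EXPECTED_COLUMNS if name not in header]


if __name__ == "__main__":
    import sys

    # Usage: python check_tbl.py motifs-v9-nr.mgi-m0.001-o0.0.tbl
    for path in sys.argv[1:]:
        missing = missing_annotation_columns(path)
        if missing:
            print(f"{path}: header is missing {missing}; the download may be corrupted")
        else:
            print(f"{path}: all expected columns found")
```

If every expected column is reported missing, the file is probably not the annotation table at all (for example a truncated or non-text download), and fetching it again is likely the quickest fix.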