SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
I'm having trouble running SigProfilerExtractor on CNV data using custom CNV calls. We have an in-house pipeline for CNV calls that is similar to the ASCAT approach, tailored to our WGS data. I tried to match the format of my calls to what ASCAT_NGS describes in its documentation (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6097604/).
The columns I provided are (in this order): 'sample', 'segment_number', 'chromosome', 'start_position', 'end_position', 'major_normal', 'minor_normal', 'major_tumor', 'minor_tumor'. This would correspond to what ASCAT_NGS says is the output format for the 'copynumber.caveman.csv' file.
I'm running SigProfilerExtractor as follows:
    from SigProfilerExtractor import sigpro as sig

    def main_function():
        # Custom CNV segments formatted to resemble the ASCAT_NGS output
        segment_file = "/home/kbrar/MOCHA_Jun_2024/somatic/cnv/adjcopies_segments/ascat_format_output.csv"
        sig.sigProfilerExtractor("seg:ASCAT_NGS", "/home/kbrar/MOCHA_Jun_2024/somatic/cnv/sigprofiler_CNV_Jul2024", segment_file, reference_genome="GRCh38", opportunity_genome="GRCh38")

    if __name__ == "__main__":
        main_function()
However, when I run SigProfilerExtractor as above, I get a KeyError. If I add the "Tumour TCN", "Normal BCN", and "Tumour BCN" columns that the errors indicate the tool expects, I instead get a KeyError for 'sample', which is definitely a column in the input CSV file. I've pasted that traceback below.
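One way to derive the three extra columns from the major/minor calls can be sketched as follows. This assumes that TCN means total copy number (major + minor) and that BCN means the minor (B) allele copy number; that reading is an assumption based on the column names, not something confirmed by the SigProfiler documentation. The helper name `add_tcn_bcn_columns` and the example rows are made up for illustration.

```python
import pandas as pd


def add_tcn_bcn_columns(segments: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the three columns named in the error messages added."""
    out = segments.copy()
    # Assumption: total copy number = major + minor allele copy number.
    out["Tumour TCN"] = out["major_tumor"] + out["minor_tumor"]
    # Assumption: BCN is the minor (B) allele copy number.
    out["Normal BCN"] = out["minor_normal"]
    out["Tumour BCN"] = out["minor_tumor"]
    return out


# Made-up rows using the nine column names listed above.
example = pd.DataFrame({
    "sample": ["S1", "S1"],
    "segment_number": [1, 2],
    "chromosome": ["1", "1"],
    "start_position": [1, 1000001],
    "end_position": [1000000, 2000000],
    "major_normal": [1, 1],
    "minor_normal": [1, 1],
    "major_tumor": [2, 1],
    "minor_tumor": [1, 0],
})

print(add_tcn_bcn_columns(example)[["Tumour TCN", "Normal BCN", "Tumour BCN"]])
```

The original columns are left in place, so the same table can still be renamed later if the tool turns out to expect different headers.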
Would you be able to clarify the exact input format needed for the "ASCAT_NGS" input type, or the specific columns required for each of the input types? I can adjust our column names to whatever SigProfiler expects, but it is unclear which exact columns should be provided for each input. Thanks!!
** Reported Current Memory Use: 0.5 GB *****
Traceback (most recent call last):
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sample'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kbrar/python_shell_scripts/sigprofiler_cnv_run.py", line 10, in <module>
    main_function()
  File "/home/kbrar/python_shell_scripts/sigprofiler_cnv_run.py", line 7, in main_function
    sig.sigProfilerExtractor(input_type="seg:ASCAT_NGS", output="/home/kbrar/MOCHA_Jun_2024/somatic/cnv/sigprofiler_CNV_Jul2024", input_data=segment_file, reference_genome="GRCh38", opportunity_genome="GRCh38")
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/SigProfilerExtractor/sigpro.py", line 680, in sigProfilerExtractor
    genomes = scna.generateCNVMatrix(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/SigProfilerMatrixGenerator/scripts/CNVMatrixGenerator.py", line 466, in generateCNVMatrix
    nmf_matrix, annotated_df = annotateSegFile(df, file_type, project, output_path)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/SigProfilerMatrixGenerator/scripts/CNVMatrixGenerator.py", line 72, in annotateSegFile
    columns = list(df["sample"].unique())
                   ~~^^^^^^^^^^
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kbrar/miniforge3/envs/sigprofiler/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'sample'
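A KeyError on a column that is present in the file can occur when the header is parsed with a different delimiter than the one the file actually uses. Whether SigProfilerMatrixGenerator reads this input as tab-delimited is an assumption here, not something the traceback confirms. The sketch below only shows how the header row parses under each candidate separator; the helper name `header_under_separators` and the in-memory demo file are made up for illustration.

```python
import io

import pandas as pd


def header_under_separators(source, separators=(",", "\t")):
    """Map each candidate separator to the column names pandas parses with it."""
    headers = {}
    for sep in separators:
        # Rewind file-like objects so every separator sees the full header.
        if hasattr(source, "seek"):
            source.seek(0)
        # nrows=0 reads only the header row.
        headers[sep] = list(pd.read_csv(source, sep=sep, nrows=0).columns)
    return headers


# Made-up comma-separated content standing in for the real segment file.
demo = io.StringIO("sample,segment_number,chromosome\nS1,1,1\n")
for sep, cols in header_under_separators(demo).items():
    print(repr(sep), "sample" in cols, cols)
```

If 'sample' only appears as a column under the comma separator, writing the table with `DataFrame.to_csv(path, sep="\t", index=False)` would be one thing to try. Passing a file path instead of the `StringIO` object works the same way.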