AlexandrovLab / SigProfilerAssignment

Assignment of known mutational signatures to individual samples and individual somatic mutations
BSD 2-Clause "Simplified" License

Problems of uploading segmentation file #146

Open xiw588 opened 1 week ago

xiw588 commented 1 week ago

Hi, I am trying to upload a segmentation file with the following columns, with TCGA selected as the file type, on the website https://cancer.sanger.ac.uk/signatures/assignment/app/. The upload fails every time. Any idea what is happening here? Thanks.

Sample Chromosome Start End Num_Probes Segment_Mean

xiw588 commented 5 days ago

I changed the column name to 'sample', but it still failed with the following error message: ValueError: Length of values (0) does not match length of index (491)

Detailed traceback:
  File "", line 2, in
  File "/opt/conda/envs/spa_test_MAY2022/lib/python3.8/site-packages/SigProfilerMatrixGenerator/scripts/CNVMatrixGenerator.py", line 466, in generateCNVMatrix
    nmf_matrix, annotated_df = annotateSegFile(df, file_type, project, output_path)
  File "/opt/conda/envs/spa_test_MAY2022/lib/python3.8/site-packages/SigProfilerMatrixGenerator/scripts/CNVMatrixGenerator.py", line 191, in annotateSegFile
    df["CN_class"] = CN_class
  File "/opt/conda/envs/spa_test_MAY2022/lib/python3.8/site-packages/pandas/core/frame.py", line 3980, in __setitem__
    self._set_item(key, value)
  File "/opt/conda/envs/spa_test_MAY2022/lib/python3.8/site-packages/pandas/core/frame.py", line 4174, in _set_item
    value = self._sanitize_column(value)
  File "/opt/conda/envs/spa_test_MAY2022/lib/python3.8/site-packages/pandas/core/frame.py", line 4915, in _sanitize_column
    com.require_length_match(value, self.index)
  File "/opt/conda/envs/spa_test_MAY2022/lib/python3.8/site-packages/pandas/core/common.py", line 571, in require_length_match
    raise ValueError(
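For anyone reading the traceback: pandas raises this error when a column is assigned a sequence whose length differs from the number of rows. Here the assigned `CN_class` had 0 entries and the frame had 491 rows. The snippet below is a minimal sketch that reproduces only the pandas behaviour; the frame contents and the empty list are placeholders, and it does not show why `annotateSegFile` produced an empty list for this input.

```python
import pandas as pd

# 491 placeholder rows, matching the index length reported in the error.
df = pd.DataFrame({"Segment_Mean": [0.0] * 491})

# Hypothetical stand-in for a label list that never received any entries.
cn_class = []

try:
    df["CN_class"] = cn_class  # length 0 vs. 491 rows
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

The message suggests that none of the 491 segments was assigned a class, which would be consistent with the input not matching the layout the tool expects.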

mdbarnesUCSD commented 3 days ago

Hi @xiw588,

Could you please let us know what type of segmentation file you are providing as input? We currently support the following: ASCAT, ASCAT_NGS, SEQUENZA, ABSOLUTE, BATTENBERG, FACETS, PURPLE, TCGA.

xiw588 commented 3 days ago

Hi, I think I am using the TCGA format. I am attaching an example: test.txt

azhark2 commented 3 days ago

Hi, we currently do not explicitly support a 'TCGA' file type, because the TCGA CNV files were generated using ASCAT (SNP6 array). Apologies for the confusion; we will update our documentation accordingly.

Note that the example file you shared is the raw data from the SNP6 array, and you should be using the processed data. This data can be obtained from the GDC data portal. You can find more information about the file types and pipeline here:

https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/

Note that the tool requires allele-specific copy number calls (the file you provided is not allele-specific).
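A quick way to check this before uploading is to look at the header row. The sketch below assumes that allele-specific calls appear as ASCAT-style `nMajor` and `nMinor` columns; those names and the helper function are assumptions made for illustration, so please confirm the expected column names against the tool's documentation.

```python
import csv
import io

# Assumed ASCAT-style column names (not confirmed in this thread).
ALLELE_SPECIFIC_COLUMNS = {"nmajor", "nminor"}


def missing_allele_specific_columns(handle, delimiter="\t"):
    """Return the assumed allele-specific columns absent from the header row."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    present = {name.strip().lower() for name in header}
    return sorted(ALLELE_SPECIFIC_COLUMNS - present)


# Header of the file described at the top of this thread.
posted = io.StringIO("Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n")
print(missing_allele_specific_columns(posted))
```

For a file on disk, pass an open file handle in place of the `io.StringIO` object. A non-empty result means that the file carries a single value per segment (here `Segment_Mean`) and no per-allele copy numbers.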

xiw588 commented 3 days ago

Hi, I am not using data from TCGA; my file just seems to be in the TCGA format. Do you know if there is any method to convert it?