AlexandrovLab / SigProfilerClusters

Tool for analyzing the inter-mutational distances between SNV-SNV and INDEL-INDEL mutations. Tool separates mutations into clustered and non-clustered groups on a sample-dependent basis.
BSD 2-Clause "Simplified" License

Error: There are no simulated data present for this project #27

Closed mkazanov closed 3 weeks ago

mkazanov commented 5 months ago
$ python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> sigSim.SigProfilerSimulator("BLCA","/disk2t/DATA/CLUSTERS/VCF","GRCh37",contexts=["6"],chrom_based=True,simulations=100)

======================================
        SigProfilerSimulator        
======================================

Checking for all reference files and relevant matrices...
     Matrices per chromosomes do not exist. Creating the matrix files now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 2.08 seconds.
     /disk2t/DATA/CLUSTERS/VCF/output/SBS/BLCA.SBS6.all does not exist. Creating the matrix file now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 1.28 seconds.
Matrices generated for 1 samples with 0 errors. Total of 41148 SNVs, 77 DINUCs, and 0 INDELs were successfully analyzed.

Files successfully read and mutations collected. Mutation assignment starting now.
         Chromosome X done
         Chromosome 11 done
         Chromosome 15 done
         Chromosome 14 done
         Chromosome 13 done
         Chromosome 9 done
         Chromosome 12 done
         Chromosome 10 done
         Chromosome 8 done
         Chromosome 16 done
         Chromosome 7 done
         Chromosome 6 done
         Chromosome 5 done
         Chromosome 3 done
         Chromosome 2 done
         Chromosome 4 done
         Chromosome 22 done
         Chromosome 21 done
         Chromosome 19 done
         Chromosome 1 done
         Chromosome 20 done
         Chromosome 18 done
         Chromosome 17 done
Simulation completed
Job took  15.419569969177246  seconds
>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("BLCA","GRCh37","96",["96"],"/disk2t/DATA/CLUSTERS/VCF",analysis="all",sortSims=True,subClassify=True,correction=True,calculateIMD=True,max_cpu=12,TCGA=True,sanger=False)

======================================
Beginning SigProfilerClusters Analysis
======================================

There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
    The package can be installed via pip:
            $ pip install SigProfilerSimulator

    and used within a python3 sessions as follows:
            $ python3
            >> from SigProfilerSimulator import SigProfilerSimulator as sigSim
            >> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)

    For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)
MousumyCSE commented 4 months ago

Hi @mkazanov,

Thanks for reaching out!

You ran SigProfilerSimulator with contexts=["6"] but SigProfilerClusters with context ["96"], and this mismatch is causing the issue. You need to use the same context for both tools. Please also give the path with a trailing slash ("/disk2t/DATA/CLUSTERS/VCF/").

Please let me know if you have any further issues.

Best, Mousumy
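
The trailing-slash advice above can be applied defensively before either tool is called. This is a minimal sketch, not part of either package: `with_trailing_slash` is a hypothetical helper name, and it only normalizes the string; it does not check that the directory exists.

```python
def with_trailing_slash(path: str) -> str:
    """Return `path` ending in exactly one forward slash.

    Hypothetical helper for illustration; any trailing forward slashes
    or backslashes are stripped first so the result is never doubled.
    """
    return path.rstrip("/\\") + "/"


if __name__ == "__main__":
    # Path taken from the report above, with and without the slash.
    print(with_trailing_slash("/disk2t/DATA/CLUSTERS/VCF"))
    print(with_trailing_slash("/disk2t/DATA/CLUSTERS/VCF/"))
```

The normalized value can then be passed as the input path to both SigProfilerSimulator and SigProfilerClusters so that the two calls cannot drift apart.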

mkazanov commented 4 months ago

Thank you, this time it runs without errors:

python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> sigSim.SigProfilerSimulator("BLCA","/disk2t/DATA/CLUSTERS/VCF","GRCh37",contexts=["6"],chrom_based=True,simulations=100)

======================================
        SigProfilerSimulator        
======================================

Checking for all reference files and relevant matrices...
     Matrices per chromosomes do not exist. Creating the matrix files now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 2.48 seconds.
     /disk2t/DATA/CLUSTERS/VCF/output/SBS/BLCA.SBS6.all does not exist. Creating the matrix file now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 1.3 seconds.
Matrices generated for 1 samples with 0 errors. Total of 41148 SNVs, 77 DINUCs, and 0 INDELs were successfully analyzed.

Files successfully read and mutations collected. Mutation assignment starting now.
         Chromosome X done
         Chromosome 11 done
         Chromosome 15 done
         Chromosome 14 done
         Chromosome 13 done
         Chromosome 9 done
         Chromosome 10 done
         Chromosome 12 done
         Chromosome 8 done
         Chromosome 16 done
         Chromosome 7 done
         Chromosome 6 done
         Chromosome 5 done
         Chromosome 2 done
         Chromosome 3 done
         Chromosome 4 done
         Chromosome 22 done
         Chromosome 21 done
         Chromosome 1 done
         Chromosome 19 done
         Chromosome 20 done
         Chromosome 18 done
         Chromosome 17 done
Simulation completed
Job took  18.600311040878296  seconds
>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("BLCA","GRCh37","6",["6"],"/disk2t/DATA/CLUSTERS/VCF",analysis="all",sortSims=True,subClassify=True,correction=True,calculateIMD=True,max_cpu=12,TCGA=True,sanger=False)

======================================
Beginning SigProfilerClusters Analysis
======================================

Calculating mutational distances...Completed!

However, the output directory contains no clustered, nonClustered, or plots folders:

output$ ls -l
total 36
drwxrwxr-x 2 parallels parallels 12288 Jul 11 07:08 DBS
drwxrwxr-x 2 parallels parallels 12288 Jul 11 07:08 SBS
drwxrwxr-x 6 parallels parallels  4096 Jul 11 07:09 simulations
drwxrwxr-x 5 parallels parallels  4096 Jul 11 07:08 vcf_files
drwxrwxr-x 3 parallels parallels  4096 Jul 11 07:09 vcf_files_corrected
MousumyCSE commented 4 months ago

Hi @mkazanov,

Apologies for the late response! Could you please share one of your example input files so that I can run it on my end? Please also share the log files (.err and .out). Additionally, could you check whether there are any clustered mutations in this output file ("/output/vcf_files_corrected/test_clustered/SNV/test_clustered.txt")?

Best, Mousumy

blastchinchillas commented 3 months ago

I ran into the same issue. Here is what I found, along with a workaround:

  1. At line 896 of SigProfilerClusters/hotspot.py, if the variable "contexts" is not the string "96", "ID", or "INDEL", the script raises a "'matrix_file_suffix' referenced before assignment" error and exits during the subsequent "exome" check. I added "matrix_file_suffix = '.{}.'.format(contexts)" before line 896 to avoid that.
  2. When calling SigProfilerClusters.analysis, use 'contexts="96"', not 'contexts=["96"]'. SigProfilerClusters treats ["96"] as a list rather than a string, which causes a failure in the final plot-generation steps.

Number 2 may be the main cause. The commands on the GitHub page show 'contexts=["96"]' for SigProfilerSimulator, and nothing clearly warns against doing the same in SigProfilerClusters, so I suggest the authors add a note to the README.
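
The failure mode described in point 1 is the usual Python pattern of a local variable that is assigned only in some branches. The sketch below reproduces that pattern in isolation. It is not the code from hotspot.py: the function names are hypothetical and the suffix strings in the branches are placeholder values. The default assignment follows the workaround quoted in point 1, and `as_context_string` illustrates point 2 (a one-element list is not equal to the string it contains).

```python
def suffix_unguarded(contexts):
    # Illustrative only: the variable is assigned in just two branches.
    if contexts == "96":
        matrix_file_suffix = ".SBS96."   # placeholder value
    elif contexts in ("ID", "INDEL"):
        matrix_file_suffix = ".ID83."    # placeholder value
    # Any other input (e.g. "6" or the list ["96"]) reaches this line
    # with the name unassigned and raises UnboundLocalError.
    return matrix_file_suffix


def suffix_guarded(contexts):
    # Workaround from the comment above: assign a default first.
    matrix_file_suffix = ".{}.".format(contexts)
    if contexts == "96":
        matrix_file_suffix = ".SBS96."   # placeholder value
    elif contexts in ("ID", "INDEL"):
        matrix_file_suffix = ".ID83."    # placeholder value
    return matrix_file_suffix


def as_context_string(contexts):
    # Hypothetical normalizer: unwrap a one-element list such as ["96"].
    if isinstance(contexts, (list, tuple)) and len(contexts) == 1:
        return contexts[0]
    return contexts


if __name__ == "__main__":
    for value in ("96", "6", ["96"]):
        try:
            print(repr(value), "->", suffix_unguarded(value))
        except UnboundLocalError as exc:
            print(repr(value), "-> UnboundLocalError:", exc)
```

Note that the list `["96"]` fails the `== "96"` comparison, so it falls through exactly as an unsupported string would.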

mkazanov commented 3 months ago

> Hi @mkazanov,
>
> Apologies for the late response! Could you please share one of your example input files so that I can run it on my end? Please also share the log files (.err and .out). Additionally, could you check whether there are any clustered mutations in this output file ("/output/vcf_files_corrected/test_clustered/SNV/test_clustered.txt")?
>
> Best, Mousumy

Sorry for a late reply. Input file: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz

SigProfilerClusters_BLCA_GRCh38_2024-08-15.err.txt SigProfilerClusters_BLCA_GRCh38_2024-08-15.out.txt

mkazanov commented 3 months ago

> I ran into the same issue. Here is what I found, along with a workaround:
>
>   1. At line 896 of SigProfilerClusters/hotspot.py, if the variable "contexts" is not the string "96", "ID", or "INDEL", the script raises a "'matrix_file_suffix' referenced before assignment" error and exits during the subsequent "exome" check. I added "matrix_file_suffix = '.{}.'.format(contexts)" before line 896 to avoid that.
>   2. When calling SigProfilerClusters.analysis, use 'contexts="96"', not 'contexts=["96"]'. SigProfilerClusters treats ["96"] as a list rather than a string, which causes a failure in the final plot-generation steps.
>
> Number 2 may be the main cause. The commands on the GitHub page show 'contexts=["96"]' for SigProfilerSimulator, and nothing clearly warns against doing the same in SigProfilerClusters, so I suggest the authors add a note to the README.

Suggestion 2 does not work for me.

As for suggestion 1: thanks, but I would prefer not to edit the source code myself and would rather the authors fix it.

MousumyCSE commented 3 months ago

Hi @mkazanov,

Thanks for sharing!

If you are running the simulator with contexts=["6"], please use the same value as simContext for the SigProfilerClusters tool. That is, run SigProfilerClusters with contexts="96" and simContext="6".

Here is how you can run the tools:

from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("BRCA", "/BRCA_example/", "GRCh37", contexts=["6"], chrom_based=True, simulations=100)

from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("BRCA", "GRCh37", "96", ["6"], "/BRCA_example/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)

Hope that will resolve your problem.

Best, Mousumy
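
The pairing described above (the simulator's contexts matching the simContext argument, with the clustering context set separately) can be kept consistent by deriving both calls from a single value. This is a minimal sketch: `build_calls` is a hypothetical helper, and the positional order used for `hp.analysis` is copied from the example calls in this thread rather than from the package documentation.

```python
def build_calls(project, genome, input_path,
                sim_context="6", cluster_context="96"):
    """Return (simulator kwargs, clustering positional args).

    Hypothetical helper: the same `sim_context` feeds both tools, so the
    simulator's `contexts` and the clustering `simContext` cannot diverge.
    """
    sim_kwargs = {
        "contexts": [sim_context],
        "chrom_based": True,
        "simulations": 100,
    }
    # Order as in the thread: project, genome, contexts, simContext, path.
    cluster_args = (project, genome, cluster_context, [sim_context], input_path)
    return sim_kwargs, cluster_args


if __name__ == "__main__":
    sim_kwargs, cluster_args = build_calls("BRCA", "GRCh37", "/BRCA_example/")
    print(sim_kwargs)
    print(cluster_args)
    # With both packages installed, the calls would then look like:
    # sigSim.SigProfilerSimulator("BRCA", "/BRCA_example/", "GRCh37", **sim_kwargs)
    # hp.analysis(*cluster_args, analysis="all", subClassify=True)
```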

mkazanov commented 3 months ago

> Hi @mkazanov,
>
> Thanks for sharing!
>
> If you are running the simulator with contexts=["6"], please use the same value as simContext for the SigProfilerClusters tool. That is, run SigProfilerClusters with contexts="96" and simContext="6".
>
> Here is how you can run the tools:
>
> from SigProfilerSimulator import SigProfilerSimulator as sigSim
> sigSim.SigProfilerSimulator("BRCA", "/BRCA_example/", "GRCh37", contexts=["6"], chrom_based=True, simulations=100)
>
> from SigProfilerClusters import SigProfilerClusters as hp
> hp.analysis("BRCA", "GRCh37", "96", ["6"], "/BRCA_example/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)
>
> Hope that will resolve your problem.
>
> Best, Mousumy

Thank you, it works with contexts="96".

Will the bug with contexts="6" be fixed soon?

I also found that subClassify=False does not work: it does not generate the clustered and nonClustered folders. Is this a bug too?

I've also found that an input_path without a trailing slash causes an error. It would be nice to have that fixed too.

MousumyCSE commented 2 months ago

Hi @mkazanov,

Glad that it works at your end and thanks for your suggestions! We will work on it.

If you set subClassify=False, the tool will not perform the sub-classification. It is False by default; if you set it to True (subClassify=True), the tool generates the clustered and nonClustered folders. Please see the wiki page (https://osf.io/qpmzw/wiki/home/) for more details.

Best, Mousumy

MousumyCSE commented 2 months ago

Please re-open the issue if you encounter any further problems.

Thanks, Mousumy

mkazanov commented 2 months ago

> Hi @mkazanov,
>
> Glad that it works at your end and thanks for your suggestions! We will work on it.
>
> If you set subClassify=False, the tool will not perform the sub-classification. It is False by default; if you set it to True (subClassify=True), the tool generates the clustered and nonClustered folders. Please see the wiki page (https://osf.io/qpmzw/wiki/home/) for more details.
>
> Best, Mousumy

I meant that with subClassify=False, I could not find any clustering results in the output folder at all. Could you please fix it?

mkazanov commented 2 months ago

> Please re-open the issue if you encounter any further problems.
>
> Thanks, Mousumy

It seems I don't have permissions to re-open it. Could you please re-open it until the mentioned bugs are fixed?

MousumyCSE commented 1 month ago

Hi @mkazanov,

Thanks for reaching out!

If you set the parameter subClassify=False, you will get the clustered and non-clustered mutations in the output folder. Here is the path (for example):

  1. Clustered mutations: "/output/vcf_files_corrected/test_clustered/SNV/test_clustered.txt"
  2. Non-clustered mutations: "/output/vcf_files_corrected/test_nonClustered/SNV/test_nonClustered.txt"

You can use those .txt output files for further analysis. Please let me know if you have any other questions.

Best, Mousumy
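
A quick way to check for the two files mentioned above is to build their paths and count lines. This is a sketch under two assumptions that the thread does not confirm: that "test" in the example paths stands for the project name, and that the output folder sits directly under the input path (as in the earlier `ls` listing). The helper names are hypothetical.

```python
from pathlib import Path


def clustered_paths(input_path, project):
    """Build the expected result paths (assumes 'test' == project name)."""
    base = Path(input_path) / "output" / "vcf_files_corrected"
    return {
        "clustered": base / f"{project}_clustered" / "SNV" / f"{project}_clustered.txt",
        "nonClustered": base / f"{project}_nonClustered" / "SNV" / f"{project}_nonClustered.txt",
    }


def report(input_path, project):
    """Print whether each expected file exists and how many lines it has."""
    for label, path in clustered_paths(input_path, project).items():
        if path.is_file():
            with path.open() as handle:
                n_lines = sum(1 for _ in handle)
            print(f"{label}: {path} ({n_lines} lines)")
        else:
            print(f"{label}: {path} is missing")


if __name__ == "__main__":
    report("/disk2t/DATA/CLUSTERS/VCF/", "BLCA")
```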

beyza-kurtoglu commented 1 month ago

Hi @MousumyCSE,

I am facing the same issue. I installed GRCh37 successfully with genInstall and defined the parameters:

project = "melanoma"
genome = "GRCh37"
vcfFiles = "C:/Users/bkurt/Desktop/test/melanoma"
sigSim.SigProfilerSimulator(project, vcfFiles, genome, contexts=["96"], simulations=100, chrom_based=True)

The simulations then completed successfully. However, clustering does not work properly, even after I adjusted the code following your responses above. When I tried the first call below, I got this error:

from SigProfilerClusters import SigProfilerClusters as hp 
hp.analysis("melanoma", "GRCh37", "96", ["6"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)
======================================
Beginning SigProfilerClusters Analysis
======================================
There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
        The package can be installed via pip:
                        $ pip install SigProfilerSimulator

        and used within a python3 sessions as follows:
                        $ python3
                        >> from SigProfilerSimulator import SigProfilerSimulator as sigSim
                        >> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)
        For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)

It also did not work with contexts="96" and simContext=["96"]. Finally, I tried the call below, and nothing changed:

>>> hp.analysis("melanoma", "GRCh37", "96", ["6144"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)

======================================
Beginning SigProfilerClusters Analysis
======================================

There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
        The package can be installed via pip:
                        $ pip install SigProfilerSimulator

        and used within a python3 sessions as follows:
                        $ python3
                        >> from SigProfilerSimulator import SigProfilerSimulator as sigSim
                        >> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)

        For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)

How can I deal with this bug?

MousumyCSE commented 1 month ago

Hi @beyza-kurtoglu,

Thanks for reaching out!

My suggestion is to remove the previous results from the output directory and re-run your samples. Please use the commands below to run your example files (changing the input directory to your own):

from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("BRCA", "/BRCA_example/", "GRCh37", contexts=["96"], chrom_based=True, simulations=100)

from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("BRCA", "GRCh37", "96", ["96"], "/BRCA_example/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, TCGA=True, sanger=False)

Please make sure that the context you use to run SigProfilerSimulator is the same one you pass as simContext when running the SigProfilerClusters pipeline. If the problem continues, kindly send me the log files and your example input.

Best, Mousumy
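
The suggestion to clear previous results can be followed without deleting anything outright by moving the old output folder aside first. This is a cautious sketch, not a documented step of either tool: `archive_output` is a hypothetical helper, and it assumes the results live in an "output" folder directly under the input path, as in the listing earlier in this thread.

```python
import shutil
import time
from pathlib import Path


def archive_output(input_path):
    """Rename <input_path>/output to a timestamped backup folder.

    Returns the backup path, or None if there was no output folder.
    Hypothetical helper; nothing is deleted.
    """
    out_dir = Path(input_path) / "output"
    if not out_dir.is_dir():
        return None
    stamp = time.strftime("%Y%m%d_%H%M%S")
    backup = out_dir.with_name(f"output_backup_{stamp}")
    shutil.move(str(out_dir), str(backup))
    return backup


if __name__ == "__main__":
    # Replace with your own input directory before running.
    print(archive_output("/BRCA_example/"))
```

After this step, re-running the simulator and then the clustering analysis starts from a clean output folder, and the backup can be removed once the new run is confirmed.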

beyza-kurtoglu commented 1 month ago

Thank you for your response, @MousumyCSE. However, even after applying your suggestion, I still get the same error. The .err file is completely empty; I have attached the log files. After running the simulator, I ran:

>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("melanoma", "GRCh37", "96", ["96"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, TCGA=True, sanger=False)

======================================
Beginning SigProfilerClusters Analysis
======================================

There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
        The package can be installed via pip:
                        $ pip install SigProfilerSimulator

        and used within a python3 sessions as follows:
                        $ python3
                        >> from SigProfilerSimulator import SigProfilerSimulator as sigSim
                        >> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)

        For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)

SigProfilerClusters_melanoma_GRCh37_2024-10-23err.txt SigProfilerClusters_melanoma_GRCh37_2024-10-23out.txt

MousumyCSE commented 1 month ago

Hi @beyza-kurtoglu,

Thanks for sending!

Can you please share one of your input files so that I can reproduce the error at my end?

Best, Mousumy

MousumyCSE commented 1 month ago

Hi @beyza-kurtoglu,

Thanks for sharing!

I have run your input files and they work on my end. Could you please check whether you are using the latest versions of the tools? Please create a new conda environment and re-install the necessary SigProfiler tools. This is how I create a new conda environment:

conda create -n SPC_new python=3.10
conda activate SPC_new

# install the tools
pip install SigProfilerClusters

Please run the SigProfilerClusters pipeline again and let me know if that works at your end.

Best, Mousumy
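
To confirm which versions are actually installed in the active environment, the standard library can report them. This is a minimal sketch: the package names are assumed to match the pip distribution names used in this thread, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names as they appear in this thread (assumed to match pip).
PACKAGES = [
    "SigProfilerClusters",
    "SigProfilerSimulator",
    "SigProfilerMatrixGenerator",
]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier to tell whether a problem comes from an outdated installation.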

beyza-kurtoglu commented 1 month ago

Hi @MousumyCSE,

I sent you the wrong VCF files by mistake. Could you please delete them? I will send you the others as soon as possible.

Thanks,

Beyza

beyza-kurtoglu commented 1 month ago

Hi @MousumyCSE ,

I created a conda environment with python=3.9 and matplotlib=3.4.3, then installed SigProfilerClusters and its dependencies. The error continues.


(theenv) C:\Users\bkurt>python
Python 3.9.20 (main, Oct  3 2024, 07:38:01) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh37', rsync=False, bash=True)
Tool       | Installed
-----------------------
curl       | True
wget       | False
rsync      | False
INFO - Downloading GRCh37...
Downloading: 100.00% [780.58 MB of 780.58 MB] at 2.00 MB/s
Download complete.
INFO - Downloaded GRCh37 from alexandrovlab using FTP.
INFO - GRCh37 has been successfully installed.
All reference files have been created.
To proceed with matrix_generation, please provide the path to your vcf files and an appropriate output path.
Installation complete.
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("melanoma", "GRCh37", "96", ["96"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=8, TCGA=True, sanger=False)

======================================
Beginning SigProfilerClusters Analysis
======================================

There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
        The package can be installed via pip:
                        $ pip install SigProfilerSimulator

        and used within a python3 sessions as follows:
                        $ python3
                        >> from SigProfilerSimulator import SigProfilerSimulator as sigSim
                        >> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)

        For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)
MousumyCSE commented 1 month ago

Hi @beyza-kurtoglu,

Thanks for the details.

From the output above, it does not look like you ran the SigProfilerSimulator tool in this session (please see the screenshot). Or are you relying on previous results? If so, can you please remove your old results and re-run?

[Screenshot 2024-10-24 at 10 55 17 AM]

Could you please share one of your example files and also the log file for both SigProfilerSimulator and SigProfilerClusters?

For now, can you please run the example file that we have in our wiki page to check if it works at your end or not.

Best, Mousumy
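
Before calling the clustering step, it can help to confirm that the simulator actually wrote something for this project. The sketch below only checks for a non-empty "simulations" folder under the output directory, which is the folder name shown in the `ls` listing earlier in this thread. The tool's own check may look for a more specific subfolder, so a True result here is necessary but possibly not sufficient. `has_simulations` is a hypothetical helper.

```python
from pathlib import Path


def has_simulations(input_path):
    """Return True if <input_path>/output/simulations exists and is non-empty."""
    sim_dir = Path(input_path) / "output" / "simulations"
    return sim_dir.is_dir() and any(sim_dir.iterdir())


if __name__ == "__main__":
    path = "C:/Users/bkurt/Desktop/test/melanoma/"
    if has_simulations(path):
        print("Simulations folder found; the clustering step can be tried.")
    else:
        print("No simulations found; run SigProfilerSimulator first.")
```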

MousumyCSE commented 3 weeks ago

Hi both,

Hope the above solution helped you to solve your issues. Please re-open the issue if you encounter any problems.

Thanks! Mousumy