mkazanov opened this issue 4 months ago
Hi @mkazanov,
Thanks for reaching out!
You ran SigProfilerSimulator with contexts=["6"] and the SigProfilerClusters tool with context ["96"], which is causing the issue. You need to define the same context for both tools. Please also define the path with a trailing slash at the end ("/disk2t/DATA/CLUSTERS/VCF/").
Please let me know if you have any further issues.
Best, Mousumy
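The two requirements in this reply (the same context for both tools, and a trailing slash on the input path) can be checked before calling the tools. This is a minimal sketch under stated assumptions: `with_trailing_slash` and `check_matching_contexts` are hypothetical helpers, not part of SigProfilerSimulator or SigProfilerClusters, and the check assumes the `simContext` list given to SigProfilerClusters should equal the `contexts` list given to SigProfilerSimulator.

```python
def with_trailing_slash(path: str) -> str:
    """Return `path` ending in exactly one "/" (hypothetical helper).

    Any trailing "/" or "\\" is stripped first, then a single "/" is added;
    Python accepts forward slashes on Windows as well.
    """
    return path.rstrip("/\\") + "/"


def check_matching_contexts(simulator_contexts, clusters_sim_context):
    """Raise ValueError if the simContext for SigProfilerClusters differs from
    the contexts used for SigProfilerSimulator (hypothetical helper)."""
    if list(simulator_contexts) != list(clusters_sim_context):
        raise ValueError(
            "simContext {!r} does not match the simulator contexts {!r}".format(
                clusters_sim_context, simulator_contexts
            )
        )


# Values from this issue: the simulation was run with contexts=["6"].
input_path = with_trailing_slash("/disk2t/DATA/CLUSTERS/VCF")
print(input_path)  # /disk2t/DATA/CLUSTERS/VCF/
check_matching_contexts(["6"], ["6"])  # passes silently
```

The path helper is idempotent, so it can be applied to paths that already end in a slash.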
Thank you, this time it runs without errors:
python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> sigSim.SigProfilerSimulator("BLCA","/disk2t/DATA/CLUSTERS/VCF","GRCh37",contexts=["6"],chrom_based=True,simulations=100)
======================================
SigProfilerSimulator
======================================
Checking for all reference files and relevant matrices...
Matrices per chromosomes do not exist. Creating the matrix files now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 2.48 seconds.
/disk2t/DATA/CLUSTERS/VCF/output/SBS/BLCA.SBS6.all does not exist. Creating the matrix file now.
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 1.3 seconds.
Matrices generated for 1 samples with 0 errors. Total of 41148 SNVs, 77 DINUCs, and 0 INDELs were successfully analyzed.
Files successfully read and mutations collected. Mutation assignment starting now.
Chromosome X done
Chromosome 11 done
Chromosome 15 done
Chromosome 14 done
Chromosome 13 done
Chromosome 9 done
Chromosome 10 done
Chromosome 12 done
Chromosome 8 done
Chromosome 16 done
Chromosome 7 done
Chromosome 6 done
Chromosome 5 done
Chromosome 2 done
Chromosome 3 done
Chromosome 4 done
Chromosome 22 done
Chromosome 21 done
Chromosome 1 done
Chromosome 19 done
Chromosome 20 done
Chromosome 18 done
Chromosome 17 done
Simulation completed
Job took 18.600311040878296 seconds
>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("BLCA","GRCh37","6",["6"],"/disk2t/DATA/CLUSTERS/VCF",analysis="all",sortSims=True,subClassify=True,correction=True,calculateIMD=True,max_cpu=12,TCGA=True,sanger=False)
======================================
Beginning SigProfilerClusters Analysis
======================================
Calculating mutational distances...Completed!
But the output directory contains no clustered, nonClustered, or plots folders:
output$ ls -l
total 36
drwxrwxr-x 2 parallels parallels 12288 Jul 11 07:08 DBS
drwxrwxr-x 2 parallels parallels 12288 Jul 11 07:08 SBS
drwxrwxr-x 6 parallels parallels 4096 Jul 11 07:09 simulations
drwxrwxr-x 5 parallels parallels 4096 Jul 11 07:08 vcf_files
drwxrwxr-x 3 parallels parallels 4096 Jul 11 07:09 vcf_files_corrected
Hi @mkazanov,
Apologies for the late response! Could you please share one of your example input files so that I can run it on my end? Please also share the log files (.err and .out files). Additionally, could you please check whether there are any clustered mutations in this output file ("/output/vcf_files_corrected/test_clustered/SNV/test_clustered.txt")?
Best, Mousumy
Sorry for the late reply. Input file: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
SigProfilerClusters_BLCA_GRCh38_2024-08-15.err.txt SigProfilerClusters_BLCA_GRCh38_2024-08-15.out.txt
I found the same issue. Here is what I figured out, and the solution:
- In line 896 of SigProfilerClusters/hotspot.py, if the variable "contexts" is not equal to the string "96", "ID", or "INDEL", the script fails with a "'matrix_file_suffix' referenced before assignment" error and exits when it reaches the subsequent "exome" check. I added "matrix_file_suffix = '.{}.'.format(contexts)" before line 896 to avoid that.
- For the input of SigProfilerClusters.analysis, use 'contexts="96"', not 'contexts=["96"]'. SigProfilerClusters treats ["96"] as a list rather than a string, which causes a problem in the final plot-generation steps.
Number 2 may be the main cause, because the commands on the GitHub page use contexts=["96"] for SigProfilerSimulator, and nothing clearly warns against doing the same in SigProfilerClusters. I suggest the authors add a note about this to the GitHub README.
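Both workarounds can be illustrated in plain Python. This is a hypothetical sketch, not the actual code of hotspot.py: `suffix_without_default` only mimics the pattern described above (a variable bound only for "96", "ID" and "INDEL"), `suffix_with_default` applies the commenter's fix of assigning `'.{}.'.format(contexts)` before the branch, and `as_context_string` shows one way to turn a one-element list such as ["96"] into the string that SigProfilerClusters expects.

```python
def suffix_without_default(contexts):
    """Hypothetical stand-in for the branch described near line 896.

    matrix_file_suffix is bound only for a few contexts, so for any other
    value (e.g. "6") the return statement raises UnboundLocalError.
    """
    if contexts in ("96", "ID", "INDEL"):
        matrix_file_suffix = ".{}.".format(contexts)  # placeholder for the real branches
    return matrix_file_suffix


def suffix_with_default(contexts):
    """Same pattern with the commenter's workaround: bind a default first."""
    matrix_file_suffix = ".{}.".format(contexts)  # default bound before any branch
    if contexts in ("96", "ID", "INDEL"):
        pass  # the real code would override the default here
    return matrix_file_suffix


def as_context_string(contexts):
    """Unwrap a one-element list like ["96"] into the string "96"."""
    if isinstance(contexts, (list, tuple)):
        if len(contexts) != 1:
            raise ValueError("expected exactly one context, got {!r}".format(contexts))
        return str(contexts[0])
    return str(contexts)


try:
    suffix_without_default("6")
except UnboundLocalError as err:
    print("reproduced:", err)

print(suffix_with_default("6"))   # .6.
print(as_context_string(["96"]))  # 96
```

The Python error text for an unbound local differs slightly between versions, but the exception type is always UnboundLocalError.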
Hi @mkazanov,
Thanks for sharing!
If you are running the simulator with contexts=["6"], please pass the same context as simContext to the SigProfilerClusters tool. That is, run SigProfilerClusters with contexts="96" and simContext=["6"].
Here is how you can run the tools:
from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("BRCA", "/BRCA_example/", "GRCh37", contexts=["6"], chrom_based=True, simulations=100)
from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("BRCA", "GRCh37", "96", ["6"], "/BRCA_example/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)
I hope that resolves your problem.
Best, Mousumy
Thank you, it works with contexts="96". Will the bug with contexts="6" be fixed soon?
I also found that subClassify=False does not work: it does not generate the clustered and nonClustered folders. Is this a bug too?
I've also found that an input_path without a trailing slash causes an error. It would be nice to have that fixed too.
Hi @mkazanov,
Glad that it works on your end, and thanks for your suggestions! We will work on them.
If you set subClassify=False, the tool will not perform the subclassification. The default is False; if you set subClassify=True, it will generate the clustered and nonClustered folders. Please see the wiki page (https://osf.io/qpmzw/wiki/home/) for more details.
Best, Mousumy
Please re-open the issue if you encounter any further problems.
Thanks, Mousumy
I meant that with subClassify=False, I could not find any clustering results in the output folder at all. Could you please fix that?
It seems I don't have permission to re-open it. Could you please re-open it until the bugs mentioned above are fixed?
Hi @mkazanov,
Thanks for reaching out!
If you set the parameter subClassify=False, you will get the clustered and non-clustered mutations in the output folder. Here is the path (for example):
You can use those .txt output files for further analysis. Please let me know if you have any other questions.
Best, Mousumy
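As a rough sketch, the clustered mutation .txt files can be located by searching the project's output folder. The `*_clustered.txt` naming pattern is an assumption based on the example path mentioned earlier in this thread (output/vcf_files_corrected/test_clustered/SNV/test_clustered.txt), and `find_clustered_outputs` is a hypothetical helper rather than part of SigProfilerClusters.

```python
from pathlib import Path


def find_clustered_outputs(input_path):
    """Return every *_clustered.txt file under <input_path>/output (hypothetical helper).

    Assumes a layout like
    output/vcf_files_corrected/<project>_clustered/SNV/<project>_clustered.txt.
    Files named *_nonClustered.txt do not match the pattern.
    """
    output_dir = Path(input_path) / "output"
    if not output_dir.is_dir():
        return []
    return sorted(output_dir.rglob("*_clustered.txt"))


for path in find_clustered_outputs("/disk2t/DATA/CLUSTERS/VCF/"):
    print(path)
```

An empty result suggests either that no clustered mutations were written or that the files follow a different naming scheme than the one assumed here.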
Hi @MousumyCSE,
I'm facing the same issue. I successfully installed GRCh37 with genInstall and defined the parameters:
from SigProfilerSimulator import SigProfilerSimulator as sigSim
project = "melanoma"
genome = "GRCh37"
vcfFiles = "C:/Users/bkurt/Desktop/test/melanoma"
sigSim.SigProfilerSimulator(project, vcfFiles, genome, contexts=["96"], simulations=100, chrom_based=True)
The simulations then ran and completed successfully. However, the clustering step does not work, even after I adjusted the code according to your responses above. When I ran the first command below, I got this error:
from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("melanoma", "GRCh37", "96", ["6"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)
======================================
Beginning SigProfilerClusters Analysis
======================================
There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
The package can be installed via pip:
$ pip install SigProfilerSimulator
and used within a python3 sessions as follows:
$ python3
>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)
For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)
It also did not work with contexts="96" and simContext=["96"]. Finally, I tried the command below as well, and nothing changed:
>>> hp.analysis("melanoma", "GRCh37", "96", ["6144"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=4, TCGA=True, sanger=False)
======================================
Beginning SigProfilerClusters Analysis
======================================
There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
The package can be installed via pip:
$ pip install SigProfilerSimulator
and used within a python3 sessions as follows:
$ python3
>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)
For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)
How can I deal with this bug?
Hi @beyza-kurtoglu,
Thanks for reaching out!
My suggestion is to remove the previous results from the output directory and re-run your samples. Please see the commands below for running your example files (change the input directory to your own):
from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("BRCA", "/BRCA_example/", "GRCh37", contexts=["96"], chrom_based=True, simulations=100)
from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("BRCA", "GRCh37", "96", ["96"], "/BRCA_example/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, TCGA=True, sanger=False)
Please make sure that the simContext you pass to the SigProfilerClusters pipeline is the same as the contexts you used to run SigProfilerSimulator. If the problem persists, kindly send me the log files and your example input.
Best, Mousumy
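The "There are no simulated data present for this project" message suggests that SigProfilerClusters did not find simulation results it recognises. A rough way to see what SigProfilerSimulator actually left on disk is to list the folders under <input_path>/output/simulations (the directory that appears in the ls listing earlier in this thread). The helper below is hypothetical, and the assumption that each run folder name starts with the project name (for example something like melanoma_simulations_GRCh37_96) should be checked against your own output.

```python
from pathlib import Path


def list_simulation_runs(input_path, project):
    """List simulation folders for `project` under <input_path>/output/simulations.

    Hypothetical helper. Assumes each run folder name starts with
    "<project>_"; returns an empty list if the simulations folder is missing.
    """
    sim_root = Path(input_path) / "output" / "simulations"
    if not sim_root.is_dir():
        return []
    return sorted(
        p.name for p in sim_root.iterdir()
        if p.is_dir() and p.name.startswith(project + "_")
    )


runs = list_simulation_runs("C:/Users/bkurt/Desktop/test/melanoma/", "melanoma")
if not runs:
    print("No simulation folders found; run SigProfilerSimulator first.")
else:
    # If the folders follow the assumed naming, the context in each name
    # should match the simContext passed to hp.analysis.
    print(runs)
```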
Thank you for your response, @MousumyCSE. However, even after applying your suggestion, the same error occurs. The .err file is completely empty; I have attached the log files. After running the simulator:
>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("melanoma", "GRCh37", "96", ["96"], "C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, TCGA=True, sanger=False)
======================================
Beginning SigProfilerClusters Analysis
======================================
There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
The package can be installed via pip:
$ pip install SigProfilerSimulator
and used within a python3 sessions as follows:
$ python3
>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)
For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)
SigProfilerClusters_melanoma_GRCh37_2024-10-23err.txt SigProfilerClusters_melanoma_GRCh37_2024-10-23out.txt
Hi @beyza-kurtoglu,
Thanks for sending!
Can you please share one of your input files so that I can reproduce the error at my end?
Best, Mousumy
Hi @beyza-kurtoglu,
Thanks for sharing!
I ran your input files and it works on my end. Can you please check whether you are using the latest versions of the tools? Could you please create a new conda environment and re-install the necessary SigProfiler tools? This is how I create a new conda environment:
conda create -n SPC_new python=3.10
conda activate SPC_new
pip install SigProfilerClusters
Please run the SigProfilerClusters pipeline again and let me know if that works at your end.
Best, Mousumy
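After re-creating the environment, one way to confirm which versions of the SigProfiler packages Python actually sees is the standard-library `importlib.metadata` module (Python 3.8+). This is a small sketch; the helper name `installed_versions` is hypothetical, and the distribution names are the packages installed or imported in this thread.

```python
from importlib import metadata

SIGPROFILER_PACKAGES = (
    "SigProfilerClusters",
    "SigProfilerSimulator",
    "SigProfilerMatrixGenerator",
)


def installed_versions(packages=SIGPROFILER_PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print("{:<28} {}".format(name, version or "not installed"))
```

Running this inside the activated environment (for example SPC_new) shows whether the fresh install was picked up or an older copy is still on the path.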
Hi @MousumyCSE,
I sent you the wrong VCF files by mistake. Could you please delete them? I will send you the others as soon as possible.
Thanks,
Beyza
Hi @MousumyCSE ,
I created a conda environment with python=3.9 and matplotlib=3.4.3 and installed SigProfilerClusters and its dependencies. The error persists.
(theenv) C:\Users\bkurt>python
Python 3.9.20 (main, Oct 3 2024, 07:38:01) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh37', rsync=False, bash=True)
Tool | Installed
-----------------------
curl | True
wget | False
rsync | False
INFO - Downloading GRCh37...
Downloading: 100.00% [780.58 MB of 780.58 MB] at 2.00 MB/s
Download complete.
INFO - Downloaded GRCh37 from alexandrovlab using FTP.
INFO - GRCh37 has been successfully installed.
All reference files have been created.
To proceed with matrix_generation, please provide the path to your vcf files and an appropriate output path.
Installation complete.
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>>> from SigProfilerClusters import SigProfilerClusters as hp
>>> hp.analysis("melanoma", "GRCh37", "96", ["96"],"C:/Users/bkurt/Desktop/test/melanoma/", analysis="all", sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu= 8, TCGA=True, sanger=False)
======================================
Beginning SigProfilerClusters Analysis
======================================
There are no simulated data present for this project. Please generate simulations before running SigProfilerClusters.
The package can be installed via pip:
$ pip install SigProfilerSimulator
and used within a python3 sessions as follows:
$ python3
>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
>> sigSim.SigProfilerSimulator(project, project_path, genome, contexts=['6144'], simulations=100)
For a complete list of parameters, visit the github repo (https://github.com/AlexandrovLab/SigProfilerSimulator) or the documentation page (https://osf.io/usxjz/wiki/home/)
Hi @beyza-kurtoglu,
Thanks for the details.
From your output above, it does not look like you have run the SigProfilerSimulator tool (sigSim is imported but never called; please see the screenshot). Or are these results from a previous run? Can you please remove your old results and re-run?
Could you please share one of your example files and the log files for both SigProfilerSimulator and SigProfilerClusters?
In the meantime, could you please run the example file from our wiki page to check whether it works on your end?
Best, Mousumy