Closed Calvinblue closed 4 years ago
This issue has been repaired. The user's value for percent identity specified in the multiphate config file is carried through and used to filter low-identity blastn (gene) and blastp (protein) hits. The genome hits are set at a minimum of 20% and not configurable. The phate_sequenceAnnotation_main.log file now reports blast object settings at the correct time, thus reporting the values that have been set just prior to running blast.
Thank you for this update :)
Hi Carol,
So I had a look in phate_sequenceAnnotation_main.log, and I was wondering why there was such high e values and identity min/select for blastn, as the identity was not the one specified in the .config file and the e values are very high. On the other hand blastp values seemed normal, as you can see in the following extract of the log file
Creating a blast object 2020-01-25 21:38:52.799866 Preparing to run blastp at the following settings: Parameters: blast flavor blastp identity min 50 evalue min 0.01 identity select: 70 evalue select: 0.01 topHitCount: 3 scoreEdge: 0.1 overhang 0.1 outputFormat: 5 Creating an hmm object 2020-01-25 21:38:52.800219 Preparing to run blastn at the following settings: Parameters: blast flavor blastn identity min 20 evalue min 10.0 identity select: 20 evalue select: 10.0 topHitCount: 3 scoreEdge: 0.1 overhang 0.1 outputFormat: 5 2020-01-25 21:38:52.800251 Preparing to run blastn at the following settings: Parameters: blast flavor blastn identity min 20 evalue min 10.0 identity select: 20 evalue select: 10.0 topHitCount: 3 scoreEdge: 0.1 overhang 0.1 outputFormat: 5 Running Blast against phage genome database(s)...
I had a look into phate_sequenceAnnotation_main.py and noticed there where some discrepancies in the way the different flavors of blast are called :
genome blast parameters are :
'identityMin' : GENOME_IDENTITY_MIN, # Should not be hard coded, but should be a low-stringency setting 'identitySelect' : GENOME_IDENTITY_SELECT, # this one too 'evalueMin' : EVALUE_MIN, 'evalueSelect' : EVALUE_SELECT,
gene blast parameters are : myParamSet = { 'identityMin' : blastpIdentity, # Setting as per blastp, for now 'identitySelect' : blastpIdentity, # this one too 'evalueMin' : 10, 'evalueSelect' : 10,
whereas protein blast parameters are :
myParamSet = { 'identityMin' : int(blastpIdentity), 'identitySelect' : int(blastpIdentity), 'evalueMin' : EVALUE_MIN, 'evalueSelect' : EVALUE_SELECT,
So I edited the file by using the same parameters for blastn and blastp, and it worked for the % identity but not for the evalue, as you can see in the following log file
Creating a blast object 2020-01-25 22:46:30.244293 Preparing to run blastp at the following settings: Parameters: blast flavor blastp identity min 50 evalue min 0.01 identity select: 70 evalue select: 0.01 topHitCount: 3 scoreEdge: 0.1 overhang 0.1 outputFormat: 5 Creating an hmm object 2020-01-25 22:46:30.244639 Preparing to run blastn at the following settings: Parameters: blast flavor blastn identity min 60 evalue min 10.0 identity select: 60 evalue select: 10.0 topHitCount: 3 scoreEdge: 0.1 overhang 0.1 outputFormat: 5 2020-01-25 22:46:30.244672 Preparing to run blastn at the following settings: Parameters: blast flavor blastn identity min 60 evalue min 10.0 identity select: 60 evalue select: 10.0 topHitCount: 3 scoreEdge: 0.1 overhang 0.1 outputFormat: 5 Running Blast against phage genome database(s)...
In the end I manually changed the evalue min and select to 0.01 as a temporary solution (which seemed to work). I do not really understand python code but I believe this part of the code is not working as you intended it to work. Could you please give me advice on this ?
Thanks in advance!
Best,
Antoine