Closed rebirthjin closed 2 years ago
The first error looks like all the scores from that round of optimization were invalid, causing MolPAL to calculate 0/0 and raise that error. That's an edge case we could cover in the code, but it's generally a cause for concern when every objective calculation fails. I'm undecided on how we should handle this in the code.
The second error looks to be a result of self.top_k_avg being None at the end of optimization. Again, this is likely due to there being too few valid scores from which to calculate a top-k average. This really should not be the case (within reason), so I'm curious why so many of your objective evaluations are failing.
@davidegraff Thanks for your advice.
For the lookup process, the scores in my CSV file were positive values re-calculated from the docking scores, so I removed "--minimize" from the objective options in order to maximize instead. Could that parameter setting make the objective calculations fail?
I am now trying a run with the scores changed back to negative values and the "--minimize" option added. If the process completes without error, I will let you know.
Have a good day!
Are you sure that your lookup objective is being constructed properly?
@davidegraff How do I check that the lookup objective is constructed properly? Here is a sample of my lookup CSV:
smiles,score
C[C@@]1(c2ccccc2)OCCO[C@H]1C(=O)O,-3.961000
Cc1ncn(C[C@H]2CC(C)(C)CO2)c1C,-4.435000
CC(=O)N1C[C@H]2CNC[C@@]2(C(=O)N(C)Cc2ccoc2)C1,-5.111000
O[C@H]1C[C@@H]2CCCN(C1)C2,-4.209000
CNC(=O)c1cccc(Nc2nc(O)nc(O)c2C#N)c1,-5.455000
Cc1nc(CNc2ccc(F)c(N3CCCS3(=O)=O)c2)cs1,-6.235000
Cc1c(/N=N/c2cccc(C)c2C)c(-c2ccccc2)nn1C(=S)S,-6.337000
OCc1cc(-c2ccc(Cl)c(Cl)c2)ccn1,-3.907000
Cc1noc(C)c1COC(=O)c1ccc(Cl)cc1N1CCCC1=O,-5.407000
O=C(O)c1ccccc1S(=O)(=O)n1ccc(=O)[nH]c1=O,-5.967000
CC(C)CC(=O)NC[C@@]12CNC[C@@H]1COC2,-4.212000
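As a quick sanity check on the lookup file, it can help to confirm that every row parses to a float score. A minimal sketch (the in-memory sample below stands in for the real CSV, whose first rows are shown above):

```python
import csv
import io

# Hypothetical stand-in for the lookup CSV; open the real file in practice.
sample = io.StringIO(
    "smiles,score\n"
    "C[C@@]1(c2ccccc2)OCCO[C@H]1C(=O)O,-3.961000\n"
    "Cc1ncn(C[C@H]2CC(C)(C)CO2)c1C,-4.435000\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row (title_line: True)

n_rows = n_valid = 0
for row in reader:
    n_rows += 1
    try:
        float(row[1])  # score is in the second column
        n_valid += 1
    except (IndexError, ValueError):
        print(f"bad row: {row}")

print(f"{n_valid}/{n_rows} rows have parseable scores")
```

If any rows are reported as bad, the lookup objective would fail for those molecules.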
After I changed the docking scores in the CSV to all-negative values and ran with "--minimize", the process finished completely, but all_explored_final.csv contained the maximum positive values.
Is my objective option set incorrectly?
MolPAL will be run with the following arguments:
batch_sizes: [0.01]
budget: 1.0
cache: False
checkpoint_file: None
chkpt_freq: 0
cluster: False
conf_method: mve
config: njkoo_config.ini
cxsmiles: False
ddp: False
delimiter: ,
delta: 0.1
epsilon: 0.0
final_lr: 0.0001
fingerprint: pair
fps: /home/njgoo/Data1/program/molpal/libraries/ZINC20_Stock.h5
init_lr: 0.0001
init_size: 0.01
invalid_idxs: []
k: 0.0005
length: 2048
libraries: ['/home/njgoo/Data1/program/molpal/libraries/ZINC20_Stock.csv.gz']
max_iters: 50
max_lr: 0.001
metric: random
minimize: True
model: mpn
model_seed: None
ncpu: 20
objective: lookup
objective_config: njkoo_lookup.ini
output_dir: molpal_stock
pool: eager
precision: 32
previous_scores: None
radius: 2
retrain_from_scratch: True
scores_csvs: None
seed: None
smiles_col: 0
test_batch_size: None
title_line: True
verbose: 0
window_size: 10
write_final: True
write_intermediate: True
I would just add in a print statement to see what sort of values you're getting out of objective.calc(...). If all of the values failed, then there's an issue with how you're constructing your MolPAL run.
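A minimal sketch of that debugging check (the dictionary below is a hypothetical stand-in for whatever objective.calc(...) returns; this assumes failed evaluations come back as None):

```python
# Hypothetical stand-in for the output of objective.calc(smis);
# assumption: failed evaluations are reported as None.
scores = {
    "C[C@@]1(c2ccccc2)OCCO[C@H]1C(=O)O": 3.961,
    "Cc1ncn(C[C@H]2CC(C)(C)CO2)c1C": None,  # a failed evaluation
    "O[C@H]1C[C@@H]2CCCN(C1)C2": 4.209,
}

valid_scores = [v for v in scores.values() if v is not None]
print(f"{len(valid_scores)}/{len(scores)} evaluations succeeded")
if not valid_scores:
    print("every evaluation failed -- check how the objective was constructed")
```

If the success count is zero (or near zero), the problem is in the objective construction rather than the optimization loop.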
I also noticed that my output files are filled with positive scores while my lookup file has negative scores (more negative = better compound). I think the sign just got swapped during processing. It explored compounds with progressively more negative scores, so I think it was doing what it was supposed to do.
The output files always use positive scores, regardless of the input lookup file
@davidegraff Should we convert the positive scores in the output back into negative scores? Because the docking score is a total energy, more negative values indicate better compounds.
Also, which file should I add the print statement to? I can't find the objective.calc() function. Thanks for the quick response!
Yes. MolPAL is framed as a maximization problem, so the output reflects that: the most positive output is the best.
I wonder what the meaning of the --minimize option is.
Before your comment, I understood that a run with the 'minimize' option would give more negative scores and a run without it would give more positive scores.
In docking, a more negative score is better, so you want to --minimize it. Unless, of course, you were trying to find the worst possible binder for your target of interest, in which case you would want to maximize it (the default assumption). To perform a minimization, we multiply objective values by -1 under the hood, so that the rest of the program sees a maximization. You see the result of this multiplication in the output.
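The negate-then-maximize trick described above can be sketched as follows (a simplified illustration of that description, not MolPAL's actual code):

```python
def apply_minimize(raw_scores, minimize):
    """Negate scores when minimizing so the optimizer always maximizes.

    Failed evaluations (None) are passed through unchanged.
    """
    c = -1 if minimize else 1
    return {smi: (c * v if v is not None else None)
            for smi, v in raw_scores.items()}

# Hypothetical docking scores: more negative = better binder.
docking = {"CCO": -4.2, "c1ccccc1O": -5.9, "CCN": None}
internal = apply_minimize(docking, minimize=True)
print(internal)  # {'CCO': 4.2, 'c1ccccc1O': 5.9, 'CCN': None}
# The optimizer now prefers 5.9, i.e. the -5.9 binder, as intended.
```

This is why a --minimize run reports positive values: they are the negated docking scores.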
Thanks for the kind explanation! That matches my understanding. However, I'm confused about how to interpret the output in all_explored_final.csv.
I got positive values from the run with the --minimize option, but negative values from the run without it.
That is not the same as your statement that "The output files always use positive scores, regardless of the input lookup file", and it is the opposite of the result I expected.
Could you check the code that multiplies the objective values by -1? For now, I just convert the output values back by multiplying by -1.
I also found different default values for minimize in the objectives:
minimize: bool = False in base.py
minimize: bool = True in lookup.py
Have a good day!
I misspoke earlier. The values in the output are not always positive, but they always reflect more positive values being "better" in MolPAL's view. I.e., if you --minimize your objective, then the true objective values in the output should be multiplied by -1. If you maximize, then you may take the output scores as-is. The different default values are overridden by the minimize value supplied from the arguments, which is False by default.
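Under that convention, recovering the true docking scores from an output CSV is a single sign flip when the run used --minimize. A sketch, assuming a two-column smiles,score layout like the lookup file above (the in-memory sample stands in for all_explored_final.csv):

```python
import csv
import io

minimize = True  # matches the run's --minimize setting

# Hypothetical stand-in for all_explored_final.csv; open the real file in practice.
output_csv = io.StringIO(
    "smiles,score\n"
    "CCO,4.200000\n"
    "c1ccccc1O,5.900000\n"
)

reader = csv.reader(output_csv)
next(reader)  # skip the title line
true_scores = {smi: (-1 if minimize else 1) * float(s) for smi, s in reader}
print(true_scores)  # {'CCO': -4.2, 'c1ccccc1O': -5.9}
```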
Thank you very much!
Hello, I am trying to run MolPAL on my data. While the program was running, I ran into two errors.
The first error is below:
I think that valid_scores could be empty, so len(valid_scores) is zero.
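For reference, a hedged sketch of how a top-k average could guard against an empty valid_scores list (an illustration of the 0/0 edge case, not MolPAL's actual code):

```python
def top_k_avg(scores, k):
    """Average of the k best scores; None when no score is valid."""
    valid_scores = sorted((s for s in scores if s is not None), reverse=True)
    if not valid_scores:  # avoids dividing by len(valid_scores) == 0
        return None
    top = valid_scores[:k]
    return sum(top) / len(top)

print(top_k_avg([5.0, None, 3.25, 4.5], k=2))  # (5.0 + 4.5) / 2 = 4.75
print(top_k_avg([None, None], k=2))            # None
```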
The second error is below:
I wonder whether my job went through the wrong process. What intermediate data should I check to debug this?
Thank you