Q: Do I need to end the training process manually?

connorcoley / rexgen_direct

Template-free prediction of organic reaction outcomes

GNU General Public License v3.0


Q: Do I need to end the training process manually? #5

Open GhostSteven opened 5 years ago

GhostSteven commented 5 years ago

Hi, I'm trying to retrain the core_wln_global model, following your notes exactly:

python nntrain_direct.py --train ../data/train.txt.proc --hidden 300 --depth 3 --save_dir model-300-3-direct | tee model-300-3-direct/log.txt

It has already generated model.ckpt-220000 during the last 25 hours. But the paper says there are only 140,000 minibatches and that training should take 19 hours. Is there anything wrong with my process? Or do you mean that training a model for 140,000 minibatches takes 19 hours, and I should end the training once model.ckpt-140000 has been generated?

connorcoley commented 5 years ago

Sorry for the ambiguity -- training won't stop automatically. After manually stopping training, I looked at the validation performance of each of the checkpoints and it leveled off at model.ckpt-140000
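For readers who want to automate that comparison, here is a minimal sketch of one way to pick the checkpoint at which validation performance levels off. It is not the procedure used for the paper: the function name `pick_plateau_checkpoint`, the `tolerance` parameter, and the accuracy numbers are all hypothetical, and the validation accuracies are assumed to have been collected separately for each checkpoint.

```python
def pick_plateau_checkpoint(val_acc, tolerance=0.001):
    """Return the earliest checkpoint step whose validation accuracy is
    within `tolerance` of the best accuracy over all checkpoints.

    val_acc: dict mapping checkpoint step (int) -> validation accuracy (float).
    """
    best = max(val_acc.values())
    for step in sorted(val_acc):
        if val_acc[step] >= best - tolerance:
            return step


# Made-up numbers for illustration only; these are NOT results from the repo.
example_acc = {
    10000: 0.70,
    20000: 0.74,
    30000: 0.76,
    40000: 0.7605,
    50000: 0.7602,
}
chosen = pick_plateau_checkpoint(example_acc)
print("earliest checkpoint on the plateau:", chosen)
```

Choosing the earliest checkpoint on the plateau, rather than the single best one, avoids preferring a later checkpoint on the basis of noise-level differences.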

GhostSteven commented 5 years ago

Thank you very much. I've repeated the whole process and now have two other questions.

First, when I ran the validation of the final model in rank_diff_wln, i.e., ran nntest_direct_useScores.py, it could not validate all 30,000 examples; it got through only about 29,900 and then got stuck, so I had to close the terminal. I found that the last line of the generated file valid.cbond_detailed_2400000 (the others are the same) is incomplete, for example: | 2.0-4.0-
I don't know why this happened or how to resolve it.

Second, after running the final test of the model, you can see the prediction results, right or wrong, on the Django website. But how do you know whether a wrong prediction is a "near-miss" or a "complete-miss"?

connorcoley commented 5 years ago

I haven't encountered that issue where the last set of examples cannot be validated and the file is incomplete. Is there any chance the process was killed externally?
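One quick way to confirm that an output file was cut off mid-write is to check whether it ends with a newline. The sketch below assumes the detailed output is written as one newline-terminated record per example, which is an assumption about the file format rather than something confirmed in this thread. The helper name `ends_with_newline` and the temporary demo files are hypothetical.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # an empty file has no complete line
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


# Demo on two throwaway files: one complete, one truncated mid-line.
with tempfile.TemporaryDirectory() as tmp:
    complete_path = os.path.join(tmp, "complete.txt")
    truncated_path = os.path.join(tmp, "truncated.txt")
    with open(complete_path, "w") as f:
        f.write("record one\nrecord two\n")
    with open(truncated_path, "w") as f:
        f.write("record one\nrecord tw")
    complete_ok = ends_with_newline(complete_path)
    truncated_ok = ends_with_newline(truncated_path)

print("complete file ends cleanly:", complete_ok)
print("truncated file ends cleanly:", truncated_ok)
```

A file that fails this check was most likely interrupted while writing, which would be consistent with the process being killed or stalled rather than finishing normally.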

The "near-miss" and "complete-miss" terms correspond to whether the recorded product was proposed as the second highest-ranked product or whether it was not found in the top 5. That is based on analyzing the results of running eval_by_smiles.py on the predicted bond edits to perform a final comparison of SMILES strings. This script will write a detailed output like the one in rexgen_direct/rank_diff_wln/model-core16-500-3-max150-direct-useScores/test.cbond_detailed_2400000.eval_by_smiles that lists the top 10 predicted SMILES for each example and the rank at which the recorded product is found (if it is found in the top 5; otherwise, it will be 11).
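As a rough illustration of those definitions, the sketch below classifies each example from the rank at which the recorded product was found. It does not parse the .eval_by_smiles file; it assumes the ranks have already been extracted into a list of integers. The function name `classify`, the `top_n` parameter, the "other" label, and the sample ranks are all hypothetical.

```python
from collections import Counter


def classify(rank, top_n=5):
    """Label one example by the rank of its recorded product.

    rank 1          -> "correct"
    rank 2          -> "near-miss"
    rank > top_n    -> "complete-miss" (not found in the top `top_n`)
    anything else   -> "other" (found at ranks 3..top_n)
    """
    if rank == 1:
        return "correct"
    if rank == 2:
        return "near-miss"
    if rank > top_n:
        return "complete-miss"
    return "other"


# Hypothetical ranks for illustration; 11 marks "not found", as described above.
ranks = [1, 2, 11, 3, 1]
counts = Counter(classify(r) for r in ranks)
print(dict(counts))
```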