brain-research / deep-molecular-massspec

Mass Spectrometry for Small Molecules using Deep Learning
Apache License 2.0

Feature Request: trained model #7

Closed by curt-f 5 years ago

curt-f commented 5 years ago

Is there any chance that the trained model described in https://pubs.acs.org/doi/abs/10.1021/acscentsci.9b00085 (i.e. the full bi-directional NEIMS model described in Table 1 and elsewhere) could be made available in this repo? It would be very fun to play with.

Many thanks for this contribution to the field of small molecule mass spectrometry.

jnwei-zz commented 5 years ago

Hi, please find the trained model now included in the repository.

gpavuluri commented 4 years ago

After training a new model, is there a way to use its checkpoint files to predict spectra and get an annotated SDF file, the same way the massspec_weights folder is used?

jnwei-zz commented 4 years ago

Yes, there is. In step 2, if you replace the model_dir argument with the directory containing your newly trained weights, it should work.

gpavuluri commented 4 years ago

Predicting spectra using make_spectra_predictions.py with massspec_weights runs successfully, but pointing the weights_dir parameter at the model_dir from training produces the error shown in predict.txt. The outputs of the earlier steps, which worked, are in convert.txt and train.txt.

Are all checkpoint files supposed to be 186 MB, even when the number of steps is increased to 100,000?

Which of the JSON files from the spectra_tf_records conversion was used to train massspec_weights?

Attachments: predict_massspec_weights.txt, train.txt, convert.txt, predict.txt

jnwei-zz commented 4 years ago

Hi, the issue is that the SpectraPredictor class was still using the hparams for the fully trained model, set in spectra_predictor.py. In make_spectra_predictions.py lines 39 and 40, if you set the hparams_str arg to be empty, like this:

  predictor = spectra_predictor.NeimsSpectraPredictor(
      model_checkpoint_dir=FLAGS.weights_dir, hparams_str='')

then the default hparams from molecule_predictors.py will be used instead, and the module should run.
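To make the override behaviour concrete, here is a minimal sketch of how a comma-separated hparams string is typically layered on top of a set of defaults. It is an illustrative stand-in, not the code in spectra_predictor.py or molecule_predictors.py: the function `apply_overrides`, the hyperparameter names, and their values are all hypothetical. The point is only that an empty string leaves the defaults untouched, so the graph is built with the same settings the new checkpoint was presumably trained with.

```python
# Illustrative stand-in only: not the repository's or TensorFlow's hparams code.
def apply_overrides(defaults, hparams_str):
    """Return a copy of `defaults` updated from a 'name=value,name=value' string."""
    hparams = dict(defaults)
    if not hparams_str:
        return hparams  # empty string: nothing to override, defaults are kept
    for item in hparams_str.split(","):
        name, value = item.split("=", 1)
        name = name.strip()
        if name not in hparams:
            raise ValueError(f"Unknown hyperparameter: {name}")
        # Cast the override to the type of the corresponding default value.
        default = hparams[name]
        if isinstance(default, bool):
            hparams[name] = value.strip().lower() == "true"
        else:
            hparams[name] = type(default)(value.strip())
    return hparams


# Hypothetical names and values, chosen only for this example.
DEFAULTS = {"num_hidden_layers": 2, "hidden_layer_size": 512, "bidirectional": False}

# A non-empty string changes the settings the graph would be built with.
fully_trained = apply_overrides(DEFAULTS, "num_hidden_layers=7,bidirectional=True")

# An empty string returns the defaults unchanged.
retrained = apply_overrides(DEFAULTS, "")
```

If the new model was itself trained with non-default hparams, the same reasoning suggests passing that exact string instead of an empty one, so that prediction and training agree.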

I believe the default behavior is to save all of the checkpoint files, each of which stores all of the weights for the entire graph. I think there is a setting to change the checkpointing frequency; I would have to look for it.
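Because each checkpoint stores the full set of weights, its size should depend on the model and not on the number of training steps. One way to check this is to total the file sizes per checkpoint step in the training directory. The sketch below uses only the standard library and assumes the usual TensorFlow 1.x naming, where each checkpoint is a group of files sharing a prefix such as model.ckpt-1000; the helper names `checkpoint_sizes` and `report` are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming: "model.ckpt-<step>" followed by .index, .meta,
# or .data-XXXXX-of-YYYYY. Other files in the directory are ignored.
CKPT_PATTERN = re.compile(r"^model\.ckpt-(\d+)\.(index|meta|data-\d+-of-\d+)$")


def checkpoint_sizes(model_dir):
    """Map each checkpoint step to the total size in bytes of its files."""
    sizes = defaultdict(int)
    for name in os.listdir(model_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            step = int(match.group(1))
            sizes[step] += os.path.getsize(os.path.join(model_dir, name))
    # Return a plain dict ordered by training step.
    return dict(sorted(sizes.items()))


def report(model_dir):
    """Print one line per checkpoint step with its total size in MB."""
    for step, size in checkpoint_sizes(model_dir).items():
        print(f"step {step:>8}: {size / 1e6:.1f} MB")
```

Calling `report` on the model_dir used for training should show roughly the same size for every step. If training goes through `tf.estimator`, the checkpoint frequency and the number of checkpoints kept are normally controlled by `save_checkpoints_steps` and `keep_checkpoint_max` on `tf.estimator.RunConfig`; whether this repository exposes those settings is not confirmed in this thread.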