MicrobeLab / DeepMicrobes

DeepMicrobes: taxonomic classification for metagenomics with deep learning
https://doi.org/10.1093/nargab/lqaa009
Apache License 2.0

Prediction using the seq2species model #3

Closed Bartvelp closed 4 years ago

Bartvelp commented 4 years ago

First off, thank you for your very nice paper, really interesting results! I am trying to reproduce your results, but I am getting stuck at the prediction step using the Seq2Species model. I have installed DeepMicrobes and am trying to predict the species of a 100 bp FASTA file (which contains 16S rRNA of E. coli). I do not have paired-end reads. These are the commands I currently run:

(DeepMicrobes) bart@Bart-HP-PAV14:~/DeepMicrobes/pipelines$ seq2tfrec_onehot.py --input_seq=../test_fasta_100bp.fa --output_tfrec=../temp.onehot.tfrec --is_train=False --seq_type=fasta

This seems to run fine. Then I run:

(DeepMicrobes) bart@Bart-HP-PAV14:~/DeepMicrobes/pipelines$ ./predict_seq2species.sh -i ../temp.onehot.tfrec -p 1 -m ../weights_seq2species/ -o test_output

But this gives the following error:

(DeepMicrobes) bart@Bart-HP-PAV14:~/DeepMicrobes/pipelines$ ./predict_seq2species.sh -i ../temp.onehot.tfrec -p 1 -m ../weights_seq2species/ -o test_output
Prediction started ...
2020-05-28 12:32:42.030930: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
RUNNING MODE:  predict_paired_class
RUNNING MODE:  predict_paired_class
I0528 12:32:42.035207 139629123098432 tf_logging.py:115] Using default config.
I0528 12:32:42.035573 139629123098432 tf_logging.py:115] Using config: {'_model_dir': '../weights_seq2species/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7efdcec60438>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0528 12:32:42.096365 139629123098432 tf_logging.py:115] Calling model_fn.
I0528 12:32:42.222223 139629123098432 tf_logging.py:115] Done calling model_fn.
I0528 12:32:42.312916 139629123098432 tf_logging.py:115] Graph was finalized.
I0528 12:32:42.314475 139629123098432 tf_logging.py:115] Restoring parameters from ../weights_seq2species/model.ckpt-0
I0528 12:32:42.752866 139629123098432 tf_logging.py:115] Running local_init_op.
I0528 12:32:42.756458 139629123098432 tf_logging.py:115] Done running local_init_op.
Traceback (most recent call last):
  File "/home/bart/DeepMicrobes/DeepMicrobes.py", line 370, in <module>
    absl_app.run(main)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/absl/app.py", line 278, in run
    _run_main(main, args)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/absl/app.py", line 239, in _run_main
    sys.exit(main(argv))
  File "/home/bart/DeepMicrobes/DeepMicrobes.py", line 353, in main
    flags.FLAGS.translate)
  File "/home/bart/DeepMicrobes/models/format_prediction.py", line 93, in paired_report
    batch_prob = average_paired_end(batch_prob, num_classes)
  File "/home/bart/DeepMicrobes/models/format_prediction.py", line 18, in average_paired_end
    prob_matrix = np.mean(np.reshape(prob_matrix, (-1, 4, num_classes)), axis=1)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 232, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 2505 into shape (4,2505)
paste: test_output.category_paired.txt: No such file or directory
rm: cannot remove 'test_output.category_paired.txt': No such file or directory
rm: cannot remove 'test_output.prob_paired.txt': No such file or directory
Result: test_output.result.txt
(DeepMicrobes) bart@Bart-HP-PAV14:~/DeepMicrobes/pipelines$ 

This reshape also does not make sense to me. I would just like the probability for each class. It seems the probabilities are held in prob_matrix[0], but I don't know which index corresponds to which class (species). Any help would be greatly appreciated.

MicrobeLab commented 4 years ago

Hello Bart,

The mapping from prob matrix index to class is provided in a tab-delimited file. Here is the mapping file for the pre-trained species model (the 1st column holds the species names and the 2nd column holds the corresponding indexes):

https://github.com/MicrobeLab/DeepMicrobes/blob/master/data/name2label_species.txt
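To illustrate how such a mapping could be used, here is a minimal sketch that reads a two-column tab-delimited file (name, then index) and looks up the highest-probability classes for one row of the prob matrix. The function names `load_label_to_name` and `top_species` are hypothetical and not part of DeepMicrobes, the sample names and probabilities are made up, and the sketch assumes the file has no header row (rows whose second column is not an integer are skipped).

```python
import csv
import io

import numpy as np


def load_label_to_name(handle):
    """Build {index: name} from a tab-delimited file: name<TAB>index.

    Hypothetical helper, not part of DeepMicrobes. Rows that do not
    have an integer in the second column are skipped.
    """
    label_to_name = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue
        try:
            index = int(row[1])
        except ValueError:
            continue
        label_to_name[index] = row[0]
    return label_to_name


def top_species(prob_row, label_to_name, k=3):
    """Return the k most probable (name, probability) pairs for one read."""
    prob_row = np.asarray(prob_row)
    order = np.argsort(prob_row)[::-1][:k]  # indexes sorted by descending probability
    return [(label_to_name.get(int(i), "unknown"), float(prob_row[i])) for i in order]


# Made-up stand-in for the mapping file and for one row of the prob matrix.
example_file = io.StringIO("Species_A\t0\nSpecies_B\t1\nSpecies_C\t2\n")
label_to_name = load_label_to_name(example_file)
probs = np.array([0.1, 0.7, 0.2])
top = top_species(probs, label_to_name, k=2)
print(top)
```

With the real mapping file, the `io.StringIO` object would be replaced by an opened file handle, for example `open("name2label_species.txt")`.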

Note that you might have to train your own model according to your needs. This pre-trained model is specific to the context of the paper.

For single-end reads, you could first convert them to interleaved reverse-complement form using seqtk, but this is optional. If you do this before TFRecord conversion, edit the line containing (-1, 4, num_classes) in format_prediction.py, changing 4 to 2. If you do not, change 4 to 1.
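The effect of that number can be sketched in isolation. The snippet below mirrors the reshape-and-average expression shown in the traceback, but takes the group size as a parameter; `average_read_groups` and `rows_per_read` are hypothetical names used only for illustration, and the tiny matrices are made-up data rather than real model output.

```python
import numpy as np


def average_read_groups(prob_matrix, num_classes, rows_per_read):
    """Average every `rows_per_read` consecutive rows of class probabilities.

    Hypothetical helper: rows_per_read plays the role of the hard-coded 4
    in np.reshape(prob_matrix, (-1, 4, num_classes)).
    """
    prob_matrix = np.asarray(prob_matrix, dtype=float)
    if prob_matrix.size % (rows_per_read * num_classes) != 0:
        raise ValueError(
            "number of rows is not a multiple of rows_per_read=%d" % rows_per_read
        )
    grouped = np.reshape(prob_matrix, (-1, rows_per_read, num_classes))
    return np.mean(grouped, axis=1)


num_classes = 3
two_rows = np.array([[0.2, 0.5, 0.3],
                     [0.4, 0.1, 0.5]])

# rows_per_read=1: each row is its own read, so nothing is averaged.
single = average_read_groups(two_rows, num_classes, rows_per_read=1)

# rows_per_read=2: a read and its reverse complement are averaged together.
paired = average_read_groups(two_rows, num_classes, rows_per_read=2)

# rows_per_read=4 with fewer than 4 rows cannot be grouped, which is the
# situation behind the reshape error in the traceback above.
try:
    average_read_groups(two_rows[:1], num_classes, rows_per_read=4)
    failed = False
except ValueError:
    failed = True

print(single.shape, paired.shape, failed)
```

A single prediction row of 2505 probabilities cannot be split into groups of four rows, so the reshape fails unless the group size matches how the input reads were prepared.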

Bartvelp commented 4 years ago

Thanks a lot for your swift response! I understand now, and it is working. But as you suggested, I should retrain the model. I am again experiencing some issues with that, but I will open a new issue for it.