Retraining seq2species model gives error

Bartvelp commented 4 years ago

I have created a labelled FASTA file based on the RefSeq (full-length) 16S rRNA database, like so:

    >label|0|Abiotrophia defectiva
    AGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAACCGCGACTAGGTGCTTGCACTTGGTCAAGGTGAGTGGCGAACGGGTGAGTAACACGTGGGTAACCTACCTCATAGTGGGGGATAACAGTCGGAAACGACTGCTAATACCGTTAGCTAGTTGGTAGGGTAAGGNCCTACCAAGGCGATGATGCATAGCCGACCTGAGAGGGTGATCGGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGCAAGTCTGACGGAGCAACGCCGCGTGAGTGAAGAAGGTCTTCGGA....
    >label|1|Absiella dolichum
    CTGGCTCAGGATGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGAAGTTTTTAGGAAAGCTTGCTTTCCAAAAAGACTTAGTGGCGAACGGGTGAGTAACACGTAGATAACCTGCCCATGTGCCCGGGATAACTGCTGGAAACGGTAGCTAAAACCGGATAGGTGGCTTCGAGGCATCTCGGAGACATTAAAATGGCTAAGGCCATGAACA...
    >label|2|Absiella tortuosum
    CAAATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGAAGTCAATTGAAAGCTTGCTTTTAAAAGACTTAGTGGCGAACGGGTGAGTAACNCGTAGGTAACCTACCCATGTAACTGGGATAACTGCTGGAAACGGTAGCTAAAACCGGATAGGTAAGATTGAGGCATCTTAATCTTATGAAAAAAGC...
    >etc.
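For anyone reproducing this setup, the labelled headers above can be generated with a short script. The sketch below is hypothetical (it is not part of DeepMicrobes). It assumes integer ids are assigned in alphabetical order of species name. That matches the excerpt but is not confirmed as a requirement of seq2tfrec_onehot.py:

```python
def format_labelled_fasta(records):
    """Return FASTA text with headers of the form '>label|<id>|<species>'.

    records: iterable of (species_name, sequence) pairs.
    Assumption: integer ids follow alphabetical order of species names,
    which matches the example headers but is not a documented requirement.
    """
    records = list(records)
    species = sorted({name for name, _ in records})
    species_to_id = {name: i for i, name in enumerate(species)}
    lines = []
    for name, seq in records:
        lines.append(f">label|{species_to_id[name]}|{name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("Absiella dolichum", "CTGG"), ("Abiotrophia defectiva", "AGAG")]
    print(format_labelled_fasta(demo), end="")
```

Sorting the species once keeps the id assignment deterministic across runs, so the same species always maps to the same class id.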

I then converted this file to a TFRecord using this command: `seq2tfrec_onehot.py --input_seq=../combined_train_labelled.fa --output_tfrec=../combined_train.tfrec --is_train=True`

When I then try to train the seq2species model, I get the following error:


```
(DeepMicrobes) bart@Bart-HP-PAV14:~/DeepMicrobes$ DeepMicrobes.py --input_tfrec=combined_train.tfrec --model_name=seq2species --model_dir=seq2species_new_weights --max_len=100
2020-05-30 17:36:23.820149: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
RUNNING MODE: train
I0530 17:36:23.822940 140213195958080 tf_logging.py:115] Using config: {'_model_dir': 'seq2species_new_weights', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 100000, '_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 1000, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
W0530 17:36:23.823674 140213195958080 tf_logging.py:120] 'cpuinfo' not imported. CPU info will not be logged.
W0530 17:36:23.823904 140213195958080 tf_logging.py:120] 'psutil' not imported. Memory info will not be logged.
I0530 17:36:23.823984 140213195958080 tf_logging.py:115] Benchmark run: {'model_name': 'model', 'dataset': {'name': 'dataset_name'}, 'machine_config': {'gpu_info': {'count': 0}}, 'run_date': '2020-05-30T15:36:23.823391Z', 'tensorflow_version': {'version': '1.9.0', 'git_hash': 'v1.9.0-0-g25c197e023'}, 'tensorflow_environment_variables': [], 'run_parameters': [{'name': 'batch_size', 'long_value': 32}, {'name': 'train_epochs', 'long_value': 1}]}
I0530 17:36:23.885492 140213195958080 tf_logging.py:115] Calling model_fn.
INPUTS BEFORE RESHAPE Tensor("IteratorGetNext:0", shape=(?, ?), dtype=int64, device=/device:CPU:0)
INPUTS Tensor("reshape_input:0", shape=(?, 1, 100, 4), dtype=int64)
filter_dim (5, 1)
SHAPE (?, 1, 100, 4)
FILTERS
Traceback (most recent call last):
  File "/home/bart/DeepMicrobes/DeepMicrobes.py", line 368, in <module>
    absl_app.run(main)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/absl/app.py", line 278, in run
    _run_main(main, args)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/absl/app.py", line 239, in _run_main
    sys.exit(main(argv))
  File "/home/bart/DeepMicrobes/DeepMicrobes.py", line 360, in main
    train(flags.FLAGS, model_fn, 'dataset_name')
  File "/home/bart/DeepMicrobes/DeepMicrobes.py", line 214, in train
    classifier.train(input_fn=input_fn_train, hooks=train_hooks)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 366, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1119, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1132, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1107, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/bart/DeepMicrobes/DeepMicrobes.py", line 101, in model_fn
    logits = model(features)
  File "/home/bart/DeepMicrobes/models/seq2species.py", line 163, in __call__
    x = convolution(x, (spatial_conv_width[0], 1), pointwise_conv_depth[0], weight_init_scale)
  File "/home/bart/DeepMicrobes/models/seq2species.py", line 102, in convolution
    padding=padding)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 556, in separable_conv2d
    op=op)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 364, in with_space_to_batch
    return new_op(input, None)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 354, in <lambda>
    return lambda inp, _: op(inp, num_spatial_dims, padding)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 548, in op
    name="depthwise")
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 2111, in depthwise_conv2d_native
    name=name)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 609, in _apply_op_helper
    param_name=input_name)
  File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
    ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'input' has DataType int64 not in list of allowed values: float16, bfloat16, float32, float64
```

If I add the following to remedy this error:

        x = tf.reshape(inputs, [-1, 1, self.max_len, 4], name='reshape_input')
        x = tf.cast(x, tf.float32)  # added: the convolution ops only accept float inputs

in the `__call__` function of seq2species.py, the model seems to compile, but it eventually crashes with the following error:

    2020-05-30 17:33:38.053266: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: read.  Data types don't match. Expected type: int64, Actual type: float
    Traceback (most recent call last):
      File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
        return fn(*args)
      File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/home/bart/miniconda3/envs/DeepMicrobes/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Key: read.  Data types don't match. Expected type: int64, Actual type: float
             [[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_INT64], dense_keys=["label"], dense_shapes=[[?]], num_sparse=1, sparse_keys=["read"], sparse_types=[DT_INT64], _device="/device:CPU:0"](arg0, ParseSingleExample/Const)]]
             [[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,?], [?,?]], output_types=[DT_INT64, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]

Any help would be greatly appreciated!

MicrobeLab commented 4 years ago

You could try adding a print statement around lines 94-97 of seq2tfrec_onehot.py to make sure that a training set (rather than a test set) is being converted.

By the way, if you only want to reproduce 16S prediction with the seq2species model, the original implementation by Google might be helpful:

https://github.com/tensorflow/models/tree/master/research/seq2species

Bartvelp commented 4 years ago

Yes, I already made sure it is converted as a training set with the convert_advance_file function, and that function correctly extracts the information.

It turns out that input_fn_train is chosen based on the --encode_method flag, which I had not set; it defaults to kmer, which is of course wrong here. Setting --encode_method=one_hot fixes the TFRecord parsing, and training starts successfully.
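As a toy illustration of why the flag matters: the input function selected by --encode_method determines which feature dtypes the record parser expects. The sketch below is hypothetical pure Python, not DeepMicrobes or TensorFlow code. The kmer dtypes are read off the ParseSingleExample node in the earlier error; the one_hot entry is an assumption inferred from the "Actual type: float" message.

```python
# Hypothetical sketch, not DeepMicrobes code: which feature dtypes the
# record parser expects for each --encode_method value.
EXPECTED_DTYPES = {
    # Read off the ParseSingleExample node in the kmer error above.
    "kmer": {"read": "int64", "label": "int64"},
    # Assumption: one-hot reads are stored as floats, labels as int64.
    "one_hot": {"read": "float", "label": "int64"},
}


def check_record(record_dtypes, encode_method="kmer"):
    """Raise ValueError if a record's dtypes don't match the chosen parser.

    record_dtypes maps feature key -> dtype name as stored in the record.
    The default of "kmer" mirrors the flag default described in the thread.
    """
    for key, expected in EXPECTED_DTYPES[encode_method].items():
        actual = record_dtypes.get(key)
        if actual != expected:
            raise ValueError(
                f"Key: {key}. Data types don't match. "
                f"Expected type: {expected}, Actual type: {actual}"
            )


# A one-hot record passes the one_hot parser without complaint; passing it
# with the kmer default instead reproduces the shape of the earlier error.
check_record({"read": "float", "label": "int64"}, "one_hot")
```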

However, calculating the loss seems to fail, and I am not sure what is causing it. I am getting this error:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to explicitly squeeze dimension 1 but dimension was not 1: 0
             [[Node: sparse_softmax_cross_entropy_loss/remove_squeezable_dimensions/Squeeze = Squeeze[T=DT_INT64, squeeze_dims=[-1], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorGetNext:1)]]

Full log here

Any idea what causes this, or how to fix it?

P.S. I found the original paper and code very convoluted and difficult to work with, and I am also interested in trying the other models in this repo.
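The error message itself points at the labels. "dimension was not 1: 0" means the last axis of the label tensor (IteratorGetNext:1) has size 0, so every example in the batch carries an empty label vector. The loss tries to squeeze labels of shape (batch, 1) down to (batch,), and that fails when the axis is empty. Below is a pure-Python mimic of that check. It is an illustration only, not TensorFlow's implementation:

```python
def squeeze_last_dim(labels):
    """Drop the trailing size-1 axis of a batch of label vectors.

    labels: list of per-example label lists, e.g. [[3], [7]] -> [3, 7].
    Like tf.squeeze(labels, axis=-1), it refuses when that axis is not 1.
    """
    for row in labels:
        if len(row) != 1:
            raise ValueError(
                "Tried to explicitly squeeze dimension 1 but dimension "
                f"was not 1: {len(row)}"
            )
    return [row[0] for row in labels]


print(squeeze_last_dim([[3], [7]]))  # a well-formed batch: [3, 7]
try:
    squeeze_last_dim([[], []])  # records whose label vector is empty
except ValueError as err:
    print(err)
```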

MicrobeLab commented 4 years ago

I'm not really sure about the solution, but I think the problem lies in the training data (e.g., the length of the DNA sequences) rather than in the model. The model takes a --max_len flag whose default value is 150 bp. Try setting it to the maximum length of your full-length 16S sequences.
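To find the value to pass to --max_len, you can measure the longest sequence in the labelled FASTA file with a small stdlib-only helper. The helper below is hypothetical and not part of DeepMicrobes. It also handles sequences wrapped over several lines:

```python
def max_sequence_length(fasta_text):
    """Return the length of the longest sequence in FASTA-formatted text.

    Sequence lines between two '>' headers are summed, so records wrapped
    over several lines are measured as a whole.
    """
    longest = current = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            longest = max(longest, current)  # close the previous record
            current = 0
        else:
            current += len(line)
    return max(longest, current)  # account for the final record


# Example use on the training file from this thread:
#   with open("combined_train_labelled.fa") as fh:
#       print(max_sequence_length(fh.read()))
```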

Bartvelp commented 4 years ago

Oops, I see I forgot to add the command I ran: `DeepMicrobes.py --input_tfrec=combined_train_small.tfrec --model_name=seq2species --model_dir=seq2species_new_weights_small --max_len=400 --encode_method=one_hot` (I trimmed the sequences to 400 bp). So that should not be the problem. When I did forget to set --max_len, I got an error about padding to a size smaller than the original.
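For reference, here is a sketch of the padding step that --max_len controls. This is an illustration built on assumptions, not the code in seq2tfrec_onehot.py. The A/C/G/T column order, the all-zero rows for N and other ambiguity codes, and the right-padding with zero rows are all assumptions. The (max_len, 4) output matches the (?, 1, max_len, 4) shape printed in the first log. A read longer than max_len raises an error, much like the padding error described above, unless you ask for trimming:

```python
BASES = "ACGT"  # assumed column order; may differ from seq2tfrec_onehot.py


def one_hot_pad(seq, max_len, trim=False):
    """One-hot encode a DNA sequence into max_len rows of 4 floats.

    Unknown bases (e.g. N) become all-zero rows, and sequences shorter than
    max_len are right-padded with zero rows. A longer sequence raises
    ValueError unless trim=True, in which case it is cut to max_len.
    """
    seq = seq.upper()
    if len(seq) > max_len:
        if not trim:
            raise ValueError(
                f"sequence length {len(seq)} exceeds max_len={max_len}")
        seq = seq[:max_len]
    rows = []
    for base in seq:
        row = [0.0] * 4
        idx = BASES.find(base)  # -1 for N or any other non-ACGT symbol
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    # Pad with all-zero rows up to the fixed length the model expects.
    rows.extend([[0.0] * 4 for _ in range(max_len - len(seq))])
    return rows
```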

MicrobeLab commented 4 years ago

Try deleting the model_dir (`rm -rf seq2species_new_weights_small`) and running again.

Bartvelp commented 4 years ago

Still no luck, unfortunately (see Log). This is my repo, in case the print statements are puzzling: github.com/Bartvelp/DeepMicrobes_clone

MicrobeLab commented 4 years ago

You should set the --num_classes flag to your actual number of categories. The default value is --num_classes=2505 (I had 2505 species for the pre-trained model).
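You don't have to count species by hand. The value for --num_classes can be derived from the labelled FASTA headers. Sparse softmax cross-entropy requires every label id to lie in [0, num_classes), so the hypothetical helper below returns the largest id plus one. It assumes the >label|<id>|<species> header format from the first post:

```python
def min_num_classes(fasta_text):
    """Return the smallest --num_classes that covers every label id.

    Sparse softmax cross-entropy needs every label in [0, num_classes),
    so this is max(id) + 1 rather than just the number of distinct ids.
    """
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # '>label|<id>|<species>' -> '<id>'; maxsplit=2 keeps any '|'
            # inside the species name out of the way.
            class_id = line[1:].split("|", 2)[1]
            ids.add(int(class_id))
    if not ids:
        raise ValueError("no FASTA headers found")
    return max(ids) + 1
```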

Bartvelp commented 4 years ago

Yes, thank you, I forgot that.

I figured it out: due to some weird bug, my TFRecord file did not contain the classes/labels. When I recreated it, everything worked out of the box. Thanks a lot for your help! Closing.

MicrobeLab / DeepMicrobes

Retraining seq2species model gives error #4