EngSalem / TextClassification_Off_the_shelf

InvalidArgumentError: indices[44,1] = 6406 is not in [0, 91) #1

Closed: armmmm closed this issue 5 years ago

armmmm commented 6 years ago

Hello,

I got the following error when I tried to run the code. I did not change anything; I just ran test_baselines.py with the default parameters, using the same dataset that is in the project.

Any advice on how to solve the problem?

runfile('C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master/test_baselines.py', wdir='C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master')
C:\Program Files\Anaconda\envs\python35\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
C:\Program Files\Anaconda\envs\python35\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
----- Load Train and Test Data --------
---- Tokenizing Training and Testing Data ------
C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\preprocessing\text.py:172: UserWarning: The nb_words argument in Tokenizer has been renamed num_words.
  warnings.warn('The nb_words argument in Tokenizer '
---- Tokenizing Training and Testing Data ------
Train on 86540 samples, validate on 10818 samples
Epoch 1/10
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master/test_baselines.py', wdir='C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master')

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master/test_baselines.py", line 70, in <module>
    CNNBaseline.train_model(CNNBaseline.model,X_train,Y_train=Y_train,X_valid=X_valid,Y_valid=Y_valid)

  File "C:\D PhD Computing School\CollectData\data\Corpus\Clients Annotated\ML Data\TextClassification_Off_the_shelf-master\BaselineModels.py", line 52, in train_model
    Model.fit(X_train, Y_train, validation_data=(X_valid, Y_valid), epochs=self.num_epochs, batch_size=self.batch_size)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\engine\training.py", line 1705, in fit
    validation_steps=validation_steps)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\engine\training.py", line 1236, in _fit_loop
    outs = f(ins_batch)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\backend\tensorflow_backend.py", line 2482, in __call__
    **self.session_kwargs)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)

  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)

InvalidArgumentError: indices[44,1] = 6406 is not in [0, 91)
	 [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/embedding_1/GatherV2_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, _arg_input_1_0_3, conv1d_1/convolution/ExpandDims_1/dim)]]

Caused by op 'embedding_1/GatherV2', defined at:
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 231, in <module>
    main()
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 227, in main
    kernel.start()
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tornado\ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2827, in run_ast_nodes
    if self.run_code(code, result):
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master/test_baselines.py', wdir='C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master')
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/D PhD Computing School/CollectData/data/Corpus/Clients Annotated/ML Data/TextClassification_Off_the_shelf-master/test_baselines.py", line 69, in <module>
    CNNBaseline=cnn_kim(cnn_rand=RAND,STATIC=Trainable,ExternalEmbeddingModel=args.embedd_file,EmbeddingType=args.EMB_type,n_symbols=n_symbols,wordmap=word_map)
  File "C:\D PhD Computing School\CollectData\data\Corpus\Clients Annotated\ML Data\TextClassification_Off_the_shelf-master\BaselineModels.py", line 131, in __init__
    embedding_seq = embedding_layer(Sequence_in)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\engine\topology.py", line 619, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\layers\embeddings.py", line 138, in call
    out = K.gather(self.embeddings, inputs)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\keras\backend\tensorflow_backend.py", line 1215, in gather
    return tf.gather(reference, indices)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2736, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3668, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "C:\Program Files\Anaconda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): indices[44,1] = 6406 is not in [0, 91)
	 [[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/embedding_1/GatherV2_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, _arg_input_1_0_3, conv1d_1/convolution/ExpandDims_1/dim)]]
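The message itself points at the mismatch: the embedding_1 layer was built with 91 rows (valid token ids 0 to 90), while the input batch has id 6406 at position [44,1], i.e. sample 44 of the current batch, token 1. An embedding lookup is a row gather, so any id at or above the row count fails. A minimal plain-Python sketch of that check (an illustration, not TensorFlow's code; `embedding_lookup` is a hypothetical name) that reproduces the same message:

```python
from typing import List, Sequence


def embedding_lookup(weights: List[List[float]],
                     batch: Sequence[Sequence[int]]) -> List[List[List[float]]]:
    """Plain-Python stand-in for an Embedding layer's gather.

    Each token id selects one row of `weights`, so valid ids are
    0 .. len(weights) - 1. The error text imitates TensorFlow's message:
    indices[i,j] is the position (sample i, token j) of the bad id.
    """
    input_dim = len(weights)
    out = []
    for i, seq in enumerate(batch):
        rows = []
        for j, idx in enumerate(seq):
            if not 0 <= idx < input_dim:
                raise IndexError(
                    "indices[{},{}] = {} is not in [0, {})".format(i, j, idx, input_dim))
            rows.append(weights[idx])
        out.append(rows)
    return out
```

With a 91-row weight table and a batch where sample 44 holds id 6406 at position 1, this raises `indices[44,1] = 6406 is not in [0, 91)`, which suggests the fix is on the sizing side (the layer's input_dim), not in the data.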

EngSalem commented 6 years ago

Hi, it seems that you are using Windows. I have tested my code on Linux distributions, but let's debug this together. I believe the main issue might be the embedding model. Did you download the embedding model I attached to this repository? You need to check that the word indices coming out of gensim match the indices that Keras assigned to each token.
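One way to keep the two index spaces consistent, sketched under assumptions: the tokenizer's `word_index` (a dict from word to id, with ids starting at 1 as Keras' `Tokenizer` assigns them) decides the row order, and pretrained vectors are looked up by word rather than by gensim's own index. Here `vectors` is a plain dict standing in for the gensim model (with gensim 4.x one could check `word in kv.key_to_index` and read `kv[word]`), and `build_embedding_matrix` is a hypothetical helper, not part of this repository.

```python
from typing import Dict, List, Sequence


def build_embedding_matrix(word_index: Dict[str, int],
                           vectors: Dict[str, Sequence[float]],
                           dim: int) -> List[List[float]]:
    """Hypothetical helper: row k holds the pretrained vector for the word
    the tokenizer mapped to id k, so Keras ids and matrix rows agree."""
    # Keras' Tokenizer starts ids at 1; row 0 stays all zeros for padding.
    n_rows = max(word_index.values(), default=0) + 1
    matrix = [[0.0] * dim for _ in range(n_rows)]
    for word, idx in word_index.items():
        vec = vectors.get(word)  # look up by word, never by gensim's index
        if vec is not None:
            matrix[idx] = [float(x) for x in vec]
        # Words without a pretrained vector keep a zero row.
    return matrix
```

The matrix has `len(matrix)` rows, which is the input_dim the Keras `Embedding` layer would need (for example `Embedding(len(matrix), dim, weights=[np.array(matrix)])`). Sizing the layer from the pretrained model's vocabulary instead of the tokenizer's is one way to end up with an out-of-range error like the one reported.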

armmmm commented 6 years ago

I downloaded the embedding model and got the same error. I don't know how to check the word indices, but I will try reading up on it.
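A possible way to check the indices, assuming the padded training and validation arrays (`X_train` and `X_valid` in test_baselines.py) and the tokenizer's `word_index` are at hand: count the token ids that fall outside the embedding layer's range and map them back to words. The function name `out_of_range_words` and its `limit` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def out_of_range_words(sequences: Iterable[Sequence[int]],
                       word_index: Dict[str, int],
                       input_dim: int,
                       limit: int = 10) -> List[Tuple[int, str, int]]:
    """Return up to `limit` (token_id, word, count) triples, most frequent
    first, for ids an Embedding layer with `input_dim` rows cannot look up.
    An empty list means every id in `sequences` fits."""
    # Invert the tokenizer mapping so each bad id can be traced to a word.
    index_word = {i: w for w, i in word_index.items()}
    counts = Counter(idx for seq in sequences for idx in seq
                     if not 0 <= idx < input_dim)
    return [(idx, index_word.get(idx, "<not in word_index>"), n)
            for idx, n in counts.most_common(limit)]
```

For this issue one would call something like `out_of_range_words(list(X_train) + list(X_valid), word_index, input_dim=91)`, taking 91 from the error message. A long list of ordinary words in the output would mean the layer is sized far below the tokenizer's vocabulary rather than a few stray ids.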

armmmm commented 6 years ago

Hello,

It works for all the models, but for the CNN I had to change RAND to False to get it to work.
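Why RAND = False helps is not established in this thread. The "Caused by" traceback shows that the embedding layer is created inside `cnn_kim(cnn_rand=RAND, ..., n_symbols=n_symbols, wordmap=word_map)`, so one unverified hypothesis is that the random-initialization (CNN-rand) path sizes the layer from a count that does not cover the tokenizer's ids. Whatever the branch, the size can be derived from the tokenizer alone. The sketch below assumes Keras `Tokenizer` conventions (ids 1 to len(word_index), 0 reserved for padding, and only ids below `num_words` emitted when `num_words` is set); `required_input_dim` and `embedding_setup` are hypothetical helpers, not the repository's code.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def required_input_dim(word_index: Dict[str, int],
                       num_words: Optional[int] = None) -> int:
    """Smallest Embedding input_dim covering every id the tokenizer can emit.

    Assumes ids run from 1 to len(word_index) and that, when num_words is
    set, texts_to_sequences keeps only ids below num_words.
    """
    full = len(word_index) + 1
    return min(num_words, full) if num_words else full


def embedding_setup(rand: bool,
                    word_index: Dict[str, int],
                    num_words: Optional[int] = None,
                    pretrained: Optional[Matrix] = None) -> Tuple[int, Optional[Matrix]]:
    """Hypothetical helper: same input_dim for random and pretrained setups."""
    input_dim = required_input_dim(word_index, num_words)
    if rand:
        # CNN-rand: no initial weights; the layer initializes them randomly.
        return input_dim, None
    if pretrained is None or len(pretrained) < input_dim:
        raise ValueError("pretrained matrix must have at least %d rows" % input_dim)
    # Keep only the rows for ids the tokenizer can actually produce.
    return input_dim, pretrained[:input_dim]
```

The point of the design is that `rand` only changes the initial weights, never the number of rows, so both settings accept exactly the same token ids.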

EngSalem commented 6 years ago

Good, I will check why the CNN isn't working.