Open nidhikamath91 opened 6 years ago
The VocabularyProcessor class is deprecated in (I believe) TensorFlow v1.8. The reason is that they want to encourage you to use the Datasets API. I used this code as a starting point: https://github.com/LightTag/BibSample/blob/master/preppy.py
Hope this helps!
Have you solved it? I have the same problem, and the explanation above still seems confusing to me.
I used the exact code from preprocessing.txt and it worked. I did not understand the explanation in the new workaround.
You can use this:
tokenizer = tf.keras.preprocessing.text.Tokenizer(oov_token="<UNK>")
tokenizer.fit_on_texts(x_text)
x = tokenizer.texts_to_sequences(x_text)
x = tf.keras.preprocessing.sequence.pad_sequences(x, maxlen=max_document_length, padding='post', truncating='post')
Hi, note that should be tf.keras.preprocessing (with a dot).
Also, could you please help me convert the code below to tf.keras?
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(max_sequence_length)
x_data = np.array(list(vocab_processor.fit_transform(data)))
vocab_size=len(vocab_processor.vocabulary_)
print(vocab_size)
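For anyone trying to map that old VocabularyProcessor snippet onto something self-contained, its core behaviour (build a word-to-id vocabulary, then turn each text into a fixed-length id sequence) can be sketched in plain Python. SimpleVocabularyProcessor is a hypothetical name for illustration, not an actual TensorFlow or Keras class, and it only splits on whitespace rather than using the real class's regex tokenizer:

```python
import numpy as np

class SimpleVocabularyProcessor:
    """Minimal stand-in for tf.contrib.learn.preprocessing.VocabularyProcessor.

    Id 0 is reserved for padding/unknown words, mirroring the old class.
    """

    def __init__(self, max_document_length):
        self.max_document_length = max_document_length
        self.vocabulary_ = {"<UNK>": 0}

    def fit_transform(self, texts):
        # First pass: assign an id to every word seen in the corpus.
        for text in texts:
            for word in text.split():
                if word not in self.vocabulary_:
                    self.vocabulary_[word] = len(self.vocabulary_)
        # Second pass: map each text to a fixed-length id sequence,
        # truncating long texts and zero-padding short ones.
        for text in texts:
            ids = [self.vocabulary_[w] for w in text.split()]
            ids = ids[: self.max_document_length]
            ids += [0] * (self.max_document_length - len(ids))
            yield ids

vocab_processor = SimpleVocabularyProcessor(max_document_length=4)
data = ["the cat sat", "the dog sat on the mat"]
x_data = np.array(list(vocab_processor.fit_transform(data)))
print(x_data)                          # [[1 2 3 0] [1 4 3 5]]
print(len(vocab_processor.vocabulary_))  # 7
```

With that picture in mind, the tf.keras Tokenizer snippet above does the same job: fit_on_texts builds the vocabulary, texts_to_sequences replaces fit_transform, and len(tokenizer.word_index) + 1 plays the role of len(vocab_processor.vocabulary_).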
@bhuvanshukla were you able to resolve your issue? I am also facing the same issue.
Please correct me if I am wrong. Here is what I learned from lukas' example:
from tensorflow.keras.preprocessing import text, sequence
tokenizer = text.Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = sequence.pad_sequences(x_train, maxlen=MAX_SEQUENCE_LENGTH)
x_test = tokenizer.texts_to_sequences(x_test)
x_test = sequence.pad_sequences(x_test, maxlen=MAX_SEQUENCE_LENGTH)
This way, the tokenizer keeps only the top VOCAB_SIZE most frequent words (via num_words) and pads or truncates every sequence to MAX_SEQUENCE_LENGTH, so you no longer need vocab_processor at all.
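One thing worth knowing about the snippet above: unlike the earlier example with padding='post', Keras's pad_sequences defaults to padding='pre' and truncating='pre'. A plain-Python sketch of those semantics (not the real implementation, which returns a NumPy array):

```python
def pad_sequences(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Plain-Python sketch of keras.preprocessing.sequence.pad_sequences."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Drop tokens from the front ('pre') or the back ('post').
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        pad = [value] * (maxlen - len(seq))
        # Insert padding before ('pre') or after ('post') the sequence.
        padded.append(pad + seq if padding == "pre" else seq + pad)
    return padded

print(pad_sequences([[1, 2], [3, 4, 5, 6, 7]], maxlen=4))
# [[0, 0, 1, 2], [4, 5, 6, 7]]
```

So with the defaults, short sequences are zero-padded at the front and long ones lose their earliest tokens; pass padding='post' and truncating='post' if you want the other behaviour.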
Your script is quite clear. It would be helpful to know the TensorFlow 2.x equivalent of the following call:
VocabularyProcessor(max_sequence_length, min_frequency=min_word_frequency)
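As far as I know, tf.keras's Tokenizer has no direct min_frequency argument, but the same effect can be approximated by counting words first and keeping only the frequent ones. A plain-Python sketch of that idea (build_vocabulary is a hypothetical helper, not an official API, and the exact >/>= boundary in the old class may differ):

```python
from collections import Counter

def build_vocabulary(texts, min_frequency=0):
    """Word -> id map, dropping words seen min_frequency times or fewer.

    Id 0 is reserved for unknown/padding words, mirroring the old
    VocabularyProcessor convention.
    """
    counts = Counter(word for text in texts for word in text.split())
    vocab = {"<UNK>": 0}
    for word, count in counts.items():
        if count > min_frequency:
            vocab[word] = len(vocab)
    return vocab

texts = ["the cat sat", "the dog sat", "a mat"]
print(build_vocabulary(texts, min_frequency=1))
# {'<UNK>': 0, 'the': 1, 'sat': 2}
```

Words that fall below the threshold simply map to the <UNK> id when you encode the texts.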
Hello,
I am using TensorFlow on Linux, and when using tensorflow.contrib.learn.python.learn.preprocessing I get the warnings below:
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version. Instructions for updating: Use the retry module or similar alternatives.
WARNING:tensorflow:From /tmp/anyReader-376H566fJpAUSEt/anyReader-376qtSRQxT2gOiq.tmp:67: VocabularyProcessor.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version. Instructions for updating: Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:154: CategoricalVocabulary.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.categorical_vocabulary) is deprecated and will be removed in a future version. Instructions for updating: Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:170: tokenizer (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version. Instructions for updating: Please use tensorflow/transform or tf.data.
How do I eliminate them?
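These are deprecation warnings, not errors, so they are safe to ignore. If you just want them out of the console, one common approach (a sketch, assuming TF 1.x; "tensorflow" is the logger name TF uses internally for its Python-side messages) is to silence the native logs and raise the logger level before any TensorFlow code runs:

```python
import logging
import os

# Silence TensorFlow's C++ log output.
# This must be set BEFORE `import tensorflow`.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Silence the Python-side deprecation warnings, which go through the
# standard logging module under the logger name "tensorflow".
logging.getLogger("tensorflow").setLevel(logging.ERROR)
```

In TF 1.x you can also call tf.logging.set_verbosity(tf.logging.ERROR) after importing TensorFlow, which has the same effect on the Python-side warnings.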