Open nidhikamath91 opened 6 years ago
The VocabularyProcessor class is deprecated in (I believe) TensorFlow v1.8. The reason is that they want to encourage you to use the Datasets API. I used this code as a starting point: https://github.com/LightTag/BibSample/blob/master/preppy.py
Hope this helps!
Have you solved it? I have the same problem, and the explanation above still seems confusing to me.
I used the exact code from preprocessing.txt and it worked. I did not understand the explanation in the new workaround.
You can use this:
tokenizer = tf.keras.preprocessing.text.Tokenizer(oov_token="<UNK>")
tokenizer.fit_on_texts(x_text)
x = tokenizer.texts_to_sequences(x_text)
x = tf.keras.preprocessing.sequence.pad_sequences(x, maxlen=max_document_length, padding='post', truncating='post')
Hi, note that should be tf.keras.preprocessing (with a dot).
Also, could you please help me convert the code below to tf.keras?
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(max_sequence_length)
x_data = np.array(list(vocab_processor.fit_transform(data)))
vocab_size=len(vocab_processor.vocabulary_)
print(vocab_size)
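For anyone trying to map that old VocabularyProcessor snippet onto something self-contained, its core behaviour (build a word-to-id vocabulary, then turn each text into a fixed-length id sequence) can be sketched in plain Python. SimpleVocabularyProcessor is a hypothetical name for illustration, not an actual TensorFlow or Keras class, and it only splits on whitespace rather than using the real class's regex tokenizer:

```python
import numpy as np

class SimpleVocabularyProcessor:
    """Minimal stand-in for tf.contrib.learn.preprocessing.VocabularyProcessor.

    Id 0 is reserved for padding/unknown words, mirroring the old class.
    """

    def __init__(self, max_document_length):
        self.max_document_length = max_document_length
        self.vocabulary_ = {"<UNK>": 0}

    def fit_transform(self, texts):
        # First pass: assign an id to every word seen in the corpus.
        for text in texts:
            for word in text.split():
                if word not in self.vocabulary_:
                    self.vocabulary_[word] = len(self.vocabulary_)
        # Second pass: map each text to a fixed-length id sequence,
        # truncating long texts and zero-padding short ones.
        for text in texts:
            ids = [self.vocabulary_[w] for w in text.split()]
            ids = ids[: self.max_document_length]
            ids += [0] * (self.max_document_length - len(ids))
            yield ids

vocab_processor = SimpleVocabularyProcessor(max_document_length=4)
data = ["the cat sat", "the dog sat on the mat"]
x_data = np.array(list(vocab_processor.fit_transform(data)))
print(x_data)                          # [[1 2 3 0] [1 4 3 5]]
print(len(vocab_processor.vocabulary_))  # 7
```

With that picture in mind, the tf.keras Tokenizer snippet above does the same job: fit_on_texts builds the vocabulary, texts_to_sequences replaces fit_transform, and len(tokenizer.word_index) + 1 plays the role of len(vocab_processor.vocabulary_).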
@bhuvanshukla were you able to resolve your issue? I am also facing the same issue.
Please correct me if I am wrong. Here is what I learned from lukas' example:
from tensorflow.keras.preprocessing import text, sequence
tokenizer = text.Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = sequence.pad_sequences(x_train, maxlen=MAX_SEQUENCE_LENGTH)
x_test = tokenizer.texts_to_sequences(x_test)
x_test = sequence.pad_sequences(x_test, maxlen=MAX_SEQUENCE_LENGTH)
This way, the tokenizer keeps only the top VOCAB_SIZE most frequent words (via num_words) and pads or truncates every sequence to MAX_SEQUENCE_LENGTH, so you no longer need vocab_processor at all.
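One thing worth knowing about the snippet above: unlike the earlier example with padding='post', Keras's pad_sequences defaults to padding='pre' and truncating='pre'. A plain-Python sketch of those semantics (not the real implementation, which returns a NumPy array):

```python
def pad_sequences(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Plain-Python sketch of keras.preprocessing.sequence.pad_sequences."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Drop tokens from the front ('pre') or the back ('post').
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        pad = [value] * (maxlen - len(seq))
        # Insert padding before ('pre') or after ('post') the sequence.
        padded.append(pad + seq if padding == "pre" else seq + pad)
    return padded

print(pad_sequences([[1, 2], [3, 4, 5, 6, 7]], maxlen=4))
# [[0, 0, 1, 2], [4, 5, 6, 7]]
```

So with the defaults, short sequences are zero-padded at the front and long ones lose their earliest tokens; pass padding='post' and truncating='post' if you want the other behaviour.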
Your script is quite clear. It would be helpful to know the TensorFlow 2.x equivalent of the following call:
VocabularyProcessor(max_sequence_length, min_frequency=min_word_frequency)
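As far as I know, tf.keras's Tokenizer has no direct min_frequency argument, but the same effect can be approximated by counting words first and keeping only the frequent ones. A plain-Python sketch of that idea (build_vocabulary is a hypothetical helper, not an official API, and the exact >/>= boundary in the old class may differ):

```python
from collections import Counter

def build_vocabulary(texts, min_frequency=0):
    """Word -> id map, dropping words seen min_frequency times or fewer.

    Id 0 is reserved for unknown/padding words, mirroring the old
    VocabularyProcessor convention.
    """
    counts = Counter(word for text in texts for word in text.split())
    vocab = {"<UNK>": 0}
    for word, count in counts.items():
        if count > min_frequency:
            vocab[word] = len(vocab)
    return vocab

texts = ["the cat sat", "the dog sat", "a mat"]
print(build_vocabulary(texts, min_frequency=1))
# {'<UNK>': 0, 'the': 1, 'sat': 2}
```

Words that fall below the threshold simply map to the <UNK> id when you encode the texts.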
Hello,
I am using TensorFlow on Linux, and when using tensorflow.contrib.learn.python.learn.preprocessing I get the warnings below:
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version. Instructions for updating: Use the retry module or similar alternatives.
WARNING:tensorflow:From /tmp/anyReader-376H566fJpAUSEt/anyReader-376qtSRQxT2gOiq.tmp:67: VocabularyProcessor.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version. Instructions for updating: Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:154: CategoricalVocabulary.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.categorical_vocabulary) is deprecated and will be removed in a future version. Instructions for updating: Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:170: tokenizer (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version. Instructions for updating: Please use tensorflow/transform or tf.data.
How do I eliminate them?
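These are deprecation warnings, not errors, so they are safe to ignore. If you just want them out of the console, one common approach (a sketch, assuming TF 1.x; "tensorflow" is the logger name TF uses internally for its Python-side messages) is to silence the native logs and raise the logger level before any TensorFlow code runs:

```python
import logging
import os

# Silence TensorFlow's C++ log output.
# This must be set BEFORE `import tensorflow`.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Silence the Python-side deprecation warnings, which go through the
# standard logging module under the logger name "tensorflow".
logging.getLogger("tensorflow").setLevel(logging.ERROR)
```

In TF 1.x you can also call tf.logging.set_verbosity(tf.logging.ERROR) after importing TensorFlow, which has the same effect on the Python-side warnings.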