-
Text preprocessing
-
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.text import text_to_word_sequence
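Hi, these two import statements keep raising errors. I searched around but couldn't find a solution, so I'd like to ask you about it.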
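The report doesn't include the error text, so the cause is a guess. One possibility is a newer TensorFlow that bundles Keras 3, which no longer ships the legacy `preprocessing.text` module. If that is the problem, a minimal pure-Python stand-in for `text_to_word_sequence` can unblock simple scripts. The sketch below mimics the documented Keras defaults (lowercase, strip punctuation, split on spaces). `build_word_index` is a hypothetical helper that approximates the frequency-ranked `word_index` of `Tokenizer`. Neither function is the library's own code.

```python
from collections import Counter

# Default filter set documented for the Keras helper: punctuation plus tab and newline.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_word_sequence(text, filters=DEFAULT_FILTERS, lower=True, split=" "):
    """Stand-in for the legacy Keras helper: lowercase, drop filter chars, split."""
    if lower:
        text = text.lower()
    # Map every filter character to the split character, then split on it.
    text = text.translate(str.maketrans({c: split for c in filters}))
    # Drop empty strings produced by consecutive separators.
    return [token for token in text.split(split) if token]


def build_word_index(texts):
    """Hypothetical helper: word -> index, most frequent first, starting at 1."""
    counts = Counter()
    for text in texts:
        counts.update(text_to_word_sequence(text))
    # most_common() is stable, so ties keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


print(text_to_word_sequence("Hello, World! This is a test."))
# ['hello', 'world', 'this', 'is', 'a', 'test']
```

For new code, `tf.keras.layers.TextVectorization` is the maintained text-preprocessing layer inside TensorFlow and may be a better long-term replacement.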
-
It seems that the Spanish « opening and closing » double quotation marks (guillemets) are removed from the training data.
A lot of pre- and post-processing is required to support them:
I did a find and replace to ch…
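The details of that find-and-replace are cut off, so the following is only a hypothetical round-trip approach, not the poster's. It maps « and » to ASCII double quotes before training and, in post-processing, restores them by alternating opening and closing marks. `to_ascii_quotes` and `to_guillemets` are illustrative names.

```python
def to_ascii_quotes(text: str) -> str:
    """Pre-processing: replace « and » with ASCII double quotes."""
    return text.replace("«", '"').replace("»", '"')


def to_guillemets(text: str) -> str:
    """Post-processing: turn ASCII double quotes back into « and »,
    alternating open/close. Assumes quotes are balanced and not nested."""
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("«" if opening else "»")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


original = "Dijo «hola» y luego «adiós»."
ascii_version = to_ascii_quotes(original)
print(ascii_version)                 # Dijo "hola" y luego "adiós".
print(to_guillemets(ascii_version))  # Dijo «hola» y luego «adiós».
```

The alternation breaks if the text already contains ASCII double quotes or uses nested quotation (Spanish often nests “ ” inside « »). In those cases a distinct placeholder per mark would be safer, provided the training pipeline keeps it.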
-
Is there any best practice for using litdata to load custom data for pretraining? I found that TextFiles.py and prepare_slimpajama.py have similar data preprocessing methods. The difference between th…
-
Hi, thanks for sharing this great work.
I noticed that the ICDAR2019 ArT results ranking table contains the statement "Before text recognition, we used the text detector called CRAFT as a preprocessing s…
-
Orange 3.3
Orange-text 1.15.0
OS: Linux Mint Victoria
Run with the "python3 -m Orange.canvas" command
Expected behavior
It's supposed to create a word cloud from a corpus file.
Actual behavio…
-
Vision LLMs like [Llava](https://huggingface.co/docs/transformers/en/model_doc/llava) or [Idefics](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/idefics#transformers.IdeficsImageProces…
-
Hi,
I have seen that DALI already has prebuilt functions for image preprocessing, like dali.fn.resize(images, resize_x=299, resize_y=299).
But does it provide any preprocessing functions for text…
-
I am trying to train a new language model (Portuguese) but I am encountering the error "The expanded size of the tensor (768) must match the existing size (1024) at non-singleton dimension 0. Target s…
-
OCTIS version: 1.11.0
Python version: 3.8.15
Operating System: 'posix'
### Description - What I Did
I read in my own data and saved it as a .txt file with one document per line. Then I defined the prep…
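The OCTIS call that follows is cut off in the report, so this sketch covers only the file-writing step, under the assumption that "one document per line" means no document may contain a newline. `save_one_doc_per_line` is a hypothetical helper: it collapses all internal whitespace (including `\n` and `\t`) to single spaces so each document occupies exactly one line.

```python
import os
import tempfile


def save_one_doc_per_line(documents, path):
    """Hypothetical helper: write each document on its own line,
    collapsing internal whitespace so no document spans several lines."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            # str.split() with no arguments splits on any whitespace run.
            f.write(" ".join(doc.split()) + "\n")


docs = ["First document.\nIt has two lines.", "Second\tdocument"]
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
save_one_doc_per_line(docs, path)
with open(path, encoding="utf-8") as f:
    print(f.read().splitlines())
# ['First document. It has two lines.', 'Second document']
```

A stray newline inside a document silently turns one document into two, so checking that the line count equals the number of documents is a cheap sanity test before preprocessing.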