p-dre opened 1 year ago
I have the same problem when loading a custom dataset.
Python 3.10.11, OCTIS 1.12.1, System: Windows 10
Code:
import os
import string
import spacy
from octis.preprocessing.preprocessing import Preprocessing

preprocessor = Preprocessing(lowercase=True, vocabulary=None, max_features=None,
                             remove_punctuation=True, punctuation=string.punctuation,
                             lemmatize=True, language='portuguese', remove_numbers=True,
                             min_chars=4, remove_stopwords_spacy=True,
                             min_df=0.1, max_df=0.8, num_processes=7)
AttributeError: 'list' object has no attribute 'lower'
I'm getting the same issue. It only seems to occur when, in Preprocessing, num_processes is not None or split=True. It looks like these code paths turn a list of strings (e.g., ['dog', 'cat']) into a list of lists of characters (e.g., [['d', 'o', 'g'], ['c', 'a', 't']]).
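The behaviour described above can be reproduced without OCTIS. This is a minimal sketch, assuming (as the thread suggests) that the preprocessing step loops over a list of documents. preprocess_docs is a hypothetical stand-in, not OCTIS's actual simple_preprocessing_steps, and the built-in map plays the role of process_map, which likewise calls the function once per element of the input:

```python
# Hypothetical stand-in for a preprocessing step that expects a *list* of
# documents and processes each one (not OCTIS's real implementation).
def preprocess_docs(docs):
    processed = []
    for d in docs:  # assumes d is a whole document string
        processed.append(d.lower())
    return processed


docs = ["Dog", "Cat"]

# Called once on the whole list: works as intended.
ok = preprocess_docs(docs)

# Mapped over the list (one call per element, like process_map): each call
# receives a single string, so the loop iterates over its characters.
broken = list(map(preprocess_docs, docs))

print(ok)      # ['dog', 'cat']
print(broken)  # [['d', 'o', 'g'], ['c', 'a', 't']]

# A later step that calls .lower() on a "document" now gets a list instead.
try:
    broken[0].lower()
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'lower'
```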
OCTIS version: 1.11.0
Python version: 3.8.15
Operating System: 'posix'
Description - What I Did
I read in my own data and save it as a .txt file with one document per line. Then I define the preprocessing and run it via preprocessor.preprocess_dataset. The error message is AttributeError: 'list' object has no attribute 'lower'. If I don't set num_processes, everything works.
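The loop in simple_preprocessing_steps, combined with process_map, breaks the documents into individual letters: process_map calls simple_preprocessing_steps once per document, so the loop iterates over the characters of a single string instead of over a list of documents. See below.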
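One possible workaround, sketched under the same assumptions: give each mapped call a list of documents (here, a chunk) instead of a bare string, then flatten the per-chunk results. preprocess_docs and chunked are hypothetical helpers, and applying this inside OCTIS would mean patching preprocess_dataset, which is not shown here:

```python
from itertools import chain


# Hypothetical stand-in with the same contract as before: takes a list of
# documents and returns a list of processed documents.
def preprocess_docs(docs):
    return [d.lower() for d in docs]


# Split a list into consecutive sublists of at most `size` elements.
def chunked(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]


docs = ["Dog", "Cat", "Bird"]

# Each mapped call now receives a list of documents, matching the function's
# contract; a process_map(preprocess_docs, chunks, ...) call would be used the
# same way, since it also calls the function once per element.
chunks = chunked(docs, 2)
results = map(preprocess_docs, chunks)

# Flatten the per-chunk lists back into a single list of documents.
flat = list(chain.from_iterable(results))
print(flat)  # ['dog', 'cat', 'bird']
```

Chunking, rather than wrapping every document in its own one-element list, keeps the number of calls dispatched to worker processes lower; the chunk size of 2 is an arbitrary choice for the example.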