Closed batmanscode closed 1 year ago
Hi @batmanscode,
It seems that there is an empty document in your df['clean_text']. Could you check the values of df['clean_text'] to make sure there are no blank documents?
@bab2min df['clean_text'].isnull().value_counts() showed no empty values.
@batmanscode df.isnull() only tests whether a value is NA. Because an empty str '' is not NA, it doesn't flag empty strings. Try the following:
df['clean_text'].apply(lambda x: bool(x.strip())).value_counts()
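To make the difference concrete, here is a minimal sketch on a made-up dataframe (the sample rows are illustrative, not data from this issue). It assumes pandas is installed, and the fillna('') step is an extra guard so that missing values don't raise on .strip(); it isn't part of the one-liner above.

```python
import pandas as pd

# Illustrative data: one real document, one empty string,
# one whitespace-only string, and one missing value.
df = pd.DataFrame({"clean_text": ["topic models", "", "   ", None]})

# isnull() flags only the missing value, not '' or whitespace.
null_mask = df["clean_text"].isnull()

# Treat missing values as empty, then check whether any text
# is left after stripping whitespace.
has_text = df["clean_text"].fillna("").apply(lambda x: bool(x.strip()))

print(null_mask.value_counts())
print(has_text.value_counts())

# Keep only the rows that still contain text.
df_nonempty = df[has_text]
```

Here only the None row counts as null, while three of the four rows have no usable text, so df_nonempty keeps a single row.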
Ah, this makes sense, thank you. There are indeed empty values here. Is there a way to get tomotopy to skip these? It's not really a problem to remove them, I'm just curious.
@batmanscode Currently, add_doc has no such feature. But I think it's a good idea to add an option to ignore empty docs.
@bab2min Agreed. It would be a nice quality-of-life feature to have.
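Until something like that exists, the skipping can be done on the caller's side. The helper below is only a sketch: add_nonempty_docs is a hypothetical name, not a tomotopy function, and it assumes the model object has an add_doc method that accepts a list of words and that splitting on whitespace is an acceptable tokenizer.

```python
def add_nonempty_docs(model, texts):
    """Add every non-blank text to `model`; return the positions that were skipped.

    `model` is assumed to expose add_doc(words), as tomotopy models do.
    This helper is hypothetical and not part of tomotopy itself.
    """
    skipped = []
    for i, text in enumerate(texts):
        # Non-string values (e.g. None/NaN) are treated as empty documents.
        words = text.split() if isinstance(text, str) else []
        if not words:
            skipped.append(i)
            continue
        model.add_doc(words)
    return skipped
```

With a tomotopy model mdl, this could be called as add_nonempty_docs(mdl, df['clean_text']). Returning the skipped positions makes it easy to see which rows were dropped and to keep the model's documents aligned with the dataframe afterwards.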
I have text in a dataframe and was adding it in like this:
This works fine.
However, when I tried to remove stopwords before using add_doc, I get the error in the title. I'm doing the preprocessing using texthero like this:
Side note: maybe this could be built into tomotopy using texthero.