Open tkap243 opened 1 year ago
In the [Load a Custom Dataset] section, it is mentioned that the dataset should have a vocabulary file, but my dataset is just a csv file. How can I generate this vocab file? Does the pipeline generate it automatically?
Per the readme, the custom dataset is a tsv file, which is what our csv is. I'm uncertain what the vocab file should be.
Hi, the vocabulary file is just the list of words contained in the documents. See #92 for how to generate it from the tsv file.
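For anyone landing here, a minimal sketch of that idea, not the code from #92. It assumes the document text is the first tab-separated column of the tsv file, that words are separated by whitespace, and that the vocabulary file holds one word per line. The function name `build_vocabulary` and the output file name are my own choices, so check the README for the file name the loader expects.

```python
from pathlib import Path


def build_vocabulary(corpus_path, vocab_path):
    """Collect the distinct words of a tsv corpus and write them one per line.

    Assumptions (not confirmed by the library): the document text is the
    first tab-separated column, and words are whitespace-separated.
    """
    vocab = set()
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            # Keep only the first column; later columns (if any) are ignored.
            document = line.rstrip("\n").split("\t")[0]
            vocab.update(document.split())
    words = sorted(vocab)
    # One word per line, sorted so the output is reproducible.
    Path(vocab_path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return words
```

Calling `build_vocabulary("corpus.tsv", "vocabulary.txt")` in the dataset folder would write the word list next to the corpus. If your documents are not preprocessed yet (lowercasing, punctuation and stop-word removal), do that first, otherwise tokens like "Word," and "word" end up as separate vocabulary entries.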
Description
Hello,
I am having trouble loading my custom dataset. I followed the guide in the main README and am getting the below errors.
What I Did
from octis.dataset.dataset import Dataset
import pandas as pd
df = pd.read_csv("/mnt/mydata/notebooks/data.csv")
df.to_csv('corpus.tsv', sep="\t", header=False, columns=['documents'])
dataset = Dataset()
dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")