radiantone closed this 6 years ago.
I didn't include the external datasets in the repo, but I added some links to them in the README.
The notebooks will need some updates based on your own setup and target resources.
I saw that in the README, but songdata did not appear to have a link. It's not clear what the format of the songdata.txt file is, so adapting the notebook to my data is going to be difficult. Maybe you can provide a snippet of the format here.
I got songdata.csv, but it appears to be a CSV file, not a .txt file.
I used .txt because the notebook processes the text as a single continuous blob, as you can see from the corpus_text = f.read() line.
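For anyone starting from songdata.csv instead, a minimal sketch of turning it into a continuous-text file that the f.read() pattern can consume is shown here. It assumes the CSV has a header row with a lyrics column named "text"; that column name and the helper csv_to_blob are hypothetical, so check the header of your own copy before using it.

```python
import csv
from pathlib import Path

# Assumption (hypothetical): the lyrics live in a column named "text".
# Check the actual header row of your songdata.csv before relying on this.
LYRICS_COLUMN = "text"


def csv_to_blob(csv_path, txt_path, column=LYRICS_COLUMN):
    """Concatenate one CSV column into a single plain-text file.

    Returns the output path so the caller can read it back with f.read().
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        # Separate songs with a blank line; rows with an empty or missing
        # lyrics field are skipped. Structure is otherwise discarded, since
        # the notebook treats the whole file as one blob anyway.
        blob = "\n\n".join(
            row[column].strip() for row in reader if row.get(column)
        )
    out = Path(txt_path)
    # Create the target directory (e.g. datasets/) if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(blob, encoding="utf-8")
    return out
```

Usage would be something like csv_to_blob("songdata.csv", "datasets/songdata.txt"), after which opening datasets/songdata.txt and calling f.read() yields one string, as in the notebook.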
There are many variations on this approach, especially given the many large improvements on this task in recent years. The notebook is only meant to give rough guidelines for the process.
Sure, and it's much appreciated. But without being able to run the notebook, things are much more difficult. Can you paste a few lines from songdata.txt so we can see what the data format is? That would allow the notebook to run as is.
As I said, there is not really a format: it is a .txt file, read as a single blob. I would end up just pasting a list of words separated by spaces and occasional newline characters, which is irrelevant, because I later tokenize the whole thing. That's why that section is called "Continuous Text".
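To illustrate why the file layout doesn't matter, here is a rough sketch of word-level tokenization over a whole blob. This is not the notebook's tokenizer (which isn't shown in this thread); tokenize, build_vocab and encode are hypothetical helpers, and keeping newlines as their own token is just one possible choice.

```python
import re
from collections import Counter

# Hypothetical word-level tokenizer, not the notebook's actual one.
# A token is either a newline or a run of non-whitespace characters, so line
# breaks survive as tokens while all other spacing is discarded.
TOKEN_RE = re.compile(r"\n|[^\s]+")


def tokenize(blob):
    """Split an entire text blob into a flat list of tokens."""
    return TOKEN_RE.findall(blob)


def build_vocab(tokens, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tokens)
    # Most frequent tokens get the smallest ids; ties are broken alphabetically
    # so the mapping is deterministic.
    vocab = sorted(
        (t for t, c in counts.items() if c >= min_count),
        key=lambda t: (-counts[t], t),
    )
    return {tok: i for i, tok in enumerate(vocab)}


def encode(tokens, vocab):
    """Convert tokens to ids, dropping anything outside the vocabulary."""
    return [vocab[t] for t in tokens if t in vocab]
```

Because the blob is only ever split into tokens, any text file of words, spaces and newlines works as input.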
I was trying to run the notebook, but it looks like the datasets directory is not in the repo:
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/songdata.txt'