
5agado / recurrent-neural-networks-intro

Implementation of RNN in Python

Apache License 2.0

52 stars, 29 forks

Missing datasets directory #2

Closed: radiantone closed this issue 6 years ago

radiantone commented 6 years ago

I was trying to run the notebook, but it looks like the datasets directory is not in the repo.

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/songdata.txt'

5agado commented 6 years ago

I didn't include the external datasets in the repo, but I added some links to them in the README.

The notebooks will need some updates based on your own setup and target resources.

radiantone commented 6 years ago

I saw the README, but songdata did not appear to have a link. It's not clear what format songdata.txt uses, so adapting the notebook to my data is going to be difficult. Maybe you could paste a snippet showing the format here.

I found songdata.csv, but it is a CSV file, not a .txt file.
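One way to turn a CSV of lyrics into the plain-text blob the notebook expects is to write out a single column, one song after another. This is a minimal sketch, not the author's preprocessing: it assumes the lyrics sit in a column named "text" (check the header of your own songdata.csv), and the helper name csv_to_txt is hypothetical.

```python
import csv
from pathlib import Path


def csv_to_txt(csv_path, txt_path, text_column="text"):
    """Write one CSV column to a plain-text file, one entry after another.

    Hypothetical helper: assumes the lyrics live in a column named "text";
    pass a different text_column if your CSV header differs.
    """
    csv_path, txt_path = Path(csv_path), Path(txt_path)
    # Create e.g. the missing datasets/ directory if needed.
    txt_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open(newline="", encoding="utf-8") as src, \
            txt_path.open("w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Strip surrounding whitespace and end each song with a newline.
            dst.write(row[text_column].strip() + "\n")
```

For example, csv_to_txt("songdata.csv", "datasets/songdata.txt") produces a file at the path the notebook tries to open. Since the notebook reads the file as one blob, the separator between songs matters little.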

5agado commented 6 years ago

I used .txt because the notebook processes the text as a single continuous blob, as you can see from the corpus_text = f.read() line.
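The reading step described above amounts to loading the whole file into one string. Below is a minimal sketch of that step, assuming UTF-8 encoding; the function name load_corpus is hypothetical, and the explicit existence check is an addition that replaces the bare FileNotFoundError reported in this issue with a more helpful message.

```python
from pathlib import Path


def load_corpus(path="datasets/songdata.txt"):
    """Read an entire text file into a single string.

    Hypothetical helper; assumes the file is UTF-8 encoded.
    """
    corpus_path = Path(path)
    if not corpus_path.is_file():
        # Point the user at the fix instead of failing with a bare traceback.
        raise FileNotFoundError(
            f"{corpus_path} not found; set `path` to any plain-text corpus"
        )
    with corpus_path.open(encoding="utf-8") as f:
        corpus_text = f.read()  # the whole file as one continuous blob
    return corpus_text
```

Because the result is one string, any plain-text file (a book, concatenated lyrics, and so on) can be substituted without changing the rest of the pipeline.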

There are many variations on this approach, especially given the many large improvements on this task in recent years. The notebook is only meant to give rough guidelines for the process.

radiantone commented 6 years ago

Sure, and it's much appreciated. But without being able to run the notebook, things are much more difficult. Could you paste a few lines from songdata.txt so we can see the data format? That would let the notebook run as is.

5agado commented 6 years ago

As I said, there isn't really a format: it is a plain .txt file, read as a single blob. I would just end up pasting a list of words separated by spaces and occasional newline characters, which is irrelevant because I later tokenize the whole text. That's why that section is called "Continuous Text".
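To illustrate why the layout of the .txt file does not matter, here is a minimal tokenization sketch. It is not the notebook's actual tokenizer: the lowercase word regex and the frequency-ordered vocabulary are assumptions chosen for illustration, and tokenize and build_vocab are hypothetical names. Spaces and newlines both act only as separators, so the same words produce the same tokens however they are laid out.

```python
import re
from collections import Counter


def tokenize(corpus_text):
    """Split a text blob into lowercase word tokens.

    Assumed scheme: runs of letters and apostrophes count as words;
    all whitespace, including newlines, is treated as a separator.
    """
    return re.findall(r"[a-z']+", corpus_text.lower())


def build_vocab(tokens):
    """Map each token to an integer index, most frequent first.

    Counter.most_common orders tokens with equal counts by first appearance.
    """
    counts = Counter(tokens)
    return {tok: i for i, (tok, _) in enumerate(counts.most_common())}
```

For instance, tokenize("Hello world\nhello  there") returns ["hello", "world", "hello", "there"], the same tokens as the single-line "hello world hello there".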