Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
computing statistics for file: text8.txt
100%|███████████████████████████████████████████████████████████████████| 1/1 [12:41<00:00, 761.71s/it]
Writing 1-grams...
entries:250,982 - tokens:16,996,178
writing stats to file /home/jj301440/ekphrasis-master/ekphrasis/tools/../stats/text8/counts_1grams.txt
Writing 2-grams...
entries:4,136,483 - tokens:16,996,178
writing stats to file /home/jj301440/ekphrasis-master/ekphrasis/tools/../stats/text8/counts_2grams.txt
Writing 3-grams...
entries:10,327,876 - tokens:16,996,178
writing stats to file /home/jj301440/ekphrasis-master/ekphrasis/tools/../stats/text8/counts_3grams.txt
Traceback (most recent call last):
  File "generate_stats.py", line 191, in <module>
    write_stats(stats)
  File "generate_stats.py", line 147, in write_stats
    write_stats_to_file(filename, counter, args.mincount[int(k) - 1])
IndexError: list index out of range
Can you please tell me where the problem is?
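
For reference, here is a minimal sketch of how the indexing pattern in the traceback, `args.mincount[int(k) - 1]`, can raise this error. It assumes that `mincount` is a list holding one threshold per n-gram order and that it contains fewer entries than the highest order being written. The names `min_counts`, `orders` and `threshold_for`, and the values used, are hypothetical; this is not the actual code of `generate_stats.py`.

```python
# Hypothetical names and values; this is not the code of generate_stats.py.

def threshold_for(min_counts, order):
    """Return the minimum-count threshold for a 1-based n-gram order."""
    if not 1 <= order <= len(min_counts):
        raise ValueError(
            "no threshold for {}-grams: got {} value(s), need at least {}".format(
                order, len(min_counts), order))
    return min_counts[order - 1]


min_counts = [70, 6]   # assumed: only two thresholds were supplied
orders = [1, 2, 3]     # assumed: statistics exist for 1-, 2- and 3-grams

failed_orders = []
for k in orders:
    try:
        min_counts[k - 1]          # same indexing pattern as in the traceback
    except IndexError:
        failed_orders.append(k)    # this order has no matching threshold

print(failed_orders)
```

If this assumption holds, supplying one threshold per n-gram order (three values for 1-, 2- and 3-grams) would avoid the error. A bounds check like the one in `threshold_for` would at least turn it into a clearer message.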