-
As a good practice, we should convert any encoded string (UTF-8, ISO-8859-1, etc.) to Unicode at software input, work on the data as Unicode, and then, just before outputting data, convert it to some enc…
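
A minimal sketch of that decode-early, encode-late pattern in Python 3 might look like the following. The helper names (`read_input`, `process`, `write_output`) and the sample string are hypothetical and only there for illustration; the built-in `bytes.decode` and `str.encode` calls do the actual work.

```python
def read_input(raw: bytes, source_encoding: str) -> str:
    """Decode bytes to a Unicode str at the input boundary."""
    return raw.decode(source_encoding)


def process(text: str) -> str:
    """Work on Unicode text only (here: a trivial uppercase transform)."""
    return text.upper()


def write_output(text: str, target_encoding: str = "utf-8") -> bytes:
    """Encode to bytes only at the output boundary."""
    return text.encode(target_encoding)


# Hypothetical input: the same word arriving as ISO-8859-1 bytes.
latin1_bytes = "café".encode("iso-8859-1")

text = read_input(latin1_bytes, "iso-8859-1")  # bytes -> str
result = write_output(process(text), "utf-8")  # str -> bytes
```

Everything between `read_input` and `write_output` only sees `str`, so no step in the middle has to know which encoding the data originally arrived in.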
-
@ewan-klein is working on a [TwitterCorpusReader](https://github.com/nltk/nltk/blob/twitter/nltk/corpus/reader/twitter.py) that includes:
- `tweets()` – returns a list of strings, one per tweet
- `ful…
-
Hi Dan,
I have a few questions about the scope of this project. As I understand it, this is merely an LM creation tool with bells and whistles to optimize perplexity and such.
I have 2 major points that …
-
In the current release, asr_egs supports hkust. However, the hkust corpus is not easy to acquire, while the thchs30 corpus is more convenient to get.
Is the support for thchs30 in the pl…
-
Let me know if this is what you expected:
https://github.com/vince62s/pocolm/blob/perplex/egs/tedlium/perplex.sh
-
```
DKPro does not yet have a reader for the tagged plain-text corpora that come
along with the PTB.
Points for discussion:
- corpora contain noun phrase annotations (in addition to the tags), is …