-
Preserving literary heritage is important for gaining valuable insights into human history
and for learning about the different aspects of our ancestors' lives. Documents,
whether written on le…
-
I don't know if this is already implemented or if there's a workaround for this. Sometimes, due to the large amount of data, what we have are training sentences that are already uniquely condensed (a…
-
Raised by @tuzhucheng as part of #311: How should we handle paragraph indexing in a more generic way? We **shouldn't** have separate Wikipedia and WikipediaParagraph collections. There should be a mor…
-
Display statistical information for virtual corpora (number of documents, texts, tokens, sentences)
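A minimal sketch of the kind of summary this feature request asks for, assuming a simple in-memory corpus layout (documents containing texts, texts containing sentence strings); all names here are illustrative, not taken from the project's API:

```python
# Hypothetical sketch: compute the statistics a virtual-corpus view might
# display. The corpus layout (dicts with a "texts" key) is an assumption.

def corpus_stats(documents):
    """documents: list of dicts, each with a 'texts' list;
    each text is a list of sentence strings."""
    n_docs = len(documents)
    n_texts = sum(len(d["texts"]) for d in documents)
    sentences = [s for d in documents for t in d["texts"] for s in t]
    n_sentences = len(sentences)
    # Whitespace tokenization as a stand-in for a real tokenizer.
    n_tokens = sum(len(s.split()) for s in sentences)
    return {"documents": n_docs, "texts": n_texts,
            "sentences": n_sentences, "tokens": n_tokens}

corpus = [{"texts": [["A short sentence.", "Another one."]]},
          {"texts": [["One more sentence here."]]}]
print(corpus_stats(corpus))
# → {'documents': 2, 'texts': 2, 'sentences': 3, 'tokens': 9}
```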
-
Hello,
Thank you for the wonderful repository.
I read that you're currently training on the LJSpeech dataset for English TTS.
Do you have any updates on audio samples?
Also would you be able t…
-
It fails to recognize the following files as 6502 code:
-
- `osi_bas.bin` from
-
As can be seen in the code sample below, we get different results if
* we pre-process a text with TextPreProcessor _**text_processor**_ and then create an Example with a torchtext.data.Field() wi…
-
Resuming training is unreliable right now. Adam optimizer statistics are not being saved, so they cannot be restored when training resumes. They should be saved in a special *.npz file next to the model…
-
I ingested a corpus of about 2100 short documents (UTF-8, no XML markup) and the progress bar showed successful completion of all the processing steps. (I used a stop word list of my own; I chose 100 …
-
`link-parser` returns different parses when parsing the same corpus file multiple times. I carried out two tests with 29 and 30 runs at different points in time. The first test gave me two different v…