I renamed the file to corpus-1.txt and tried to upload it via the UI. I see the following error in the logs:
```
PayloadTooLargeError: request entity too large
[0] at readStream (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/raw-body/index.js:155:17)
[0] at getRawBody (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/raw-body/index.js:108:12)
[0] at read (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/body-parser/lib/read.js:77:3)
[0] at jsonParser (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/body-parser/lib/types/json.js:135:5)
[0] at Layer.handle [as handle_request] (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/router/layer.js:95:5)
[0] at trim_prefix (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/router/index.js:317:13)
[0] at /Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/router/index.js:284:7
[0] at Function.process_params (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/router/index.js:335:12)
[0] at next (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/router/index.js:275:10)
[0] at Immediate.<anonymous> (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express-session/index.js:489:7)
```
Hi Rich, good that you caught the inconsistency in file naming between the GUI and the command line. I did not use the .txt extension because the source files already use it and I was selecting them with the "*.txt" glob. Renaming should make it consistent with what the GUI expects.
The error appears to be caused by a size limit in the GUI code.
@yhwang @pvaneck Is this correct? Can we bump up or remove the limit?
The default limit in body-parser is 100kb, which is small. I checked the Watson Speech to Text API and couldn't find a file size limit for the addCorpus API. We definitely need to increase the limit on our end; the question is what the proper size is.
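In the Express app itself, raising the limit should be a matter of passing the `limit` option to the JSON parser that shows up in the stack trace, e.g. `bodyParser.json({ limit: '2mb' })`. To make the numbers concrete, here is a hypothetical, simplified reimplementation of the size check (`parseLimit` and `exceedsLimit` are illustrative names, not library functions), assuming binary units where 1 kb = 1024 bytes:

```javascript
// Hypothetical, simplified sketch of the size check; not body-parser's code.
// Assumes binary units (1 kb = 1024 bytes), as in the `bytes` package.
const UNITS = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3 };

// Turn a limit such as '100kb' or '2mb' into a byte count.
function parseLimit(limit) {
  if (typeof limit === 'number') return limit;
  const match = /^(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)$/i.exec(String(limit).trim());
  if (!match) throw new Error(`Unrecognized limit: ${limit}`);
  return Math.floor(parseFloat(match[1]) * UNITS[match[2].toLowerCase()]);
}

// True when a body of `sizeBytes` would be rejected with
// "PayloadTooLargeError: request entity too large".
function exceedsLimit(sizeBytes, limit) {
  return sizeBytes > parseLimit(limit);
}

const corpusBytes = 669 * 1024; // roughly the corpus-1.txt size reported in this thread
console.log(exceedsLimit(corpusBytes, '100kb')); // body-parser default: true
console.log(exceedsLimit(corpusBytes, '2mb'));   // proposed limit: false
```

If the corpus text is embedded as a JSON string, the request body will be somewhat larger than the file on disk, because characters such as quotes and newlines are escaped.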
@rhagarty can you share the file size of your corpus-1.txt?
@yhwang The text file is 669 KB. Note that the audio files are 60-85 MB and their upload seems to work. I guess there is a different size limit for those?
@tonanhngo there are two handlers in our code, bodyparser and multer. The audio file is handled by multer, and the two have different limits.
I think audio files are usually bigger than corpus files. A 1 or 2 MB text file should already be pretty big.
We can split the text file into multiple corpora and upload them individually. We just need to document the size limit so the user knows how to split the file. The limit for the audio file is 100 MB (Watson API limit). I guess 2 MB is reasonable for the text corpus.
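Splitting can also be scripted. A minimal sketch, assuming it is acceptable to break the corpus between lines: the hypothetical `splitCorpus` helper below groups whole lines into chunks whose UTF-8 size stays within a byte limit, such as the 2 MB proposed here (2 * 1024 * 1024 bytes, again assuming binary units).

```javascript
// Hypothetical helper (not part of the app): split a corpus into chunks that
// each stay within maxBytes, breaking only between lines. Sizes are counted as
// UTF-8 bytes, and every chunk ends with a newline.
function splitCorpus(text, maxBytes) {
  if (text.length === 0) return [];
  // Drop one trailing newline so it doesn't produce an empty last line.
  const lines = (text.endsWith('\n') ? text.slice(0, -1) : text).split('\n');

  const chunks = [];
  let current = [];
  let currentBytes = 0;

  for (const line of lines) {
    const lineBytes = Buffer.byteLength(line, 'utf8') + 1; // +1 for the newline
    if (lineBytes > maxBytes) {
      throw new Error('A single line is larger than maxBytes');
    }
    // Close the current chunk if this line would push it over the limit.
    if (currentBytes + lineBytes > maxBytes) {
      chunks.push(current.join('\n') + '\n');
      current = [];
      currentBytes = 0;
    }
    current.push(line);
    currentBytes += lineBytes;
  }
  if (current.length > 0) chunks.push(current.join('\n') + '\n');
  return chunks;
}

// Tiny demo with an 11-byte limit:
const demo = splitCorpus('alpha\nbeta\ngamma\n', 11); // ['alpha\nbeta\n', 'gamma\n']
console.log(demo.length); // 2
```

For a real file you could read it with `fs.readFileSync('corpus-1.txt', 'utf8')`, call `splitCorpus(text, 2 * 1024 * 1024)`, and write each chunk to its own .txt file before uploading them as separate corpora.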
Okay, let's use 2 MB for the corpus. Let me also check whether we set a limit for the audio file.
When preparing the corpus data, we tell the user to issue the following command:
sed -f fixup.sed Documents/*.txt > corpus-1.input
But when trying to upload the corpus file in the UI, it only accepts .txt files, so the generated corpus-1.input cannot be selected.