Base code for SpeechModel

< Summary >

Microphone input and audio-file input now produce inputs of the same shape; silence at the beginning of an audio file is dropped.
Part of the layer definitions for SpeechModel has been implemented with tensorflow.js in SpeechModel.js.
Added base code for SpeechResModel.js.
Implemented predict() for both models.
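
To illustrate the first item, here is a minimal sketch of dropping leading silence and then fitting the samples to a fixed length so that both input sources end up with the same shape. The helper names `trimLeadingSilence` and `fitToLength`, the amplitude threshold of 0.01, and the target length are all assumptions made for illustration; they are not taken from this PR.

```javascript
// Hypothetical helper: drop every sample before the first one whose
// magnitude exceeds the threshold (the 0.01 default is an assumed value).
function trimLeadingSilence(samples, threshold = 0.01) {
  let start = 0;
  while (start < samples.length && Math.abs(samples[start]) <= threshold) {
    start++;
  }
  // subarray returns a view on the same buffer, so no copy is made here
  return samples.subarray(start);
}

// Hypothetical helper: zero-pad or truncate to exactly targetLength samples,
// so that mic and file input share one shape.
function fitToLength(samples, targetLength) {
  const out = new Float32Array(targetLength); // zero-filled by default
  out.set(samples.subarray(0, Math.min(samples.length, targetLength)));
  return out;
}

// Usage: three near-silent samples followed by three audible ones
const raw = Float32Array.from([0, 0.001, -0.002, 0.5, -0.4, 0.3]);
const trimmed = trimLeadingSilence(raw);
const fixed = fitToLength(trimmed, 8);
```

A plain amplitude threshold is the simplest possible choice; an energy-based or windowed detector would be more robust to isolated noisy samples.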

< Things to Note >

In the layer definitions in SpeechModel.js, the corresponding PyTorch code has been copied in and left as comments for future reference.
Config handling is not yet implemented (SpeechModel currently uses the CNN_TRAD_POOL2 config and SpeechResModel uses the RES8 config).
The channels-last (NHWC) data format is used because it is the default for all tensorflow.js layers.
The dnn layer for SpeechModel is not implemented yet.
A train function was implemented only to test that the created model compiles and runs; it is to be removed.
tensorflow.js generates the error message "WebGL: INVALID_ENUM: readPixels: invalid type". However, this does not seem to affect functionality and appears to be a known, expected issue (https://github.com/tensorflow/tfjs/issues/199).
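
Because PyTorch uses channels-first (NCHW) by default while the layers here use NHWC, any data or shapes taken from the commented PyTorch code have to be reordered. The following sketch shows that index mapping on a flat array; `nchwToNhwc` is a hypothetical helper written for illustration and is not part of this PR.

```javascript
// Hypothetical helper: reorder a flat NCHW array into NHWC layout.
// The shape is given as [n, c, h, w], as it would appear on the PyTorch side.
function nchwToNhwc(data, [n, c, h, w]) {
  const out = new Float32Array(data.length);
  for (let b = 0; b < n; b++) {
    for (let ch = 0; ch < c; ch++) {
      for (let y = 0; y < h; y++) {
        for (let x = 0; x < w; x++) {
          const src = ((b * c + ch) * h + y) * w + x; // NCHW offset
          const dst = ((b * h + y) * w + x) * c + ch; // NHWC offset
          out[dst] = data[src];
        }
      }
    }
  }
  return out;
}

// Usage: one image, two channels, 2x2 pixels.
// Channel 0 is [[1, 2], [3, 4]] and channel 1 is [[10, 20], [30, 40]].
const nchw = Float32Array.from([1, 2, 3, 4, 10, 20, 30, 40]);
const nhwc = nchwToNhwc(nchw, [1, 2, 2, 2]);
// In NHWC the channel values of each pixel sit next to each other:
// [1, 10, 2, 20, 3, 30, 4, 40]
```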

castorini / honkling