bgawrych closed this pull request 3 years ago
LGTM
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1565/d13d37d19e549bb13984e855b9f3e6cb24a4bbc6/index.html
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1565/ad9185846d06baf328878fa7b37a5356a6439c89/index.html
@szha Can you help with CI? I'm not sure why it's failing and I don't know how to rerun it. The website-build step seems to fail on a notebook I haven't edited.
Hi @bgawrych, could you try merging with v0.10.x? We ported the new CI settings from v0.x to v0.10.x. Thanks!
@barry-jin, @szha there is still an issue with the notebook, which is a little strange since the bert.md file hasn't been changed for a year. Should I fix it, or is it a CI issue?
@barry-jin I see the following error in the log, pointing out that there's a CI issue:
[2021-05-10T05:59:14.916Z] umount: /dev/shm: must be superuser to unmount.
[2021-05-10T05:59:14.917Z] mount: /dev/shm: permission denied.
[2021-05-10T05:59:14.917Z] ./gluon_nlp_job.sh: line 33: sudo: command not found
Hi @bgawrych, we have ported the changes in bert.md from the v0.x branch; you could try merging with the current v0.10.x.
Thanks, I will fix this issue.
@barry-jin
[2021-05-19T09:19:56.267Z] Exception occurred:
[2021-05-19T09:19:56.267Z] File "/workspace/gluon-nlp/docs/conf.py", line 237, in setup
[2021-05-19T09:19:56.267Z] app.add_javascript('google_analytics.js')
[2021-05-19T09:19:56.267Z] AttributeError: 'Sphinx' object has no attribute 'add_javascript'
Will be fixed in https://github.com/dmlc/gluon-nlp/pull/1575
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1565/1644e8fe25e66b05300042510ced0603d1dd4098/index.html
Merged. Thanks @bgawrych!
I'm surprised that the elimination of the 24x mask tensor creation gave you any speedup (as opposed to using masked softmax, which should) - MXNet already has a common expression elimination pass (I wrote it: https://github.com/apache/incubator-mxnet/pull/15657). Does that not work for you?
@ptrendx I didn't know about this feature, but I wrote this small graph pass to test it:
```cpp
#if MX_LIBRARY_VERSION <= 7
MXReturnValue TEST(const std::string& in_graph, const std::string** out_graph,
                   const std::unordered_map<std::string, std::string>& options,
                   const std::unordered_map<std::string, MXTensor>& args,
                   const std::unordered_map<std::string, MXTensor>& aux,
                   const PassResource& res) {
  Graph* g = Graph::fromString(in_graph);
#else
MXReturnValue TEST(mxnet::ext::Graph* g,
                   const std::unordered_map<std::string, std::string>& options) {
#endif
  // Find the length/mask input of the first softmax node in the graph.
  Node* commonnode = nullptr;
#if MX_LIBRARY_VERSION <= 7
  for (Node* n : g->nodes) {
#else
  for (int i = 0; i < g->size(); i++) {
    Node* n = g->getNode(i);
#endif
    if (n->op.compare("softmax") == 0) {
      commonnode = n->inputs[1].node;
      break;
    }
  }
  // Rewire every softmax to reuse that single mask node; the now
  // unreferenced duplicate mask subgraphs become dead code.
#if MX_LIBRARY_VERSION <= 7
  for (Node* n : g->nodes) {
#else
  for (int i = 0; i < g->size(); i++) {
    Node* n = g->getNode(i);
#endif
    if (commonnode != nullptr && n->op.compare("softmax") == 0) {
      n->inputs[1].node = commonnode;
    }
  }
#if MX_LIBRARY_VERSION <= 7
  // convert back to JSON string from Graph/Node
  *out_graph = new std::string(g->toString());
#endif
  return MX_SUCCESS;
}
```
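For readers without the extension build set up, the rewiring this pass performs can be mimicked on a toy graph in plain Python (the `Node` class and helper below are illustrative stand-ins, not the MXNet extension API):

```python
# Toy model of the C++ pass above: every softmax's second input (the mask)
# is redirected to the one found first, leaving the duplicate mask-producing
# subgraphs unreferenced so a later dead-code pass can drop them.
class Node:
    def __init__(self, op, inputs=None):
        self.op = op
        self.inputs = inputs or []

def dedup_softmax_masks(nodes):
    common = None
    for n in nodes:
        if n.op == "softmax":
            common = n.inputs[1]  # remember the first mask node
            break
    if common is not None:
        for n in nodes:
            if n.op == "softmax":
                n.inputs[1] = common  # rewire all softmaxes to share it
    return nodes

# Two softmax nodes, each with its own (identical) mask-producing node.
data = Node("data")
mask_a = Node("broadcast_axis")
mask_b = Node("broadcast_axis")
graph = [data, mask_a, mask_b,
         Node("softmax", [data, mask_a]),
         Node("softmax", [data, mask_b])]
dedup_softmax_masks(graph)
assert graph[3].inputs[1] is graph[4].inputs[1]  # both now share mask_a
```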
Overhead from these operators is negligible, but it seems the elimination doesn't work in this case. With the graph pass:
Without the graph pass:
This PR adds a graph pass to optimize softmax on CPU for BERT. Currently, for BERT-large, the length tensor is created by the operation sequence expand_dims -> broadcast_axis -> Reshape, and this tensor is created 24 times. The pass replaces softmax (with length) with a regular softmax on masked input: the mask is created only once and then passed to elemwise_sum to mask the input. Applying the pass in the scripts is optional.
Original:
Masked softmax:
Throughput in samples/s:
Accuracy:
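The equivalence the pass relies on can be checked in a few lines of NumPy (a standalone sketch, not the gluon-nlp code; `softmax_with_length` below mimics a length-restricted softmax): a softmax over the first `length` entries equals a plain softmax applied after adding a large negative constant at the padded positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_with_length(x, length):
    """Softmax over the first `length` entries; padded entries get probability 0."""
    out = np.zeros_like(x)
    out[:length] = softmax(x[:length])
    return out

def masked_softmax(x, length, neg=-1e18):
    """Plain softmax after adding a large negative value at padded positions."""
    mask = np.where(np.arange(x.shape[-1]) < length, 0.0, neg)
    return softmax(x + mask)

x = np.array([0.5, 1.5, -0.3, 2.0])
assert np.allclose(softmax_with_length(x, 2), masked_softmax(x, 2))
```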
There is also a bug fix in the interleaved MHA pass. Accuracy without the mha_interleave bug fix: {'exact_match': 79.62157048249763, 'f1': 87.75497143592598}