The main fix here is to the direction from which character span instances are padded. With that fix the model actually trains correctly, and performance now appears to be within a few percent of the original BiDAF implementation. There are also a few other, more minor changes.