Closed mboedigh closed 5 years ago
Hi, thanks for the interest. There is a custom gradient because I implemented the `gather` function myself. It's just for convenience and to address some performance issues (though I haven't benchmarked it yet). There should be no difference in the result between using `gather` and `getindex`.
Btw, I will make a package announcement once I reach v0.1.0.
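To make the `gather` versus `getindex` point concrete, here is a minimal sketch in plain Julia. It is not the Transformers.jl implementation: `gather_columns` and `gather_columns_grad` are hypothetical names made up for illustration, and the sketch assumes the embedding matrix stores one token per column. It only shows that a hand-written lookup selects the same columns as ordinary indexing, and that the matching gradient is a scatter-add into those columns.

```julia
# Hypothetical stand-in for an embedding lookup (not the Transformers.jl code).
# Each column of `W` is assumed to be the embedding of one token id.
gather_columns(W::AbstractMatrix, ids::AbstractVector{<:Integer}) =
    hcat((W[:, i] for i in ids)...)

# Hand-written backward pass for the lookup: scatter-add the upstream
# gradient `Δ` into the selected columns. Repeated ids accumulate.
function gather_columns_grad(W::AbstractMatrix, ids::AbstractVector{<:Integer},
                             Δ::AbstractMatrix)
    dW = zeros(eltype(Δ), size(W))
    for (k, i) in enumerate(ids)
        dW[:, i] .+= Δ[:, k]
    end
    return dW
end

W   = reshape(collect(1.0:12.0), 3, 4)  # 3-dimensional embeddings, vocabulary of 4
ids = [2, 4, 2]                         # token 2 appears twice

# Forward pass: the custom lookup and plain indexing pick the same columns.
@assert gather_columns(W, ids) == W[:, ids]

# Backward pass with an all-ones upstream gradient:
# token 2 was used twice, token 4 once, tokens 1 and 3 never.
dW = gather_columns_grad(W, ids, ones(3, length(ids)))
@assert dW == [0.0 2.0 0.0 1.0; 0.0 2.0 0.0 1.0; 0.0 2.0 0.0 1.0]
```

Under these assumptions the forward values are identical either way, so a custom gradient of this kind would only change how the backward pass is computed, not what the model learns.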
@mboedigh Hey, I will open a model-zoo repo for Transformers.jl. Would you like to add some examples like the stutter task or anything else you have tested?
Sure. I don’t see the zoo yet, but let me know when it is there.
I ran across this using the new Julia Repository Search engine, but only after I built my own (on Julia Discourse and GitHub). I'm having some trouble getting mine to converge on anything but the "copy" task. Yours does converge on my "stutter" task, which is a derivative of copy, except with select tokens duplicated. I've been using your code and can't find anything that explains the difference. However, there was one thing that I did not understand. Is it necessary (and why) to create a custom gradient for the Word Embedding? I simply indexed a TrackedArray. Is that a problem? Thanks for the Transformer, though. It seems very well done.