chengchingwen / Transformers.jl

Julia Implementation of Transformer models
MIT License

Thanks. Not really an issue, but a question. #3

Closed: mboedigh closed this issue 5 years ago

mboedigh commented 5 years ago

I ran across this using the new Julia Repository Search engine, but only after I had built my own implementation (posted on Julia Discourse and GitHub). I'm having some trouble getting mine to converge on anything but the "copy" task. Yours does converge on my "stutter" task, which is a derivative of copy in which selected tokens are duplicated. I've been comparing against your code and can't find anything that explains the difference. However, there was one thing I did not understand: is it necessary (and why) to create a custom gradient for the word embedding? I simply indexed a TrackedArray. Is that a problem? Thanks for the Transformer, though. It seems very well done.
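
For readers following along, here is a minimal, hypothetical sketch of a "stutter" data generator as described above: the target is the source sequence with tokens from a chosen subset emitted twice. The function name, signature, and choice of duplicated tokens are illustrative assumptions, not the original setup.

```julia
# Hypothetical sketch of the "stutter" task: the target repeats the source,
# but every token in `doubled` is emitted twice. Not the original code.
function stutter_pair(vocab::UnitRange{Int}, len::Int, doubled::Set{Int})
    src = rand(vocab, len)
    tgt = Int[]
    for t in src
        push!(tgt, t)
        t in doubled && push!(tgt, t)    # duplicate selected tokens in the target
    end
    return src, tgt
end

src, tgt = stutter_pair(1:10, 6, Set([2, 3]))   # tgt stutters any 2s and 3s in src
```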

chengchingwen commented 5 years ago

Hi, thanks for the interest. The custom gradient is there because I implemented the gather function. It's just for convenience and to address some performance issues (though I haven't benchmarked it yet). There should be no difference in the result between using gather and getindex.
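
For concreteness, here is a minimal sketch (not Transformers.jl's actual code) of the idea behind a gather-style embedding lookup with a hand-written gradient: the forward pass selects columns of the embedding table, and the backward pass scatter-adds the output gradient into a zero matrix, summing contributions for repeated token ids. Plain indexing gives the same forward result; a custom rule mainly sidesteps the generic getindex adjoint. The names below are assumptions for illustration.

```julia
# Minimal sketch: embedding lookup as a column gather, with its gradient
# written by hand as a scatter-add over token ids.
gather(emb::AbstractMatrix, ids::AbstractVector{<:Integer}) = emb[:, ids]

function ∇gather(Δ::AbstractMatrix, emb::AbstractMatrix, ids::AbstractVector{<:Integer})
    g = zero(emb)                        # gradient w.r.t. the embedding table
    for (j, i) in enumerate(ids)
        @views g[:, i] .+= Δ[:, j]       # repeated ids must accumulate, not overwrite
    end
    return g
end

emb = randn(Float32, 8, 100)             # 8-dim embeddings, 100-token vocabulary
ids = [3, 7, 3, 42]                      # token id 3 appears twice
Y   = gather(emb, ids)                   # same result as emb[:, ids]
ΔY  = ones(Float32, size(Y))
G   = ∇gather(ΔY, emb, ids)              # G[:, 3] receives two summed contributions
```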

Btw, I will make a package announcement once I reach v0.1.0.

chengchingwen commented 5 years ago

@mboedigh Hey, I will open a model-zoo repo for Transformers.jl. Would you like to add some examples like the stutter task or anything else you have tested?

mboedigh commented 5 years ago

Sure. I don’t see the zoo yet, but let me know when it is there.