Young1993 closed this 2 years ago
Hi, thanks for the interest!
You are right about independent decoding. Say there are 8 domains and each domain has 6 slots. Then for a single turn, the model needs to run 8 × 6 = 48 predictions. These predictions are, of course, fully parallel, since no ordering among them is required.
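A minimal sketch of what "independent decoding" means for the 8 × 6 example, assuming a hypothetical `predict_slot_value(context, domain, slot)` stand-in for the model. The domain and slot names are placeholders, not the real schema. Because each prediction depends only on the dialogue context and its own (domain, slot) query, the 48 calls can run in any order or all at once. Threads are used here only to make the independence visible; a real model would more likely stack the 48 queries into one batch for a single forward pass.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical schema: 8 domains x 6 slots, matching the numbers in the example.
DOMAINS = [f"domain_{d}" for d in range(8)]
SLOTS = [f"slot_{s}" for s in range(6)]


def predict_slot_value(context: str, domain: str, slot: str) -> str:
    """Hypothetical stand-in for the model's per-slot prediction.

    It reads only (context, domain, slot) and never another slot's
    prediction, which is what makes the decoding independent.
    """
    return f"{domain}/{slot}@{len(context)}"


def decode_turn(context: str, pairs: list[tuple[str, str]]) -> dict:
    # Every (domain, slot) query is self-contained, so they can all be
    # dispatched concurrently with no ordering constraints between them.
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda p: predict_slot_value(context, *p), pairs))
    return dict(zip(pairs, values))


pairs = list(product(DOMAINS, SLOTS))
context = "I need a taxi at 5pm."
state = decode_turn(context, pairs)
print(len(pairs))  # 48 predictions for this turn

# Decoding the slots in a different order yields the same dialogue state.
assert decode_turn(context, list(reversed(pairs))) == state
```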
In this example, the model is not simply copying from the input. Our hypothesis is that the model can leverage the slot descriptions and therefore knows that the value it should predict is time-related.
OK, thanks!
Hi,
I think this is very interesting work, and I have two questions I want to check:
Looking forward to your reply.
Best