parser.add_argument(
    "--total-token",
    type=int,
    default=60,
    help="The maximum number of new generated tokens.",
)
parser.add_argument(
    "--depth",
    type=int,
    default=5,
    help="The maximum depth of the draft tree.",
)
I noticed that the actual depth is args.depth + 1 and the actual total-token is args.total_token - 1.
The actual settings match those given in the paper. The total-token value in the code does not include the one token generated by the target model, so it is reduced by one.
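For reference, the relationship between the command-line values and the values actually used can be sketched as below. This is only an illustration, assuming the +1 and -1 offsets are exactly as described in this thread; `build_parser` and `effective_settings` are hypothetical helper names, not functions from the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two flags quoted above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--total-token", type=int, default=60)
    parser.add_argument("--depth", type=int, default=5)
    return parser


def effective_settings(args: argparse.Namespace) -> dict:
    """Apply the offsets reported in this thread (an assumption, not library code)."""
    return {
        # The depth actually used is reported to be one more than the flag value.
        "depth": args.depth + 1,
        # The token generated by the target model is not counted, so subtract one.
        "total_token": args.total_token - 1,
    }


# Parse an empty argument list so the defaults (60 and 5) are used.
args = build_parser().parse_args([])
settings = effective_settings(args)
print(settings)  # {'depth': 6, 'total_token': 59}
```

With the defaults shown above, this gives a depth of 6 and a total-token of 59.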
Refer to this part of the code:
Could you please show me your settings for different sizes of the model?