Infini-AI-Lab / Sequoia

scalable and robust tree-based speculative decoding algorithm

Reproducibility: tree_search generates trees that are too small #13

Open KexinFeng opened 3 months ago

KexinFeng commented 3 months ago

Hi,

I was trying to reproduce the numbers in the paper, but with demo-config.json plus either the acceptance vector in the repo or the one I measured myself, the generated trees are all very small and more or less fixed:

0 _ 1 _ 3
 \_ 2

or

0 _ 1 _ 2 _ 3

On the other hand, the growmaps in the two folders are generally much larger, typically of size 128, 64, or 32. Do you know why the trees I generate are so small, and how to reproduce the growmaps in those two folders?

Thank you!

dreaming-panda commented 3 months ago

Can you send me your demo-config.json?
PS: the original demo-config.json is just a demo. You need to modify its contents to generate the tree you want.

dreaming-panda commented 3 months ago

{
    "acceptance_rate_vector": "acceptance-rate-vector.pt",
    "max_depth": 15,
    "max_budget": 128,
    "draft_time": 0.0003,
    "valid_budget": [1, 2, 4, 8, 16, 32, 64, 128],
    "target_time": [0.025, 0.025, 0.025, 0.025, 0.025, 0.027, 0.030, 0.035],
    "dst": "demo_tree.pt"
}
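Both configs in this thread use the same keys, and a mismatch between valid_budget and target_time is an easy mistake to make. Below is a small pre-flight check, a hypothetical helper (`check_config` is our name, and its rules are our assumptions about a sane config, not checks the repo performs) that validates the config above and serializes it as the contents of demo-config.json.

```python
import json

def check_config(cfg):
    """Hypothetical pre-flight check for a tree-search config (our own rules,
    not ones enforced by the repo). Returns cfg unchanged if it looks sane."""
    budgets, times = cfg["valid_budget"], cfg["target_time"]
    # Each candidate budget needs exactly one measured target time.
    if len(budgets) != len(times):
        raise ValueError("valid_budget and target_time must have the same length")
    if budgets != sorted(budgets) or max(budgets) > cfg["max_budget"]:
        raise ValueError("valid_budget must be increasing and within max_budget")
    # Times must be positive; this check cannot verify that they share one unit.
    if cfg["draft_time"] <= 0 or min(times) <= 0:
        raise ValueError("draft_time and target_time must be positive")
    return cfg

config = {
    "acceptance_rate_vector": "acceptance-rate-vector.pt",
    "max_depth": 15,
    "max_budget": 128,
    "draft_time": 0.0003,
    "valid_budget": [1, 2, 4, 8, 16, 32, 64, 128],
    "target_time": [0.025, 0.025, 0.025, 0.025, 0.025, 0.027, 0.030, 0.035],
    "dst": "demo_tree.pt",
}

text = json.dumps(check_config(config), indent=4)  # contents for demo-config.json
```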

p = [0.0000, 0.4803, 0.1104, 0.0576, 0.0373, 0.0265, 0.0211, 0.0170, 0.0135, 0.0113, 0.0093, 0.0087, 0.0075, 0.0067, 0.0058, 0.0061, 0.0049] might be a proper example for generating a tree of size 32.
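To see how an acceptance vector translates into tree value, here is a minimal dynamic-program sketch. It is our reconstruction under stated assumptions, not the repo's tree_search code: p[0] is a root placeholder, p[i] is the probability that a node's i-th ranked draft child is accepted, a node with k children uses ranks 1..k, the budget n counts the root, and a tree's value is the expected number of tokens produced per step (the root counts as 1, every other node contributes the product of p along its path). The repo may count budget or depth differently.

```python
import math

NEG = -math.inf  # marks "no tree with this size/depth exists"

def tree_values(p, max_budget, max_depth):
    """c[d][n] = best expected tokens per step for a tree with n nodes (root
    included) and at most d levels below the root, under the assumptions above."""
    K = len(p) - 1  # number of child ranks available
    c = [[NEG] * (max_budget + 1) for _ in range(max_depth + 1)]
    c[0][1] = 1.0  # with no levels below the root, only the root fits
    for d in range(1, max_depth + 1):
        prev = c[d - 1]
        # h[m]: best sum of p[j] * value(subtree of child j) over children 1..i using m nodes
        h = [0.0] + [NEG] * (max_budget - 1)
        best = [NEG] * max_budget  # best h[m] over any child count i >= 1
        for i in range(1, K + 1):
            new_h = [NEG] * max_budget
            for m in range(1, max_budget):
                for s in range(1, m + 1):  # s nodes go to the subtree of child i
                    if h[m - s] == NEG or prev[s] == NEG:
                        continue
                    new_h[m] = max(new_h[m], h[m - s] + p[i] * prev[s])
            h = new_h
            best = [max(a, b) for a, b in zip(best, h)]
        c[d][1] = 1.0
        for n in range(2, max_budget + 1):
            if best[n - 1] != NEG:
                c[d][n] = 1.0 + best[n - 1]  # root plus its children's subtrees
    return c

# Acceptance vector suggested in the comment above (index 0 = root placeholder).
p = [0.0000, 0.4803, 0.1104, 0.0576, 0.0373, 0.0265, 0.0211, 0.0170, 0.0135,
     0.0113, 0.0093, 0.0087, 0.0075, 0.0067, 0.0058, 0.0061, 0.0049]
values = tree_values(p, max_budget=32, max_depth=15)
print(f"best expected tokens per step with 32 nodes: {values[15][32]:.3f}")
```

The value alone always grows with budget; which size wins depends on how it is traded against the time profile in the config.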

Some explanation: draft_time is the time for one draft model's forward pass. target_time is the time for one draft model's forward pass corresponding to the valid budget.
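A sketch of how the two time fields could enter the search. This is our hedged reading, not the repo's code: we assume, as later comments in this thread suggest, that target_time is the time of one verification pass over b tokens, and that each candidate (budget b, depth d) is scored by expected tokens per step divided by one verification plus d draft forward passes. `pick_tree_shape` and its `expected_tokens` input are hypothetical names.

```python
import math

def pick_tree_shape(expected_tokens, draft_time, valid_budget, target_time):
    """Return (tokens_per_unit_time, budget, depth) with the best assumed rate.

    expected_tokens[(b, d)]: expected tokens per step for the best tree with b
    nodes and d levels below the root (hypothetical input, e.g. from a tree DP).
    Assumed step cost: target_time for budget b plus d * draft_time."""
    time_for = dict(zip(valid_budget, target_time))
    best = None
    for (b, d), g in expected_tokens.items():
        if b not in time_for or not math.isfinite(g):
            continue  # only budgets with a measured target time are candidates
        rate = g / (time_for[b] + d * draft_time)
        if best is None or rate > best[0]:
            best = (rate, b, d)
    return best
```

Under this model the chosen depth and budget both fall out of one maximization, and only budgets listed in valid_budget can ever be selected.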

KexinFeng commented 3 months ago

Thanks for the fast reply and the explanation! My demo-config.json is the same as the one in the repo.

{
    "acceptance_rate_vector": "acceptance-rate-vector.pt",
    "max_depth": 10,
    "max_budget": 128,
    "draft_time": 0.38,
    "valid_budget": [1, 2, 4, 8, 16, 32, 64],
    "target_time":[10, 10, 10, 12, 14, 18, 27],
    "dst": "demo_tree.pt"
}

The times above are assumed to be in milliseconds.

acceptance_rate_vector: tensor([0.0000, 0.6342, 0.1079, 0.0570, 0.0225, 0.0195, 0.0150, 0.0045, 0.0030, 0.0120, 0.0045, 0.0075, 0.0045, 0.0060, 0.0030, 0.0015, 0.0030, 0.0015, 0.0030, 0.0000, 0.0030, 0.0030, 0.0030, 0.0000, 0.0015, 0.0000, 0.0015, 0.0000, 0.0000, 0.0000, 0.0015, 0.0015, 0.0015, 0.0735]), which is also similar to the acceptance vector in the repo.

I can first try your demo-config.json with the acceptance vector above. It seems that our acceptance vectors also differ in length. As for the times in the config, I had assumed that their unit does not matter; i.e., if we scale draft_time and target_time by the same factor, the resulting tree is unchanged. I don't know whether this assumption is correct.
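If the search scores a candidate tree with b nodes and d draft levels by its expected tokens per step G(b, d) divided by the step time T(b) + d·t_draft (an assumption about the objective that this thread does not confirm), the scaling assumption holds: multiplying every time by the same k > 0 divides every score by k, so the ranking of candidates is unchanged.

```latex
\frac{G(b,d)}{k\,T(b) + d\,k\,t_{\mathrm{draft}}}
  = \frac{1}{k}\cdot\frac{G(b,d)}{T(b) + d\,t_{\mathrm{draft}}}
\quad\Longrightarrow\quad
\arg\max_{b,d}\frac{G(b,d)}{k\,T(b) + d\,k\,t_{\mathrm{draft}}}
  = \arg\max_{b,d}\frac{G(b,d)}{T(b) + d\,t_{\mathrm{draft}}}
```

Under that model only ratios matter. The two configs in this thread have t_draft / T(1) = 0.0003 / 0.025 = 0.012 and 0.38 / 10 = 0.038, so the draft model is relatively about three times more expensive in the second one.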

KexinFeng commented 3 months ago

Here are some updates. I tried your config.json with my original acc_rate_vec.pt, and the generated tree became normal, with size 32.

From this ablation, it looks like the target_time profile [10, 10, 10, 12, 14, 18, 27] (ms) I used is the main reason the optimal tree had size 4, and the algorithm seems quite sensitive to the target_time profile. For comparison, the profile [11, 11, 11, 11, 11, 20, 31] (ms) generates a tree of size 16. The two profiles are not very different, yet the resulting tree sizes differ a lot.
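A back-of-the-envelope way to see this sensitivity, assuming the search maximizes expected tokens per unit time and ignoring the comparatively small draft term: moving from budget b to a larger budget b' pays off only if expected tokens per step grow by at least T(b') / T(b). The sketch computes that break-even factor relative to budget 4 for both profiles quoted above; `break_even_factors` is a hypothetical helper.

```python
def break_even_factors(valid_budget, target_time, ref_budget):
    """Factor by which expected tokens per step must grow, relative to
    ref_budget, for each budget to pay off (draft time ignored)."""
    t_ref = target_time[valid_budget.index(ref_budget)]
    return {b: t / t_ref for b, t in zip(valid_budget, target_time)}

budgets = [1, 2, 4, 8, 16, 32, 64]
profile_a = [10, 10, 10, 12, 14, 18, 27]  # ms, reported to give a tree of size 4
profile_b = [11, 11, 11, 11, 11, 20, 31]  # ms, reported to give a tree of size 16

for name, prof in [("A", profile_a), ("B", profile_b)]:
    factors = break_even_factors(budgets, prof, ref_budget=4)
    print(name, {b: round(f, 2) for b, f in factors.items()})
```

With profile A, a 16-node tree must deliver 1.4 times the tokens of a 4-node tree; with profile B it only has to match it, because verifying 16 tokens is measured as costing the same as verifying 4. Small changes in the flat part of the profile therefore move the optimum a lot.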

By the way, is it true that the generated tree size can only be one of the numbers in "valid_budget": [1, 2, 4, 8, 16, 32, 64]? And to allow other tree sizes, do timing data points for more valid_budget values have to be provided in config.json?

dreaming-panda commented 3 months ago

Yes, the generated tree sizes can only be numbers from "valid_budget".
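So each extra candidate size needs its own measured target_time. A minimal timing sketch, assuming a hypothetical callable `run_verification(b)` that runs one target-model forward pass over b tree tokens and blocks until it finishes (on a GPU, for example by calling torch.cuda.synchronize() before returning). It reports the median wall-clock time in seconds, so draft_time should be measured the same way and in the same unit.

```python
import statistics
import time

def profile_budgets(run_verification, budgets, warmup=3, repeats=10):
    """Median wall-clock seconds of run_verification(b) for each budget b.

    run_verification is a hypothetical stand-in for one target-model
    verification pass over b tokens; it must block until the work is done."""
    times = []
    for b in budgets:
        for _ in range(warmup):  # discard warm-up runs (kernel selection, caches)
            run_verification(b)
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_verification(b)
            samples.append(time.perf_counter() - start)
        times.append(statistics.median(samples))
    return times
```

The returned list lines up index by index with the budgets passed in, so it can be written directly into target_time next to the matching valid_budget list.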

singularity-s0 commented 3 months ago

How do you determine the optimal depth and budget? Can you share the config.json that generated the example A100 and L40 growmaps in this repository?

singularity-s0 commented 3 months ago

Some explanation: draft_time is the time for one draft model's forward pass. target_time is the time for one draft model's forward pass corresponding to the valid budget.

Is there a typo here? Shouldn't target_time be the time for a verification pass?

briskerkazoos commented 2 months ago

Some explanation: draft_time is the time for one draft model's forward pass. target_time is the time for one draft model's forward pass corresponding to the valid budget.

Is there a typo here? Shouldn't target_time be the time for a verification pass?

I have the same question.

"draft_time": 0.0003

What is the unit of this value? I don't think any model can complete a forward pass within 0.0003 seconds.