HazyResearch / H3

Language Modeling with the H3 State Space Model
Apache License 2.0

Correct method to load 2.7B? #10

Open BlinkDL opened 1 year ago

BlinkDL commented 1 year ago

Hi, I can run the 1.3B model using the benchmark code here, but 2.7B still gives bad results with the following params:

```python
import argparse

parser = argparse.ArgumentParser(description='H3 generation benchmarking')
parser.add_argument('--dmodel', type=int, default=2560)  # 1.3B: 2048
parser.add_argument('--nlayer', type=int, default=32)  # 1.3B: 24
# type=list would split a command-line value into characters;
# nargs='+' with type=int parses a list of ints correctly.
parser.add_argument('--attn-layer-idx', type=int, nargs='+', default=[8, 16, 24])  # 1.3B: [8, 16]
parser.add_argument('--nheads', type=int, default=20)  # 1.3B: 16
parser.add_argument('--ckpt', type=str, default='/fsx/BlinkDL/CODE/_PUBLIC_/H3/H3-2.7B/model-3attn.pt')
parser.add_argument('--promptlen', type=int, default=1024)
parser.add_argument('--genlen', type=int, default=128)
args = parser.parse_args()
```
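For reference, the two configurations quoted in this thread (the defaults above are the 2.7B settings; the inline comments give the 1.3B values) can be collected in one place. These are the reporter's settings, not an official config table, and the 125M/355M entries are not given here:

```python
# Hyperparameters for the two H3 sizes quoted in this thread.
# Taken from the argparse defaults and comments above; unverified
# against the repo's own configs.
H3_CONFIGS = {
    "1.3B": {"d_model": 2048, "n_layer": 24, "attn_layer_idx": [8, 16], "n_heads": 16},
    "2.7B": {"d_model": 2560, "n_layer": 32, "attn_layer_idx": [8, 16, 24], "n_heads": 20},
}

def head_dim(size: str) -> int:
    """Per-head dimension implied by a config (d_model / n_heads)."""
    cfg = H3_CONFIGS[size]
    return cfg["d_model"] // cfg["n_heads"]

print(head_dim("1.3B"))  # 2048 / 16 = 128
print(head_dim("2.7B"))  # 2560 / 20 = 128
```

A quick sanity check like this (both sizes keep a head dimension of 128) is one way to catch a mismatched `--nheads` before blaming the checkpoint.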
DanFu09 commented 1 year ago

We're looking into this, stay tuned!

tridao commented 1 year ago

Thanks for the bug report. There was a mistake in the mapping between old and new parameter names; it's fixed now.
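The kind of fix described here, remapping checkpoint keys from old to new parameter names before calling `load_state_dict`, can be sketched in plain Python. The rename rules below are hypothetical placeholders, not the actual H3 key names:

```python
import re

# Hypothetical rename rules: old key prefix -> new key prefix.
# The real H3 mapping differs; this only illustrates the technique.
RENAME_RULES = [
    (re.compile(r"^backbone\."), "model."),
    (re.compile(r"^ln_f\."), "model.norm_f."),
]

def remap_state_dict(old_sd: dict) -> dict:
    """Return a new state dict with keys rewritten under RENAME_RULES."""
    new_sd = {}
    for key, value in old_sd.items():
        for pattern, replacement in RENAME_RULES:
            key = pattern.sub(replacement, key)
        new_sd[key] = value
    return new_sd

old = {"backbone.layers.0.weight": 1, "ln_f.bias": 2}
print(remap_state_dict(old))
# {'model.layers.0.weight': 1, 'model.norm_f.bias': 2}
```

A silent mismatch in such a mapping typically produces exactly the symptom reported above: the model loads without error but generates bad output, because some weights land under the wrong names or stay randomly initialized.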

BlinkDL commented 1 year ago

Great. What about the configurations for 125M and 355M?

DanFu09 commented 1 year ago

Here are examples of how to load all the models, along with example outputs: https://github.com/HazyResearch/H3/blob/main/examples/README.md