Open BlinkDL opened 1 year ago
@BlinkDL try with this fix now!
For the KeOps issue - can you share details about your environment? PyTorch, CUDA, and KeOps versions would all be helpful.
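If it helps with gathering these, here is a minimal sketch that prints the relevant versions. It assumes the packages are installed under the distribution names `torch` and `pykeops`; the helper names `collect_versions` and `cuda_version_from_torch` are made up for this example.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "pykeops")):
    """Return installed versions by distribution name ('not installed' if missing)."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


def cuda_version_from_torch():
    """Return the CUDA version PyTorch was built against, if torch can be imported."""
    try:
        import torch
    except ImportError:
        return "unknown (torch not importable)"
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda or "none (CPU-only build)"


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
    print(f"CUDA (torch build): {cuda_version_from_torch()}")
```

The driver version is not covered here; `nvidia-smi` reports it.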
I believe the issue happens when the GPU runs out of memory on the larger benchmarks. E.g. on my setup (pykeops 2.1.1, driver version 525.60.13, CUDA version 12.0, torch 2.0.0a0+git81b5eff, 24 GB card) it crashes:
Number of parameters: 1326096384
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffe4e0c556b in range_preprocess_from_device(int&, int, int, int, int**, int, int*&, int*&, int*&, int*&, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int*) () from .cache/keops2.1.1/build/pykeops_nvrtc.cpython-310-x86_64-linux-gnu.so
But it works if I run a smaller test:
Number of parameters: 12102144
[KeOps] Generating code for formula Sum_Reduction(ComplexMult(ComplexMult(Var(1,2,0),Var(0,2,1)),ComplexExp(ComplexMult(Var(2,2,0),Var(3,2,1)))),0) ... OK
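As a rough check on the out-of-memory guess, the memory taken by the weights alone can be estimated from the two parameter counts above. This is only a sketch: the 2 bytes (fp16) and 4 bytes (fp32) per parameter are assumptions, not something the logs state, and activations and any KeOps work buffers are not counted.

```python
def weight_bytes(num_params, bytes_per_param):
    """Memory taken by the weights alone, in bytes."""
    return num_params * bytes_per_param


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Parameter counts copied from the two runs above.
COUNTS = {"larger benchmark": 1326096384, "smaller test": 12102144}
# Assumed storage widths; the actual dtype is not shown in the logs.
DTYPES = {"fp16": 2, "fp32": 4}

for run, n in COUNTS.items():
    for dtype, width in DTYPES.items():
        gib = to_gib(weight_bytes(n, width))
        print(f"{run}: {n} params as {dtype} -> {gib:.2f} GiB")
```

Under these assumptions the weights account for only part of a 24 GB card, so whatever else is allocated at run time is not captured by this estimate.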
Hope that helps!
I'm getting the same error, also on a 24 GB card. I see the --ckpt option, but is there some way to toggle the model architecture between the different model sizes (e.g. 1.3B vs. 2.7B)?
Thanks!
There are examples showing how to switch between the different models for text generation: https://github.com/HazyResearch/H3/tree/main/examples
Hi there. It's great to see another LM trained on the Pile.
When I run benchmarks/benchmark_generation.py:
and it exits after "Segmentation fault".
So I uninstalled pykeops, and then the new error is: