Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Cannot compile on win10, 16gb ram, out of memory #1140

Open hackedpassword opened 1 month ago

hackedpassword commented 1 month ago

Hi. I've been trying everything I know to free up as much memory as possible. Starting with 3.5 GB available, compilation hits the 16 GB ceiling and fails.

I tried building with the setup.py script and also modified the compile args, shown below, hoping to reduce memory use. It still failed.

Am I out of luck?

           extra_compile_args={
               "cxx": [
                   "-O1",  # Reduced optimization to decrease memory usage
                   "-std=c++17"
               ] + generator_flag,

               "nvcc": append_nvcc_threads(
                   [
                       "-O1",  # Reduced optimization for CUDA code
                       "-std=c++17",
                       "--use_fast_math",
                   ]
                   + generator_flag
                   + cc_flag
               ),
           },
tridao commented 1 month ago

Did you try MAX_JOBS=1? We can compile this with github runners with 16GB for linux, though idk about windows.
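For context, `MAX_JOBS` is an environment variable honored by PyTorch's C++/CUDA extension builder, which flash-attention's setup.py relies on; it caps how many compilation jobs run in parallel, and each parallel `nvcc` job can use several GB of RAM. A minimal sketch of the lookup (the helper name `effective_jobs` is illustrative, not from the actual setup.py):

```python
import os

def effective_jobs(default=None):
    """Return the number of parallel compile jobs to use.

    Mirrors the common pattern in extension build scripts: the MAX_JOBS
    environment variable, when set, overrides the CPU-count default.
    """
    default = default or os.cpu_count() or 1
    return int(os.environ.get("MAX_JOBS", default))

os.environ["MAX_JOBS"] = "1"  # limit peak memory during compilation
print(effective_jobs())  # → 1
```

In practice that means setting the variable before installing, e.g. `MAX_JOBS=1 pip install flash-attn --no-build-isolation` on Linux, or `set MAX_JOBS=1` in a Windows shell first.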

hackedpassword commented 1 month ago

> Did you try MAX_JOBS=1? We can compile this with github runners with 16GB for linux, though idk about windows.

I did, I should have mentioned that. On my first run, not understanding the implication of MAX_JOBS, I set it to (cpu_threads - 1), i.e. 7. It hit the ceiling like installing a trampoline in the living room.