Freezed at line 148 : caching Text Encoder outputs... train_util.py:1143

zneb076 commented 1 month ago

Training freezes at line 148: caching Text Encoder outputs... train_util.py:1143

My GPU is a 3060 with 12 GB VRAM, and I have 64 GB RAM. Training images: 13, steps: 1560.

How can I fix this? Thank you.

20240916_155229

zneb076 commented 1 month ago

My Task Manager during the freeze:

20240917_065348

wuliang19869312 commented 1 month ago

ENOENT: no such file or directory, stat 'd:\pinokio\api\fluxgym.git{{input.event[0]}}'

Traceback (most recent call last):
  File "d:\pinokio\api\fluxgym.git\app.py", line 4, in <module>
    import gradio as gr
  File "d:\pinokio\api\fluxgym.git\env\Lib\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio._simple_templates
  File "d:\pinokio\api\fluxgym.git\env\Lib\site-packages\gradio\_simple_templates\__init__.py", line 1, in <module>
    from .simpledropdown import SimpleDropdown
  File "d:\pinokio\api\fluxgym.git\env\Lib\site-packages\gradio\_simple_templates\simpledropdown.py", line 6, in <module>
    from gradio.components.base import Component, FormComponent
  File "d:\pinokio\api\fluxgym.git\env\Lib\site-packages\gradio\components\__init__.py", line 1, in <module>
    from gradio.components.annotated_image import AnnotatedImage
  File "d:\pinokio\api\fluxgym.git\env\Lib\site-packages\gradio\components\annotated_image.py", line 8, in <module>
    import numpy as np
  File "d:\pinokio\api\fluxgym.git\env\Lib\site-packages\numpy\__init__.py", line 149, in <module>
    raise ImportError(msg) from e
ImportError: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python interpreter from there.

After the update, both a manual install and an install via Pinokio give this error!

e15whr commented 1 month ago

I'm getting the same problem.

Just seems to stop working at: [INFO] INFO caching Text Encoder outputs... train_util.py:1143

kenzthum commented 1 month ago

Same issue here. Has anyone found a way to solve it?

zneb076 commented 1 month ago

Still can't solve it 😭

6Morpheus6 commented 1 month ago

@kenzthum @e15whr Which GPU, how much VRAM and RAM do you have? How long did you wait at this step?

6Morpheus6 commented 1 month ago

@wuliang19869312 Which OS do you have? First time I see this error.

zneb076 commented 1 month ago

@kenzthum @e15whr Which GPU, how much VRAM and RAM do you have? How long did you wait at this step?

My GPU is a 3060 with 12 GB VRAM, and I have 64 GB RAM. Training images: 13, steps: 1560.

I waited for 8 hours.

My OS is Windows 10.

6Morpheus6 commented 1 month ago

@zneb076 Sadly I'm not really good at Python, but this is the code fluxgym is trying to execute when you get stuck:

        # iterate batches
        logger.info("caching Text Encoder outputs...")
        for batch in tqdm(batches, smoothing=1, total=len(batches)):
            # cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.alpha_mask, subset.random_crop)
            caching_strategy.cache_batch_outputs(tokenize_strategy, models, text_encoding_strategy, batch)

Since it works for the majority of users, we can rule out a script error. Since training worked up to this step, it's not plausible that you are missing a Python module or have a wrong version. What's left is a problem related to your cache, your batch, or your model.

How much space is left on your drive?
Did you change the location of the cache folder in the installation configuration?
Can you post screenshots of fluxgym with your settings, part of your images and captions?
Are both clip text encoders present at .\fluxgym.git\models\clip and do they have the right size?
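
The first and last checks above can be done with a few lines of Python. This is only a minimal sketch, assuming it is run from inside the fluxgym.git folder; the helper names `free_gb` and `list_model_files` are made up for illustration and are not part of fluxgym. It prints what is in models\clip so you can compare the sizes with a working install.

```python
import shutil
from pathlib import Path


def free_gb(path):
    """Return the free space in GiB on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def list_model_files(folder):
    """Return sorted (name, size in GiB) pairs for the files directly inside `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # folder missing: nothing to report
    return sorted(
        (p.name, p.stat().st_size / 1024**3)
        for p in folder.iterdir()
        if p.is_file()
    )


if __name__ == "__main__":
    install_dir = Path(".")  # assumption: current folder is fluxgym.git
    print(f"free space on install drive: {free_gb(install_dir):.1f} GiB")
    for name, size in list_model_files(install_dir / "models" / "clip"):
        print(f"{name}: {size:.2f} GiB")
```

An empty listing means the folder is missing or empty, which would already explain a failure at the text encoder step.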

zneb076 commented 1 month ago

@6Morpheus6 Thank you so much for the help.

How much space is left on your drive?

My C: drive has 97 GB free (but fluxgym is installed on an SSD, the F: drive, which has 300 GB free).

Did you change the location of the cache folder in the installation configuration?

No, I did not change it.

Can you post screenshots of fluxgym with your settings, part of your images and captions?

Are both clip text encoders present at .\fluxgym.git\models\clip and do they have the right size?

Yes, it is the same as yours.

6Morpheus6 commented 1 month ago

@zneb076 All looks good in my opinion. The only thing that might cause issues is the spaces in your image names. I don't know about fluxgym, but in other apps this can lead to problems in some cases. If you have already tried reinstalling fluxgym and still get this error, please rename your files so they don't contain spaces or special characters and try again. Please let me know if this fixes the problem.
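
If there are many files, the renaming can be scripted. The following is a rough sketch, not something fluxgym provides: `safe_name` and `plan_renames` are hypothetical helpers, and the rule (keep letters, digits, `-` and `_`, replace everything else with `_`) is just one reasonable choice. It only prints a plan and renames nothing. Because only the part before the extension is changed, an image and a caption file with the same base name end up with the same new base name.

```python
import re
from pathlib import Path


def safe_name(filename):
    """Replace runs of characters other than letters, digits, '-' and '_' in the stem with '_'."""
    p = Path(filename)
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", p.stem).strip("_")
    return (stem or "image") + p.suffix  # fall back to "image" if nothing is left


def plan_renames(folder):
    """Return (old_path, new_path) pairs for the files whose names would change."""
    pairs = []
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            new = p.with_name(safe_name(p.name))
            if new != p:
                pairs.append((p, new))
    return pairs


if __name__ == "__main__":
    # Dry run on the current folder: show the plan, change nothing.
    for old, new in plan_renames("."):
        print(f"{old.name!r} -> {new.name!r}")
```

Two different names can map to the same safe name, so check the printed plan for duplicates before calling `old.rename(new)` on each pair.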

zneb076 commented 1 month ago

@6Morpheus6 Thank you so much, I will try again today 🙏

6Morpheus6 commented 1 month ago

@zneb076 We have probably found the cause at last, on the Pinokio Discord. Please update your NVIDIA drivers to the latest version. After that, open the NVIDIA Control Panel and make sure that CUDA - Sysmem Fallback Policy is set to Driver Default or Enabled.
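
To see which driver version is installed and how full the VRAM is while the step hangs, `nvidia-smi` can be queried from Python. This is a sketch under assumptions: it expects `nvidia-smi` to be on the PATH, the function names are invented for this example, and it does not read the Sysmem Fallback Policy, which still has to be checked in the control panel.

```python
import shutil
import subprocess

# Standard nvidia-smi query: one CSV line per GPU, memory values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Turn one CSV line from the query above into a dict."""
    name, driver, used, total = [field.strip() for field in line.split(",")]
    return {"name": name, "driver": driver, "used_mib": int(used), "total_mib": int(total)}


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]
    except (OSError, ValueError, subprocess.SubprocessError):
        return []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(f"{gpu['name']}: driver {gpu['driver']}, "
              f"{gpu['used_mib']} / {gpu['total_mib']} MiB VRAM used")
```

VRAM that sits at the card's limit during the freeze would be consistent with the fallback explanation above.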

zneb076 commented 1 month ago

@6Morpheus6 Ohh, thank you so much!

zneb076 commented 1 month ago

@6Morpheus6

It works, Sir. Thank you soooo much 🙏🙏🙏

UserSpy commented 1 month ago

I had the same issue with my 3060 (12 GB), and updating the NVIDIA drivers fixed it for me too. There was no CUDA - Sysmem Fallback Policy setting in the driver version I had before updating.

I also have spaces in the image file names, which are not causing any problems so far.

cocktailpeanut / fluxgym

Freezed at line 148 : caching Text Encoder outputs... train_util.py:1143 #95