--data flag is telling me it's an invalid value because it's a directory?

dookiethedog commented 7 months ago

Describe the bug

When using my run command:

python train.py --outdir C:\Users\User\Documents\machinelearning\6\styleganfunresults --cfg=stylegan2 --data C:\Users\User\Documents\machinelearning\6\styleganfunganimages --gamma=1 --snap=3 --metrics=none --mbstd-group=20 --gpus=1 --batch=20

I get this error:

Usage: train.py [OPTIONS]
Try 'train.py --help' for help.

Error: Invalid value for '--data': File 'C:\Users\User\Documents\machinelearning\6\styleganfunganimages' is a directory.

(styleganfun) C:\Users\User\Documents\machinelearning\stylegan3-fun>

To Reproduce

I have 2k PNG images with transparent backgrounds and first ran dataset_tool.py with the command below:

python dataset_tool.py --source C:\Users\User\Documents\machinelearning\5\512croppedCopy --dest C:\Users\User\Documents\machinelearning\6\styleganfunganimages

Then I tried to train on that data with:

python train.py --outdir C:\Users\User\Documents\machinelearning\6\styleganfunresults --cfg=stylegan2 --data C:\Users\User\Documents\machinelearning\6\styleganfunganimages --gamma=1 --snap=3 --metrics=none --mbstd-group=20 --gpus=1 --batch=20

and received the error above.

Expected behavior

It should just accept a directory. I'm not sure why it wouldn't; even the flag definition in train.py says it can be a directory (metavar='[ZIP|DIR]').


Desktop:

OS: Windows 10
Python: 3.8
CUDA toolkit: 11.1
NVIDIA graphics driver: 551.23
GPU: ASUS ROG Strix RTX 3090

dookiethedog commented 7 months ago

I also tried changing this line in train.py:

@click.option('--data', help='Training data', metavar='[ZIP|DIR]', type=click.Path(exists=True, dir_okay=False), required=True)

back to stylegan3's original:

type=click.Path(exists=True, dir_okay=False), required=True)

but that gives me this error:

Error: --data: Path must point to a directory or zip

This is pretty funny: one version says it's an error because the path is a directory, and the other says it's not pointing at a directory. What is going on here?
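The first error comes from click itself, before any of train.py's dataset code runs: click.Path(exists=True, dir_okay=False) only accepts existing files, so an existing directory is rejected with exactly this kind of "is a directory" message. A minimal reproduction, assuming click is installed (make_cli is an illustrative helper, not something from the repo):

```python
import tempfile

import click
from click.testing import CliRunner


def make_cli(dir_okay: bool) -> click.Command:
    """Build a tiny command whose --data option mirrors the one quoted from train.py."""

    @click.command()
    @click.option('--data', help='Training data', metavar='[ZIP|DIR]',
                  type=click.Path(exists=True, dir_okay=dir_okay), required=True)
    def main(data: str) -> None:
        # Only reached if click's own path validation passed.
        click.echo(f'accepted: {data}')

    return main


if __name__ == '__main__':
    runner = CliRunner()
    with tempfile.TemporaryDirectory() as dataset_dir:
        # dir_okay=False: click rejects the directory with a usage error (exit code 2).
        rejected = runner.invoke(make_cli(dir_okay=False), ['--data', dataset_dir])
        print(rejected.exit_code, rejected.output)
        # dir_okay=True (click's default): the same directory is accepted.
        accepted = runner.invoke(make_cli(dir_okay=True), ['--data', dataset_dir])
        print(accepted.exit_code, accepted.output)
```

Flipping dir_okay to True lets the directory through click, after which it is up to the dataset loader to accept or reject the path.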

dookiethedog commented 7 months ago

I only wanted to use this fork because it supports PNGs with transparent backgrounds. I ended up digging into your code and porting your changes to augment.py, training_loop.py, and dataset_tool.py into my clone of the original stylegan3 repo, and everything works perfectly. I'm still unsure why this repo's code doesn't like the data directory, though. Weird.

PDillis commented 7 months ago

Sorry, I hadn't seen this issue before. I'd like to fix it, but the RGBA data I tested with worked fine, so could you share one or two samples of your data? That way I can figure out exactly what's going on.

As a first quick check: when running dataset_tool.py, did you try setting --dest=C:\Users\User\Documents\machinelearning\6\styleganfunganimages.zip? Perhaps creating the folder fails while creating the .zip file doesn't; either way, it's a quick check to see whether that works. As the docs mention, ZIP files are easier to use since you can move them around more efficiently, but a folder should also work.
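The second message, "Path must point to a directory or zip", matches the check in upstream stylegan3's ImageFolderDataset (training/dataset.py): the path must either be a directory or end in .zip. The helper below is a hypothetical, standalone sketch of that rule, handy for sanity-checking a --dest/--data path before a long run; it is not this fork's code, and the fork may check more than this. It also adds a ZIP integrity check that the upstream loader does not do.

```python
import os
import tempfile
import zipfile


def dataset_path_type(path: str) -> str:
    """Hypothetical helper: classify a --data path as 'dir' or 'zip'.

    Mirrors the upstream rule (a directory, or a file with a .zip extension)
    and additionally verifies that a .zip file is a readable ZIP archive.
    """
    if os.path.isdir(path):
        return 'dir'
    if os.path.splitext(path)[1].lower() == '.zip':
        # Extra check, not in upstream: is_zipfile returns False for missing/corrupt files.
        if not zipfile.is_zipfile(path):
            raise IOError(f'{path!r} has a .zip extension but is not a readable ZIP archive')
        return 'zip'
    raise IOError('Path must point to a directory or zip')


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        archive = os.path.join(root, 'dataset.zip')
        with zipfile.ZipFile(archive, 'w') as zf:
            zf.writestr('00000/img00000000.png', b'placeholder bytes')
        print(dataset_path_type(root))
        print(dataset_path_type(archive))
```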

dookiethedog commented 7 months ago

Yeah, the .zip file works great, thanks for the suggestion. I suppose I overlooked this step, assuming everything would be the same. One more question: do you plan on adding AUG_PROB to this repo? I'm noticing augmentation leaks on smaller datasets. Removing the color augmentations helps stabilize training a lot; however, once the augmentation gets strong, I find the rotation augmentations start leaking into my results. I read that some other people implemented aug_prob in their code to avoid this, since then not every round is trained on augmented images. I tried to implement this myself, but after some testing I assume I failed, given the training instability at around 800 kimg+.

I should add that I attempted this by modifying loss.py, where the images are augmented: the training loop decides whether a given round is augmented, and if it isn't, it skips the ADA adjustment as well. I'm a dev myself but very new to GANs and ML, and I'm reaching out to others to learn how to train on my data without augmentation leakage. If you have any suggestions, that would be amazing.
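The per-round approach described above can be sketched as a biased coin flip: with probability aug_prob a round goes through the augment pipe, otherwise it trains on un-augmented images and skips the ADA statistics for that round. This is a framework-free sketch; AugProbGate and the loop outlined in the comments are hypothetical, not part of stylegan3 or this fork.

```python
import random
from typing import Optional


class AugProbGate:
    """Hypothetical per-round gate for an 'aug_prob' setting.

    step() returns True for "augment this round" and False for
    "train on the raw images this round".
    """

    def __init__(self, aug_prob: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= aug_prob <= 1.0:
            raise ValueError('aug_prob must be in [0, 1]')
        self.aug_prob = aug_prob
        self._rng = random.Random(seed)
        self.rounds = 0
        self.augmented_rounds = 0

    def step(self) -> bool:
        self.rounds += 1
        # random() is in [0, 1): aug_prob=0 never augments, aug_prob=1 always does.
        use_aug = self._rng.random() < self.aug_prob
        if use_aug:
            self.augmented_rounds += 1
        return use_aug


# Where the gate would sit in a training round (outline only, names hypothetical):
#   use_aug = gate.step()
#   apply the SAME decision to real and generated images shown to D,
#   so D never sees augmented reals next to un-augmented fakes;
#   if use_aug and ADA is enabled: collect ADA stats / run the p adjustment.
if __name__ == '__main__':
    gate = AugProbGate(aug_prob=0.5, seed=0)
    for _ in range(1000):
        gate.step()
    print(f'{gate.augmented_rounds} of {gate.rounds} rounds augmented')
```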

Edit: after more testing, I noticed my aug_prob code works, but only when the augmentation is fixed (--aug=fixed), where it shows very good results on my limited datasets (1-2k images). I'm not sure why it doesn't work with non-fixed augmentation; I believe it has to do with the adjust step in the training loop, since that step is skipped when augmentation is fixed.
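On the adjust step: in the published ADA method, p is not fixed but nudged every few minibatches by an overfitting heuristic r_t (the mean of sign(D(x)) over real images) relative to a target, which the ADA paper sets to 0.6, with p updated every 4 minibatches and a step sized so p could go from 0 to 1 in about 500k images. A minimal sketch of that update rule follows; the function and argument names are illustrative, and the values this fork actually uses should be checked in its train.py and training_loop.py.

```python
def ada_adjust(p: float, rt: float, batch_size: int, target: float = 0.6,
               ada_interval: int = 4, ada_kimg: float = 500.0) -> float:
    """Return the augmentation probability after one ADA update (sketch).

    rt: overfitting heuristic, mean sign(D(real)) since the last update.
    The step size lets p travel from 0 to 1 in roughly ada_kimg thousand images.
    """
    step = (batch_size * ada_interval) / (ada_kimg * 1000)
    direction = (rt > target) - (rt < target)  # sign(rt - target) as -1, 0 or +1
    # Like the upstream training loop, only clamp at zero.
    return max(p + direction * step, 0.0)


if __name__ == '__main__':
    p = 0.0
    for rt in (0.8, 0.8, 0.5, 0.7):  # made-up heuristic values for illustration
        p = ada_adjust(p, rt, batch_size=20)
        print(f'rt={rt:.1f} -> p={p:.5f}')
```

With --batch=20, each update moves p by 20 * 4 / 500,000 = 0.00016. If the adjustment is also skipped on non-augmented rounds, p only moves on about aug_prob of the rounds, so it adapts roughly that much more slowly. In the upstream code p is logged as Progress/augment, which is the TensorBoard curve to check.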

PDillis commented 7 months ago

With a small amount of data you usually see some leakage, so you can: transfer learn from another model (i.e., don't randomly initialize the networks but start from a good model at the resolution of your data, e.g. --cfg=stylegan2 --resume=ffhq512), and/or lower the learning rates of the networks (lower the preset values of --glr and --dlr), and/or increase the --gamma value. Since you have an RGBA dataset, there's no pre-trained model to start from, UNLESS you do something crazy like initializing the RGB channels of your model from one of the pre-trained models listed in torch_utils/gen_utils.py (resume_specs). I'd recommend starting by lowering the learning rates and setting --gamma=100. There are more things you can try, but these should work for now.
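For context on the --gamma suggestion: upstream stylegan3's train.py describes --gamma as the R1 regularization weight. In StyleGAN2-style training, the R1 penalty added to the discriminator loss has the form below, where D is the discriminator, x a real image, and gamma the --gamma value. A larger gamma penalizes steep discriminator gradients around the real data more strongly, which is why raising it tends to help stabilize small-data runs (background, not a guarantee for any particular dataset).

```latex
R_1 = \frac{\gamma}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\,\lVert \nabla_x D(x) \rVert_2^2\,\right]
```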

Regarding the fixed probability working and ADA not: that's weird, but the TensorBoard logs might tell you more (e.g., p may not actually be reacting/adapting, and thus never approaching the target). My fix of only applying the augmentations to the RGB channels (line 366 in training/augment.py) might not work well with RGBA data and a small dataset. Anyway, since a fixed probability works, you could also use the reference values from the ADA paper authors in their Supplementary Material.

Basically, for --aug=fixed, if you only use one type of augmentation like --augpipe=blit (blue line in the top-left plot of the Supplementary Material figure), set --p=0.4. I know --p=0.8 has a lower FID, but it's a strong augmentation and will likely lead to leakage sooner; 0.4 also has a low FID compared to its neighboring values, hence my suggestion. On the other hand, if you use more than one type of augmentation, like --augpipe=bgc (green line in the lower-left plot), you can try either --p=0.2 or --p=0.4 and test both.
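One way to see why a lower p can suit --augpipe=bgc: in ADA each enabled transform is applied independently with probability p, so the chance that an image gets at least one transform grows with the number of transforms k as 1 - (1 - p)^k. The sketch below assumes blit enables 3 transforms (xflip, rotate90, xint) and bgc 12, as in the upstream augment pipe specs. It is a simplification: some transforms can draw an identity (e.g. rotate90 picking 0 degrees), and augment.py handles some probabilities, such as arbitrary rotation, specially.

```python
def prob_any_augmentation(p: float, k: int) -> float:
    """Probability that at least one of k independent transforms fires.

    Simplified model: each transform fires with probability p, independently.
    """
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError('need 0 <= p <= 1 and k >= 0')
    return 1.0 - (1.0 - p) ** k


if __name__ == '__main__':
    # Transform counts are assumptions taken from the upstream augpipe specs.
    for name, k in [('blit', 3), ('bgc', 12)]:
        for p in (0.2, 0.4, 0.8):
            print(f'{name:>4}  p={p:.1f}  P(at least one) = {prob_any_augmentation(p, k):.3f}')
```

Under this model, blit at p=0.4 touches about 78% of images, while bgc at p=0.2 already touches about 93%, which is consistent with trying lower p values for bgc.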

Since you're just starting with GANs: they're hard to tune, so be prepared for lots of experiments, but you'll quickly learn what to change later on when you get new data or try new models :)

PDillis / stylegan3-fun, issue #37