Open dribnet opened 3 years ago
Is there an easy change to instead more lightly fine-tune an existing model on my dataset?
I've managed to fine-tune an existing model with these steps:

- Download the existing weights and config (e.g. https://heibox.uni-heidelberg.de/d/8088892a516d4e3baf92/)
- Create the directories `<taming-transformers repo root>/logs/<some name>/configs` and `<taming-transformers repo root>/logs/<some name>/checkpoints`
- Put the downloaded `last.ckpt` file into the newly created `checkpoints` directory
- Rename the downloaded `model.yaml` file to `<some name>-project.yaml` and put it into the `configs` directory
- Add these lines to the end of the `<some name>-project.yaml` file. Don't forget to adapt some values like you did when training a model from scratch:

```yaml
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 5
    num_workers: 8
    train:
      target: taming.data.custom.CustomTrain
      params:
        training_images_list_file: some/training.txt
        size: 256
    validation:
      target: taming.data.custom.CustomTest
      params:
        test_images_list_file: some/test.txt
        size: 256
```

- Run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file logs/<some name>/checkpoints/last.ckpt`
- Run `python main.py -t True --gpus <gpus> --resume logs/<some name>`

and the training process should be started :)

Thanks heaps @mrapplexz - this is indeed working well for me. So far I'm surprised how powerful even 100 iterations of fine-tuning is (I'll probably tweak the learning rate down, etc.), but this recipe was hugely helpful in getting me unblocked!
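As a side note, the `data:` block appended in the recipe above can be sanity-checked with PyYAML before launching a run; this is just a parsing check, not part of the recipe, and the `some/*.txt` list files are the placeholders from the recipe:

```python
import yaml  # PyYAML

# The data: section appended to <some name>-project.yaml in the recipe above.
snippet = """
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 5
    num_workers: 8
    train:
      target: taming.data.custom.CustomTrain
      params:
        training_images_list_file: some/training.txt
        size: 256
    validation:
      target: taming.data.custom.CustomTest
      params:
        test_images_list_file: some/test.txt
        size: 256
"""

cfg = yaml.safe_load(snippet)
# Quick structural checks: the dataset targets and batch size round-trip.
print(cfg["data"]["params"]["batch_size"])       # 5
print(cfg["data"]["params"]["train"]["target"])  # taming.data.custom.CustomTrain
```

If the indentation is off (a common copy-paste failure mode), `yaml.safe_load` raises an error here instead of at training time.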
@mrapplexz @dribnet hi, thank you for your amazing ideas, but there are some points that confuse me. When resuming the model, how should the number of training steps be set? E.g., I have 1M images.

And I have another question, as shown in issues/93: if a different dataset (e.g., a medical image dataset) is used for fine-tuning, the parameter `disc_start = 0` used in the config from https://heibox.uni-heidelberg.de/d/8088892a516d4e3baf92/ may not be a good choice. But I am still training the model, so this is just an assumption for now.
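On the training-steps question: with the config shown above, the number of optimizer steps per epoch follows directly from the dataset size and the effective batch size. A back-of-the-envelope check (the GPU count here is an assumed example, not from the thread):

```python
# Illustrative arithmetic, not taming-transformers API: optimizer steps
# per epoch for a 1M-image dataset with the batch_size from the config.
num_images = 1_000_000
batch_size = 5   # from the data: config above
num_gpus = 1     # assumed; effective batch = batch_size * num_gpus
steps_per_epoch = num_images // (batch_size * num_gpus)
print(steps_per_epoch)  # 200000
```

So at batch size 5 on a single GPU, even one epoch over 1M images is 200k steps, which is why a small number of fine-tuning epochs can already go a long way.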
Hello, thank you very much for your answer. It has been very helpful to me. I ran `python -m pytorch_lightning.utilities.upgrade_checkpoint --file logs/must_finish/vq_f8_16384/checkpoints/last.ckpt`. After this command, `CUDA error: out of memory` is displayed, which confuses me. I am using the .ckpt file you linked to.
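On the OOM: one common workaround (an assumption about the cause, not verified against this exact script) is to make sure the checkpoint tensors are loaded onto the CPU rather than the GPU; `torch.load` accepts a `map_location` argument for exactly this. A self-contained sketch using a dummy checkpoint file:

```python
import os
import tempfile

import torch

# map_location="cpu" forces every tensor in the checkpoint onto the CPU,
# so loading never allocates GPU memory. Demonstrated on a dummy file
# rather than a real last.ckpt.
path = os.path.join(tempfile.mkdtemp(), "last.ckpt")
torch.save({"state_dict": {"w": torch.zeros(4)}}, path)

state = torch.load(path, map_location="cpu")
print(state["state_dict"]["w"].device)  # cpu
```

If the upgrade utility loads the checkpoint straight onto a GPU that is already busy, forcing a CPU load (or freeing the GPU first) would avoid the error.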
TL;DR: custom training is great! Is there a good config or way to debug the quality of results on small-ish datasets?
I've managed to train my own custom models using the excellent additions provided by @rom1504 in #54 and have hooked this up to CLIP + VQGAN back-propagation successfully. However, so far the samples from my models are a bit glitchy. For example, with a custom dataset of images such as the following:
I'm only able to get a sample that looks something like this:
Or similarly when I train on a dataset of sketches and images like these:
My CLIP + VQGAN back-propagation of "spider" with that model turns out like this:
So there is evidence that the model is picking up some gross information such as color distributions, but the results are far from what I would expect using a simpler model such as StyleGAN on the same dataset.
So my questions: