RF5 / danbooru-pretrained

Pretrained pytorch models for the Danbooru2018 dataset

Training Issues #1

Closed · Dakini closed this issue 4 years ago

Dakini commented 5 years ago

Hi,

I am currently trying to train a model using Danbooru and your 6000_tags csv file. However, I am getting some random runtime errors: `DataLoader worker (pid 17154) is killed by signal: Segmentation fault.`

Have you had these issues before?

RF5 commented 4 years ago

Hi, sorry for the late reply.

Yes, I ran into this error once or twice, and it mainly came down to the size of the label file. The 6000_tag_labels.csv is over 1GB, so if you aren't careful about how you load it into your training code it can cause problems. For example, if you are using the same fastai method that I used for training, you will likely need quite a lot of RAM (>20GB) to load the data.
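As a rough sanity check before handing the csv to fastai, something like the sketch below (pandas only) can estimate how much RAM the full dataframe will take. The filename and the fact that the first column is a cheap one to count rows with are assumptions; adjust them to match the actual 6000_tag_labels.csv.

```python
# Minimal sketch: estimate the in-memory size of the label csv before loading
# it for training. Filename and column layout are assumptions.
import pandas as pd

# Read a small sample so the estimate itself stays cheap.
sample = pd.read_csv("6000_tag_labels.csv", nrows=50_000)
per_row_bytes = sample.memory_usage(deep=True).sum() / len(sample)

# Count total rows without loading every column at once.
n_rows = sum(len(chunk) for chunk in
             pd.read_csv("6000_tag_labels.csv", usecols=[0], chunksize=500_000))

print(f"~{per_row_bytes * n_rows / 1e9:.1f} GB estimated for the full dataframe "
      f"({n_rows} rows, ~{per_row_bytes:.0f} bytes/row)")
```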

This is because (after many crashes on my side during training) I realized that fastai internally adds all the labels to a Python set (to find the unique labels), and doing this for a 1.1GB csv file with 6000 unique tags appears to be very memory intensive. Luckily, once this initial loading is done, the RAM usage drops quite a bit: my RAM usage spiked very high when first loading the csv into fastai, and then settled lower and stayed stable during training.
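To see what that expensive step amounts to, here is a streaming version of it: build the set of unique tags chunk by chunk instead of materialising the whole csv at once. The column name `tags` and the space delimiter are assumptions, so change them to match the actual label file.

```python
# Sketch: collect the unique tags from the label csv in chunks, keeping peak
# memory bounded. Column name "tags" and the space delimiter are assumptions.
import pandas as pd

unique_tags = set()
for chunk in pd.read_csv("6000_tag_labels.csv", usecols=["tags"], chunksize=200_000):
    for tag_string in chunk["tags"].dropna():
        unique_tags.update(tag_string.split(" "))

print(f"{len(unique_tags)} unique tags found")  # should be around 6000
```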

In short: watch your RAM usage (not GPU memory) as you load the csv into your training program; running out of RAM may be what is causing the random segfault errors, in which case my suggestion is to get more RAM. However, it could be something entirely different -- the RAM issue is just what caused my segfault errors.
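One way to do that watching programmatically is a small helper built on psutil (not part of the original thread), logging process and system RAM around the csv-loading step so you can tell a genuine out-of-memory from some other segfault cause:

```python
# Sketch: log process RSS and available system RAM around the expensive
# csv-loading / databunch-building step. Uses psutil.
import os
import psutil

def log_ram(tag: str) -> None:
    proc_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    avail_gb = psutil.virtual_memory().available / 1e9
    print(f"[{tag}] process RSS: {proc_gb:.1f} GB, system available: {avail_gb:.1f} GB")

log_ram("before loading labels")
# ... load the csv / build the training data here ...
log_ram("after loading labels")
```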