mrT23 opened this issue 4 years ago
How are you making your databunch?
```python
def get_y_fun(input):
    im_name = os.path.basename(input)  # e.g. 'staffordshire_bull_terrier_54.jpg'
    class_name = im_name[:im_name.rfind('_')]
    return class_name

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=get_y_fun)

dbunch = pets.databunch(untar_data(URLs.PETS) / "images", item_tfms=Resize(args.input_size),
                        batch_tfms=aug_transforms(), num_workers=args.num_workers)
```
By the way, the regex `pat = r'/([^/]+)_\d+.jpg$'` is not cross-platform, since Windows paths use `\` as the separator.
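The portability problem can be seen directly in a standalone sketch (not from the original thread): the pattern anchors on `/` separators, which Windows paths never contain.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

pat = r'/([^/]+)_\d+.jpg$'

# The same logical path rendered with each platform's separator.
posix_path = str(PurePosixPath('images', 'yorkshire_terrier_12.jpg'))
win_path = str(PureWindowsPath('images', 'yorkshire_terrier_12.jpg'))

print(re.search(pat, posix_path).group(1))  # yorkshire_terrier
print(re.search(pat, win_path))             # None: '\' separators never match '/'
```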
I did some testing: this happens on Windows, but not on Linux. It's a dataloader issue; nowhere in the dataloader are the targets cast directly to int64.
Are you sure you are using PyTorch 1.3? The type promotion should get rid of those errors. On Linux, I can do `x == y` with a tensor of type Int and a tensor of type Long.
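For reference, the promotion behaviour described here can be checked directly with a minimal sketch, independent of fastai:

```python
import torch

# With PyTorch >= 1.3, type promotion lets comparison ops mix integer dtypes,
# so an int32 ("Int") tensor can be compared against an int64 ("Long") one.
x = torch.tensor([1, 2, 3], dtype=torch.int32)
y = torch.tensor([1, 0, 3], dtype=torch.int64)
print((x == y).tolist())  # [True, False, True]
```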
The first error should be fixed now btw.
@sgugger, thanks for the feedback.
> Are you sure you are using PyTorch 1.3? The type promotion should get rid of those errors. On Linux, I can do `x == y` with a tensor of type Int and a tensor of type Long.
I upgraded my PyTorch to 1.3.1 (the requirements currently specify 1.2.0), but the first problem remains. I will pull the latest version of fastai v2 with your commit that fixes the label smoothing.
Maybe it would be better to explicitly convert the targets to int64 in the collate function? It is common practice in some repositories, for example https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py:

```python
def fast_collate(batch):
    imgs = [img[0] for img in batch]
    targets = torch.tensor([target[1] for target in batch], dtype=torch.int64)
```
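A self-contained sketch of this approach with a plain PyTorch `DataLoader` (toy data and hypothetical names, not the fastai internals):

```python
import torch
from torch.utils.data import DataLoader

# Toy dataset: (feature, label) pairs whose labels are plain Python ints.
data = [(torch.randn(3), i % 2) for i in range(8)]

def collate_int64(batch):
    # Stack the features and force the labels to int64, which is what
    # cross-entropy-style losses expect for their target tensor.
    xs = torch.stack([x for x, _ in batch])
    ys = torch.tensor([y for _, y in batch], dtype=torch.int64)
    return xs, ys

dl = DataLoader(data, batch_size=4, collate_fn=collate_int64)
xs, ys = next(iter(dl))
print(xs.shape, ys.dtype)  # torch.Size([4, 3]) torch.int64
```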
We don't want to automatically convert to int64 tensors for users because it takes twice the space in GPU memory and sometimes they don't need the int64.
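The memory cost mentioned here is easy to quantify with a standalone check: each int64 element occupies 8 bytes versus 4 for int32.

```python
import torch

# element_size() reports bytes per element: int64 targets take twice
# the GPU/CPU memory of int32 ones.
t32 = torch.zeros(1024, dtype=torch.int32)
t64 = torch.zeros(1024, dtype=torch.int64)
print(t32.element_size(), t64.element_size())  # 4 8
```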
I can't reproduce the error on windows with PyTorch 1.3.1. When asking for the accuracy between a tensor of type Int (int32) and a tensor of type Long (int64), I don't have any error.
I am trying to run a simple PETS dataset training. When I use the following simple learner with label smoothing:

```python
learn = cnn_learner(dbunch, resnet34, metrics=[accuracy, top_k_accuracy],
                    loss_func=LabelSmoothingCrossEntropy())
learn.fit_one_cycle(4)
```

I get the following error immediately when training starts:

If I use the `CrossEntropyLossFlat` loss instead, I get an error at the validation phase:
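The kind of failure being reported can be reproduced without fastai. A minimal sketch: `F.nll_loss`, which underlies cross-entropy-style losses, insists on int64 targets.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)
targets32 = torch.tensor([0, 1, 2, 3], dtype=torch.int32)
log_probs = F.log_softmax(logits, dim=-1)

# nll_loss rejects int32 ("Int") targets, which is the dtype mismatch
# this thread is about.
try:
    F.nll_loss(log_probs, targets32)
except RuntimeError:
    print('RuntimeError with int32 targets')

# Casting to int64 ("Long") makes the call succeed.
print(F.nll_loss(log_probs, targets32.long()).dtype)  # torch.float32
```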
Thanks