Bryce1010 / DeepLearning-Project


Pytorch #5

Open Bryce1010 opened 4 years ago

Bryce1010 commented 4 years ago

Resources

Bryce1010 commented 4 years ago

Tools

Bryce1010 commented 4 years ago

collate_fn() https://discuss.pytorch.org/t/how-to-create-a-dataloader-with-variable-size-input/8278/2 By default, torch stacks the input images to form a tensor of size N*C*H*W, so every image in the batch must have the same height and width. In order to load a batch with variable-size input images, we have to use our own collate_fn, which packs a batch of images.

For image classification, the input to collate_fn is a list of size batch_size. Each element is a tuple where the first element is the input image (a torch.FloatTensor) and the second element is the image label, which is simply an int. Because the samples in a batch have different sizes, we can store these samples in a list and store the corresponding labels in a torch.LongTensor. Then we put the image list and the label tensor into a list and return the result.

Here is a very simple snippet demonstrating how to write a custom collate_fn:

import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt

# a simple custom collate function, just to show the idea
def my_collate(batch):
    data = [item[0] for item in batch]
    target = [item[1] for item in batch]
    target = torch.LongTensor(target)
    return [data, target]

def show_image_batch(img_list, title=None):
    num = len(img_list)
    fig = plt.figure()
    for i in range(num):
        ax = fig.add_subplot(1, num, i+1)
        ax.imshow(img_list[i].numpy().transpose([1,2,0]))
        ax.set_title(title[i])

    plt.show()

# do not apply RandomCrop, to show that the custom collate_fn can handle images of different sizes
train_transforms = transforms.Compose([transforms.Resize(224),  # transforms.Scale is deprecated; Resize(int) keeps the aspect ratio
                                       transforms.ToTensor(),
                                       ])

# change root to valid dir in your system, see ImageFolder documentation for more info
train_dataset = datasets.ImageFolder(root="/hd1/jdhao/toyset",
                                     transform=train_transforms)

trainset = DataLoader(dataset=train_dataset,
                      batch_size=4,
                      shuffle=True,
                      collate_fn=my_collate, # use custom collate function here
                      pin_memory=True)

trainiter = iter(trainset)
imgs, labels = next(trainiter)  # DataLoader iterators have no .next(); use the built-in next()

# print(type(imgs), type(labels))
show_image_batch(imgs, title=[train_dataset.classes[x] for x in labels])
Bryce1010 commented 4 years ago

PyTorch speedup [github]

Preprocessing speedup

I/O speedup (see the DataLoader sketch after this list)

Training strategies

Inference acceleration
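
As a rough illustration (not taken from the linked repo), the DataLoader settings below are the usual preprocessing/I-O knobs; the dataset root and batch size are placeholders:

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# placeholder dataset; replace root with a valid directory on your system
dataset = datasets.ImageFolder(root="/path/to/data",
                               transform=transforms.Compose([transforms.Resize(224),
                                                             transforms.ToTensor()]))

loader = DataLoader(dataset,
                    batch_size=64,
                    shuffle=True,
                    num_workers=4,            # decode images in background worker processes
                    pin_memory=True,          # page-locked host memory speeds up host-to-GPU copies
                    persistent_workers=True)  # keep workers alive between epochs (PyTorch >= 1.7)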

Bryce1010 commented 4 years ago

a list of tensors -> tensor (see the short example below)

torch.cat([x, x], 0)  # concatenate along an existing dimension
torch.stack(a)        # stack along a new dimension 0
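
A quick sketch of the difference in output shape (the tensor x below is only for illustration):

import torch

x = torch.randn(3, 4)
a = [x, x]
print(torch.cat(a, 0).shape)    # torch.Size([6, 4])   -- concatenates along dim 0
print(torch.stack(a, 0).shape)  # torch.Size([2, 3, 4]) -- adds a new leading dim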

tensor -> PIL Image

img = transforms.ToPILImage()(img)  # optionally pass mode=... or chain .convert("RGB")

torch

torch.nn

torch.nn.Module

torch.nn.Parameter

class Parameter(torch.Tensor):
    """
    A kind of Tensor that is to be considered a module parameter.

    Arguments:
        data (Tensor): parameter tensor.
        requires_grad (bool, optional): whether the parameter requires gradient. Default: True.
    """
    def __new__(cls, data=None, requires_grad=True): ...
    def __deepcopy__(self, memo): ...  # overrides deepcopy()
    def __repr__(self): ...

torch.nn.Module

Base class for all neural network modules (see the minimal example below).
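
A minimal sketch of subclassing torch.nn.Module (the layer sizes here are arbitrary); it also shows how a torch.nn.Parameter registered as an attribute ends up in .parameters():

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, in_dim=10, out_dim=2):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)            # submodules are registered automatically
        self.scale = nn.Parameter(torch.ones(out_dim))  # Parameters are registered the same way

    def forward(self, x):
        return self.fc(x) * self.scale

net = TinyNet()
print(sum(p.numel() for p in net.parameters()))  # counts fc.weight, fc.bias and scale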

torch.optim

torch.optim is a Python package that implements various optimization algorithms. Besides supporting most commonly used algorithms, its interface is general enough that new optimizers can be integrated easily as they are developed.

To construct an optimizer you first pass it an iterable of parameters to optimize, and then you can set options such as lr, weight_decay, etc.

If you train on the GPU, move the model to the GPU with model.cuda() before constructing the optimizer, so that the optimized parameters and the optimizer state live in a consistent location.

torch.optim.Optimizer(params, defaults)

torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)

How to adjust the learning rate? torch.optim.lr_scheduler (a sketch follows below)
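
A minimal sketch of pairing SGD with a StepLR scheduler; the model and the schedule values are placeholders:

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)  # multiply lr by 0.1 every 30 epochs

for epoch in range(100):
    # ... forward, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()  # step once per epoch, after the optimizer updates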

Pretrained models for Pytorch

repo:

import pretrainedmodels  # https://github.com/Cadene/pretrained-models.pytorch

model_name = 'nasnetalarge'  # could be fbresnet152 or inceptionresnetv2
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
model.eval()

pretrained model API
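
If I remember the package correctly, every model also carries its preprocessing settings as attributes and splits the forward pass into features/logits; the attribute and method names below are assumptions to check against the repo's README:

import torch
import pretrainedmodels

model = pretrainedmodels.__dict__['nasnetalarge'](num_classes=1000, pretrained='imagenet')
model.eval()

print(model.input_size, model.mean, model.std)  # expected input shape and normalization

x = torch.randn(1, *model.input_size)  # dummy batch matching the expected input size
features = model.features(x)           # convolutional features
logits = model.logits(features)        # classifier head on top of the features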

Dataset

chainer.dataset.DatasetMixin

[url]

DatasetMixin provides the __getitem__() operator. The default implementation uses get_example() to extract each example, and combines the results into a list. This mixin makes it easy to implement a new dataset that does not support efficient slicing.
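
A minimal sketch of a DatasetMixin subclass, assuming the documented contract that a dataset only needs __len__() and get_example(i):

import numpy as np
import chainer

class SquaresDataset(chainer.dataset.DatasetMixin):
    """Toy dataset returning (x, x**2) pairs."""

    def __init__(self, n=100):
        self.xs = np.arange(n, dtype=np.float32)

    def __len__(self):
        return len(self.xs)

    def get_example(self, i):
        x = self.xs[i]
        return x, x ** 2

ds = SquaresDataset()
print(ds[3])    # __getitem__ is provided by DatasetMixin -> (3.0, 9.0)
print(ds[1:4])  # slicing falls back to a list of examples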

Ignite

[Homepage] [blog] [documents]

[Learning and testing neural networks on PyTorch using Ignite]

How does Ignite work?

Engine

The Engine loops over the data you provide and runs a processing function on each batch;

for example, for a supervised training task:

Training loop

from ignite.engine import Engine

def update_model(trainer, batch):
    model.train()
    optimizer.zero_grad()
    x, y = prepare_batch(batch)  # user-defined helper that unpacks/moves the batch
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    loss.backward()
    optimizer.step()
    return loss.item()

trainer = Engine(update_model)
trainer.run(data, max_epochs=100)

Evaluation loop

from ignite.engine import Engine

total_loss = []
def compute_metrics(_, batch):
    x, y = batch
    model.eval()
    with torch.no_grad():
        y_pred = model(x)
        loss = criterion(y_pred, y)
        total_loss.append(loss.item())

    return loss.item()

evaluator = Engine(compute_metrics)
evaluator.run(data, max_epochs=1)
print(f"Loss: {torch.tensor(total_loss).mean()}")

Events & Handlers

The Events & Handlers design in Ignite is structured as follows:

fire_event(Events.STARTED)

while epoch < max_epochs:
    fire_event(Events.EPOCH_STARTED)
    # run once on data
    for batch in data:
        fire_event(Events.ITERATION_STARTED)

        output = process_function(batch)

        fire_event(Events.ITERATION_COMPLETED)
    fire_event(Events.EPOCH_COMPLETED)
fire_event(Events.COMPLETED)

How do you use Events & Handlers?

train_loader = ...
model = ...
optimizer = ...
criterion = ...
lr_scheduler = ...

def process_function(engine, batch):
    # ... user function to update model weights
    ...

trainer = Engine(process_function)

@trainer.on(Events.STARTED)
def setup_logging_folder(_):
    # create a folder for the run
    # set up some run-dependent variables
    ...

@trainer.on(Events.ITERATION_COMPLETED)
def update_lr(engine):
    lr_scheduler.step()

trainer.run(train_loader, max_epochs=50)

# --------------------------

trainer = Engine(update_model)
trainer.add_event_handler(
    Events.STARTED,
    lambda engine: print("starting training")
)
# or
@trainer.on(Events.STARTED)
def on_training_started(engine):
    print("Another message of start training")

# attach a handler with extra args/kwargs
mydata = [1, 2, 3, 4]
def on_training_ended(engine, data):
    print("Training has ended. mydata={}".format(data))

trainer.add_event_handler(
    Events.COMPLETED, on_training_ended, mydata
)

Built-in events filtering
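
A short sketch of the built-in event filters, assuming the Events call syntax (every=/once= keyword arguments) from recent Ignite releases:

from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: None)  # dummy processing function

@trainer.on(Events.ITERATION_COMPLETED(every=100))
def log_every_100_iterations(engine):
    print(engine.state.iteration)

@trainer.on(Events.EPOCH_COMPLETED(once=5))
def run_once_at_epoch_5(engine):
    print("epoch 5 finished")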

Out-of-the-box features

Visualisation loggers

supports Tensorboard, Visdom, MLflow, Polyaxon

[figure: ignite_logging.png]
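
A minimal sketch of attaching the TensorboardLogger to the trainer; the import path and handler names follow ignite.contrib.handlers as I recall them, so verify against the docs:

from ignite.engine import Events
from ignite.contrib.handlers import TensorboardLogger
from ignite.contrib.handlers.tensorboard_logger import OutputHandler

tb_logger = TensorboardLogger(log_dir="tb_logs")

# log the loss returned by the process function after every iteration
tb_logger.attach(
    trainer,
    log_handler=OutputHandler(tag="training",
                              output_transform=lambda loss: {"loss": loss}),
    event_name=Events.ITERATION_COMPLETED,
)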

Metrics

Ignite also provides a list of out-of-the-box metrics for various tasks: Precision, Recall, Accuracy, Confusion Matrix, IoU, etc., plus roughly 20 regression metrics.

from ignite.metrics import Accuracy

def compute_predictions(_, batch):
    # …
    return y_pred, y_true

evaluator = Engine(compute_predictions)
metric = Accuracy()
metric.attach(evaluator, "val_accuracy")
evaluator.run(val_loader)
> evaluator.state.metrics["val_accuracy"] = 0.98765

from ignite.metrics import Precision, Recall

precision = Precision(average=False)
recall = Recall(average=False)
F1_per_class = (precision * recall * 2 / (precision + recall))
F1_mean = F1_per_class.mean()  # torch mean method
F1_mean.attach(evaluator, "F1")

Go here and here to see the full list of available metrics.

Optimizer’s parameter scheduling
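
Ignite's parameter schedulers are attached to the engine as ordinary event handlers. A hedged sketch, assuming the CosineAnnealingScheduler class from ignite.contrib.handlers and the trainer/optimizer defined earlier:

from ignite.engine import Events
from ignite.contrib.handlers import CosineAnnealingScheduler

# anneal lr from 0.1 down to 0.001 over cycles of 1000 iterations
scheduler = CosineAnnealingScheduler(optimizer, "lr",
                                     start_value=0.1, end_value=0.001,
                                     cycle_size=1000)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)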