analogdevicesinc / ai8x-training

Model Training for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

Building custom dataloaders #200

Closed oussemajelassi closed 1 year ago

oussemajelassi commented 1 year ago

I am looking into building a custom dataloader and am following this documentation: Data Loader Design for MAX78000 Model Training

Coming to this part: 'Expected Data Range: For training, input data is expected to be in the range . When evaluating quantized weights, or when running on hardware, input data is instead expected to be in the native MAX7800X range of [-128, +127].

As described in the following sections, the data loader function takes the data path and some arguments as input arguments. The arguments field includes two required fields, act_mode_8bit and truncate_testset. When set to True, the first argument refers to the case normalization should be done correctly for the native MAX7800X range, i.e., to range [-128, +127]. When set to False, the normalization should be in the range of for training. '

1/ I was not able to understand the role of truncate_testset.

2/ Dataloaders take part in both the training phase and the eval phase. How can I change that argument for each phase?

3/ After building my own dataloader, what is the next step to get my own model trained and deployed on the MAX78000?

Thanks in advance, A LOT !

seldauyanik-maxim commented 1 year ago

Dear Oussama

Here are my comments:

1) truncate_testset is used to keep only the first item of the test set:

When it is set, the test dataset returned by the data loader function contains only a single item. This can be handy for testing model output on a single item in a resource-constrained environment, and for debugging. Note that the AISegment implementation was revised in PR 203 to make this part clearer.
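
As a rough illustration of that behaviour, the sketch below uses plain Python lists in place of real datasets. The names `toy_get_datasets` and `Args` and the list-based data are hypothetical stand-ins, not code from the repository. Only the `(data_path, args)` input and the `truncate_testset` flag come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Args:
    """Hypothetical stand-in for the arguments object passed to a loader."""
    act_mode_8bit: bool = False
    truncate_testset: bool = False


def toy_get_datasets(data):
    """Return (train_dataset, test_dataset) built from toy in-memory data.

    `data` is a (data_path, args) tuple, as described in the thread.
    """
    data_path, args = data  # data_path is unused in this toy example

    train_dataset = list(range(100))
    test_dataset = list(range(100, 120))

    if args.truncate_testset:
        # Keep only the first test item.
        test_dataset = test_dataset[:1]

    return train_dataset, test_dataset


_, full_test = toy_get_datasets(("unused", Args()))
_, short_test = toy_get_datasets(("unused", Args(truncate_testset=True)))
print(len(full_test), len(short_test))  # prints: 20 1
```

The training set is left untouched in this sketch, so only the test side shrinks to one item.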

2) train.py sets these arguments automatically for each phase. If you have custom training code or a notebook, you need to set them yourself.

Example data loader creation for training using distiller.apputils:

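    import torch
    from distiller import apputils
    from datasets import aisegment  # assumes the ai8x-training root is on sys.path

    # Args: any object exposing act_mode_8bit and truncate_testset attributes
    simulate = False  # False for training; True when evaluating quantized weights
    data_path = '[some_path here]'
    args = Args(act_mode_8bit=simulate)
    batch_size = 16
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader, val_loader, test_loader, _ = apputils.get_data_loaders(
        aisegment.AISegment80_get_datasets, (data_path, args),
        batch_size=batch_size, cpu=(device.type == 'cpu'))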

For evaluation mode, set the simulate parameter to True. get_data_loaders also has some additional parameters that you can look into and set according to your needs.
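
To make the effect of `act_mode_8bit` concrete, here is a minimal sketch of a normalization step that switches its output range on that flag. It assumes inputs already scaled to [0, 1]. It also assumes the training range is [-128/128, +127/128]; the range is missing from the documentation text quoted earlier in this thread, so treat that value as an assumption. `normalize_pixel` is a hypothetical helper, not a function from the repository.

```python
def normalize_pixel(value, act_mode_8bit):
    """Map a value in [0, 1] to the range selected by act_mode_8bit."""
    # Centre on zero, scale to 8 bits, and clamp to [-128, 127].
    quantized = round((value - 0.5) * 256)
    quantized = max(-128, min(127, quantized))
    if act_mode_8bit:
        # Native range [-128, +127], used for quantized evaluation.
        return float(quantized)
    # Assumed training range [-128/128, +127/128].
    return quantized / 128.0


for pixel in (0.0, 0.5, 1.0):
    print(pixel, normalize_pixel(pixel, False), normalize_pixel(pixel, True))
```

In a real data loader this step would sit in the transform pipeline, so flipping `simulate` changes the range for every sample without touching the rest of the loader.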

3) For the next steps toward training your own model and deploying it on the MAX78000, I would suggest going through the related README sections:

oussemajelassi commented 1 year ago

Thank you.

oussemajelassi commented 1 year ago

Hello again. In your examples in the datasets folder I see no actual creation of a dataloader. Does train.py handle that automatically? I mean, there is no call such as train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True), for example.

seldauyanik-maxim commented 1 year ago

train.py creates the data loader based on the --dataset argument; for details see