decile-team / cords

Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude using coresets and data selection.
https://cords.readthedocs.io/en/latest/
MIT License
316 stars, 53 forks

Segmentation fault (core dumped) #72

Closed by chengwuxinlin 2 years ago

chengwuxinlin commented 2 years ago

Hi,

I was trying to use CORDS data selection in my training, but it failed with this error: Segmentation fault (core dumped).

I adapted the code from https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/python_notebooks/CORDS_SL_CIFAR10_Custom_Train.ipynb.

Basically, I wrapped my training and testing loaders in GLISTERDataLoader and replaced my training loop with this:

    for _, (inputs, targets, weights) in enumerate(dataloader):
        inputs = inputs.to(device)
        targets = targets.to(device, non_blocking=True)
        weights = weights.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        losses = criterion_nored(outputs, targets)
        loss = torch.dot(losses, weights/(weights.sum()))
        loss.backward()
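For readers unfamiliar with the weighted loss in this loop, here is a minimal plain-Python sketch of what `torch.dot(losses, weights/(weights.sum()))` computes: a weighted average of the per-sample losses, with the weights normalized to sum to 1. The function name `weighted_mean_loss` is hypothetical and is not part of CORDS or PyTorch. It only mirrors the arithmetic on ordinary lists.

```python
def weighted_mean_loss(losses, weights):
    """Weighted average of per-sample losses (hypothetical helper, not a CORDS API).

    Mirrors torch.dot(losses, weights / weights.sum()) on plain Python lists.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Normalize each weight by the total, then take the dot product with the losses.
    return sum(loss * (weight / total) for loss, weight in zip(losses, weights))
```

With equal weights this reduces to the ordinary mean, so a subset whose weights are all 1 behaves like standard mean-reduced training.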

My code ran fine before this change, so I believe the error is inside CORDS. My dataset is CIFAR10.

Thanks

krishnatejakk commented 2 years ago

@chengwuxinlin It is hard to identify the error from this alone. Can you paste the error output or describe in detail where the error occurs? It is most probably caused by improper device initialization. Also, only the training data loader needs to be wrapped in GLISTERDataLoader. There is no need for subset selection on the test dataset.
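One way to find where a segmentation fault occurs is Python's standard-library `faulthandler` module, which dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is a generic suggestion, not something taken from CORDS. The helper name `enable_crash_traceback` and the default log path are assumptions made for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Write Python tracebacks to log_path if the process crashes (hypothetical helper)."""
    # The file must stay open for as long as faulthandler is enabled,
    # so the caller should keep a reference to the returned file object.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling this once at the top of the training script, before the data loaders are built, should leave a traceback in the log file that points at the Python line that was executing when the crash happened. Running the script with `python -X faulthandler` has a similar effect and prints to stderr.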

chengwuxinlin commented 2 years ago

I'm not sure what caused this issue. I installed everything in requirements.txt, but I still got Segmentation fault (core dumped). However, after I created a virtual environment and installed requirements.txt again, the issue was fixed. So I guess it conflicts with other packages?
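To check for this kind of package conflict, it can help to compare the versions installed in the failing environment with those in the fresh virtual environment. The following is a minimal sketch using only the standard library. The helper names `requirement_names` and `installed_versions` are hypothetical, and the parser is deliberately simple: it handles plain `name==version` style lines only, not URLs or editable installs.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract package names from simple requirements.txt lines (hypothetical helper)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `installed_versions(requirement_names(open("requirements.txt")))` in both environments and comparing the two dictionaries would show which packages differ. Whether any particular difference explains the crash is not something the thread confirms.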

Thanks for replying.