Closed saifhassan closed 1 year ago
@XzwHan I finally solved the above error, but I am now getting the following error:
0it [01:10, ?it/s]
ERROR - main.py - 2023-04-17 15:09:19,326 - Traceback (most recent call last):
File "main.py", line 299, in main
y_majority_vote_accuracy_all_steps_list = runner.test_image_task()
File "/home/user1/Documents/research@saif/CARD/classification/card_classification.py", line 1030, in test_image_task
p_sample_loop_with_eval(model, x_tile, y_0_hat_tile, y_T_mean_tile, self.num_timesteps,
File "/home/user1/Documents/research@saif/CARD/classification/card_classification.py", line 660, in p_sample_loop_with_eval
optional_metric_compute(y_0, num_t)
File "/home/user1/Documents/research@saif/CARD/classification/card_classification.py", line 644, in optional_metric_compute
compute_and_store_cls_metrics(config, y_labels_batch, cur_y, batch_size, num_t)
File "/home/user1/Documents/research@saif/CARD/classification/card_classification.py", line 565, in compute_and_store_cls_metrics
CI_y_pred = raw_prob_val.nanquantile(q=torch.tensor([low / 100, high / 100]),
AttributeError: 'Tensor' object has no attribute 'nanquantile'
Please help with this.
Hi @saifhassan, can you take a look at your PyTorch version? The "nanquantile" method was implemented after version 1.7; before that it did not exist in the library. Please let us know whether upgrading PyTorch helps, so we can better identify the issue and help you out.
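For readers who hit the same AttributeError, a quick way to confirm the diagnosis is to check whether the installed PyTorch exposes the method at all. The snippet below is a minimal sketch; the helper name tensor_method_available is hypothetical and not part of the CARD codebase, and it only reports what is installed rather than fixing anything.

```python
import importlib


def tensor_method_available(method_name: str) -> bool:
    """Return True if the installed torch.Tensor exposes `method_name`.

    Returns False when PyTorch is not installed at all.
    (Hypothetical helper, not part of the CARD repository.)
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return False
    print(f"Installed PyTorch version: {torch.__version__}")
    return hasattr(torch.Tensor, method_name)


if __name__ == "__main__":
    if not tensor_method_available("nanquantile"):
        print("Tensor.nanquantile is missing; upgrade PyTorch to a release newer than 1.7.")
```

If the check fails, upgrading PyTorch in the active environment is the remedy suggested in the reply above.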
Yeah, thanks @JegZheng, this worked. My mistake: I had accidentally downgraded torch. That error is solved, but another error occurs while training on my custom image classification dataset:
ERROR
INFO - main.py - 2023-04-17 23:20:23,677 - Using device: cuda:0
INFO - main.py - 2023-04-17 23:20:23,678 - Writing log file to ./results/card_onehot_conditional_results/1000steps/nn/run_1/f_phi_prior_cat_f_phi/f_phi_supervised/logs/fer2013/split_0
INFO - main.py - 2023-04-17 23:20:23,678 - Exp instance id = 233284
INFO - main.py - 2023-04-17 23:20:23,678 - Exp comment =
1it [00:01, 1.45s/it]
INFO - card_classification.py - 2023-04-17 23:20:30,038 -
Before training, the guidance classifier accuracy on the test set is 0.45238095.
INFO - card_classification.py - 2023-04-17 23:20:31,628 - epoch: 0, guidance auxiliary classifier pre-training loss: 3.1490981578826904
INFO - card_classification.py - 2023-04-17 23:20:31,628 -
Pre-training of guidance auxiliary classifier took 0.0265 minutes.
2it [00:00, 15.01it/s]
INFO - card_classification.py - 2023-04-17 23:20:32,054 -
After pre-training, guidance classifier accuracy on the training set is 0.45238095.
1it [00:00, 13.12it/s]
INFO - card_classification.py - 2023-04-17 23:20:32,243 -
After pre-training, guidance classifier accuracy on the test set is 0.35714286.
INFO - card_classification.py - 2023-04-17 23:20:32,606 - epoch: 0, step: 1, CE loss: 0, Noise Estimation loss: 1.4856054782867432, data time: 0.14027762413024902
INFO - card_classification.py - 2023-04-17 23:20:34,690 - epoch: 0, step: 2, CE loss: 0, Noise Estimation loss: 1.5408426523208618, data time: 0.07044851779937744
INFO - card_classification.py - 2023-04-17 23:20:42,814 - Update best accuracy at Epoch 0.
INFO - card_classification.py - 2023-04-17 23:20:43,829 - epoch: 0, step: 2, Average accuracy: 28.571428298950195, Max accuracy: 28.57%
INFO - card_classification.py - 2023-04-17 23:20:44,241 - epoch: 1, step: 4, CE loss: 0, Noise Estimation loss: 1.375126600265503, data time: 0.08911991119384766
INFO - card_classification.py - 2023-04-17 23:20:44,653 - epoch: 2, step: 6, CE loss: 0, Noise Estimation loss: 1.6988937854766846, data time: 0.08784031867980957
INFO - card_classification.py - 2023-04-17 23:20:52,547 - Update best accuracy at Epoch 2.
INFO - card_classification.py - 2023-04-17 23:21:06,486 - epoch: 2, step: 6, Average accuracy: 30.952381134033203, Max accuracy: 30.95%
INFO - main.py - 2023-04-17 23:21:15,487 -
Training procedure finished. It took 0.7946 minutes.
INFO - main.py - 2023-04-17 23:21:16,466 - Using device: cuda:0
INFO - main.py - 2023-04-17 23:21:16,466 - Writing log file to ./results/card_onehot_conditional_results/1000steps/nn/run_1/f_phi_prior_cat_f_phi/f_phi_supervised/logs/fer2013/split_0
INFO - main.py - 2023-04-17 23:21:16,466 - Exp instance id = 233955
INFO - main.py - 2023-04-17 23:21:16,466 - Exp comment =
INFO - card_classification.py - 2023-04-17 23:21:21,791 - Loading from: ./results/card_onehot_conditional_results/1000steps/nn/run_1/f_phi_prior_cat_f_phi/f_phi_supervised/logs/fer2013/split_0/ckpt_last.pth
1it [00:01, 1.45s/it]
INFO - card_classification.py - 2023-04-17 23:21:24,200 - After training, guidance classifier accuracy on the test set is 0.33333333.
INFO - card_classification.py - 2023-04-17 23:21:24,200 -
We pick samples at timestep t=0 to compute evaluation metrics.
INFO - card_classification.py - 2023-04-17 23:21:24,200 - Begin generating 5 samples for tuning temperature scaling parameter...
0it [00:00, ?it/s]INFO - card_classification.py - 2023-04-17 23:21:50,392 - Minibatch 0 sampling took 25.7538 seconds.
1it [00:26, 26.08s/it]INFO - card_classification.py - 2023-04-17 23:22:02,697 - Minibatch 1 sampling took 12.0565 seconds.
2it [00:38, 19.21s/it]
2 torch.Size([64, 3])
INFO - card_classification.py - 2023-04-17 23:22:02,735 - Begin optimizing temperature scaling parameter...
2it [00:00, 14.55it/s]
INFO - card_classification.py - 2023-04-17 23:22:02,985 - NLL of the last mini-batch: nan
INFO - card_classification.py - 2023-04-17 23:22:02,986 - Apply tuned temperature scaling parameter T with a value of nan
0it [04:37, ?it/s]
ERROR - main.py - 2023-04-17 23:26:40,566 - Traceback (most recent call last):
File "main.py", line 297, in main
y_majority_vote_accuracy_all_steps_list = runner.test_image_task()
File "/home/user1/Documents/CARD/classification/card_classification.py", line 1024, in test_image_task
p_sample_loop_with_eval(model, x_tile, y_0_hat_tile, y_T_mean_tile, self.num_timesteps,
File "/home/user1/Documents/CARD/classification/card_classification.py", line 654, in p_sample_loop_with_eval
optional_metric_compute(y_0, num_t)
File "/home/user1/Documents/CARD/classification/card_classification.py", line 638, in optional_metric_compute
compute_and_store_cls_metrics(config, y_labels_batch, cur_y, batch_size, num_t)
File "/home/user1/Documents/CARD/classification/card_classification.py", line 588, in compute_and_store_cls_metrics
fig = sm.qqplot(gen_y_2_class_prob_diff[instance_idx, :],
File "/home/user1/anaconda3/envs/card_original/lib/python3.8/site-packages/statsmodels/graphics/gofplots.py", line 684, in qqplot
probplot = ProbPlot(
File "/home/user1/anaconda3/envs/card_original/lib/python3.8/site-packages/statsmodels/graphics/gofplots.py", line 223, in __init__
self.fit_params = dist.fit(data)
File "/home/user1/anaconda3/envs/card_original/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py", line 62, in wrapper
return fun(self, *args, **kwds)
File "/home/user1/anaconda3/envs/card_original/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py", line 363, in fit
raise RuntimeError("The data contains non-finite values.")
RuntimeError: The data contains non-finite values.
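Note that the log already reports "NLL of the last mini-batch: nan" and a tuned temperature T of nan before the Q-Q plot is attempted, so the non-finite values exist upstream of the plotting call. A minimal sketch of a guard that locates them earlier is shown below. The function names are hypothetical, and the values are assumed to have been converted to plain Python floats first (for example with tensor.flatten().tolist()).

```python
import math
from typing import Iterable, List


def non_finite_positions(values: Iterable[float]) -> List[int]:
    """Return the indices of NaN or +/-inf entries in a sequence of floats."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


def check_before_fit(values: Iterable[float], name: str = "data") -> None:
    """Raise a descriptive error before handing `values` to a fitting routine."""
    bad = non_finite_positions(values)
    if bad:
        raise ValueError(
            f"{name} has {len(bad)} non-finite value(s), first at index {bad[0]}"
        )
```

Calling such a check on the sampled probabilities right after the sampling loop, and again after temperature scaling, would show which stage first produces NaN.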
Changes I have made for loading the dataset: I am running on another dataset (with 3 classes), so I made the following changes:
Change 1 (utils.py) --> Define a class for dataset loading instead of using torchvision.datasets.CIFAR10 (etc.)
    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class CustomDataset(Dataset):
        """Image dataset laid out as root_dir/<label>/<image>.jpg."""

        def __init__(self, root_dir, transform=None):
            self.root_dir = root_dir
            self.transform = transform
            # Collect (image_path, label) pairs for every .jpg file in the dataset
            self.image_files = []
            for label in os.listdir(self.root_dir):
                label_dir = os.path.join(self.root_dir, label)
                for filename in os.listdir(label_dir):
                    if filename.endswith('.jpg'):
                        self.image_files.append((os.path.join(label_dir, filename), label))

        def __len__(self):
            return len(self.image_files)

        def __getitem__(self, idx):
            # Load the image and label from disk
            image_path, label = self.image_files[idx]
            image = Image.open(image_path).convert('RGB')
            # Apply any data transformations
            if self.transform:
                image = self.transform(image)
            # Assumes each label directory is named with an integer class index
            return image, int(label)
Change 2 (in utils.py) --> Reading dataset
    elif config.data.dataset == "customdataset":
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])
        # Create a dataset object
        train_dataset = CustomDataset(root_dir=config.data.dataroot+'/train', transform=transform)
        test_dataset = CustomDataset(root_dir=config.data.dataroot+'/val', transform=transform)
Change 3 --> Add proper config and run_custom.sh script
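One thing worth checking in Change 1: the dataset class converts the directory name with int(label), which only works when the class folders are named with integers, and the labels are only valid classification targets if they run from 0 to K-1. The sketch below shows one way to derive contiguous indices from arbitrary folder names. It is an illustration under the assumption of a class-per-folder layout; build_label_index and list_samples are hypothetical names, not functions from the CARD repository.

```python
import os
from typing import Dict, List, Tuple


def build_label_index(root_dir: str) -> Dict[str, int]:
    """Map each class sub-directory name to an index in 0..K-1 (sorted order)."""
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    return {name: idx for idx, name in enumerate(class_names)}


def list_samples(
    root_dir: str, extensions: Tuple[str, ...] = (".jpg",)
) -> List[Tuple[str, int]]:
    """Collect (image_path, label_index) pairs from a class-per-folder layout."""
    label_index = build_label_index(root_dir)
    samples = []
    for name, idx in label_index.items():
        class_dir = os.path.join(root_dir, name)
        for filename in sorted(os.listdir(class_dir)):
            # Case-insensitive match so that .JPG files are also picked up
            if filename.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, filename), idx))
    return samples
```

Building the mapping once on the training split and reusing it for the validation split keeps the label indices consistent between the two.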
@JegZheng @mingyuanzhou @XzwHan The above error also occurs while training on CIFAR10 or MNIST. Please advise.
Hi @saifhassan, thank you for your patience. I'm not familiar with the dataset you are using, but my speculation is that the issue lies in the encoder_x network within the diffusion model, which extracts the image embedding from the image input. As you can see from the model.py file, we've experimented with various choices of network architecture for different benchmark datasets, but it's quite likely that a more suitable architecture remains to be found for your dataset.
Due to our limited bandwidth, we are unable to engage in detailed discussions about customized datasets or modifications to our codebase at this time. We encourage you to consult with your colleagues who also have access to the dataset you are working on, as their input can help lighten your workload and contribute to more efficient progress. We sincerely wish you the best with your project.
Hey @XzwHan @mingyuanzhou @JegZheng,
Your CARD model is a real paradigm shift, amazing work. I have run classification on the CIFAR (all versions) and MNIST datasets and reproduced the results in your paper.
However, I am now training a classification model on another benchmark dataset with 7 classes, where each image is 48x48. I prepared a .yml file and a script.sh file, and I designed my own class for loading the dataset, as you do with torchvision.datasets.CIFAR10 etc. When I run the .sh script, I still get the following error:
Thanks in advance.