Closed by jeetsh4h 6 months ago
Hi @jeetsh4h,
Thank you for reporting the issue. The error you encountered is due to the train sampler setting. For a regression task, the train sampler should be set to random instead of weighted.
Additionally, there is another potential issue: the loss function is set to CrossEntropyLoss, which is not appropriate for regression tasks. You should use a regression-specific loss function such as 'MSELoss'.
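As a rough illustration of the two suggestions above, here is a minimal sketch in plain PyTorch rather than through the ts3l configuration, whose exact option names are not shown in this thread. The tensors are made-up values; `RandomSampler` stands in for the "random" sampler setting and `nn.MSELoss` for the regression loss.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Hypothetical regression batch: 4 samples, one continuous target each.
features = torch.randn(4, 3)
preds = torch.tensor([[9.5], [7.0], [11.2], [8.1]])
targets = torch.tensor([[10.0], [7.0], [11.0], [9.0]])

# MSELoss compares float predictions with float targets directly,
# so no class indices (and no class weights) are involved.
mse = nn.MSELoss()(preds, targets)
print(f"MSE: {mse.item():.4f}")

# A random sampler draws rows uniformly and never looks at the labels,
# unlike a weighted sampler that needs per-class weights.
dataset = TensorDataset(features, targets)
loader = DataLoader(dataset, batch_size=2, sampler=RandomSampler(dataset))
seen = sum(len(xb) for xb, _ in loader)
print("samples seen in one epoch:", seen)
```

How these choices are passed to the library's config object is best checked against the scripts in the benchmark folder.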
Also, all the code used to generate the results in the README file is available in the benchmark folder. Please clone the repository and take a look for more details.
Regarding your request for documentation, I understand the need for more comprehensive documentation. Currently, I'm busy and unable to work on it right away, but I will prioritize this when I have time.
If you have any further questions, feel free to reach out.
Thank you so much! I then ran into another error, this time an index-out-of-range error:
IndexError                                Traceback (most recent call last)
in <cell line: 8>()
      6     category_cols=category_cols
      7 )
----> 8 valid_ds = SwitchTabDataset(
      9     X=X_val, config=config,
     10     Y=y_val.values, continuous_cols=continuous_cols,

1 frames
/usr/local/lib/python3.10/dist-packages/ts3l/utils/switchtab_utils/data_utils.py in (.0)
     74
     75             class_weights = [num_samples/class_counts[i] for i in range(len(class_counts))]
---> 76             self.weights = [class_weights[self.label[i]] for i in range(int(num_samples))]
     77         else:
     78             self.weights = [1.0 for _ in range(len(X))]

IndexError: list index out of range
I think this happens because SwitchTabDataset assumes that the class labels are consecutive integers starting at 0, since it uses each label directly as an index into class_weights. When I pre-processed the wine dataset on my end, the class labels ended up being 1, 2, 3, so the label 3 indexes past the end of the list and the error is raised every time.
The code that triggers it:
# y_val looks like this:
# [1, 3, 1, 2, 2, 2, 1, 3, 3, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 3, 1,
# 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 3, 1, 3]
valid_ds = SwitchTabDataset(
    X=X_val, config=config,
    Y=y_val.values, continuous_cols=continuous_cols,
    category_cols=category_cols
)
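The failure can be reproduced without the library. The sketch below mirrors the two list comprehensions shown in the traceback; how `class_counts` is actually built inside ts3l is not visible in this thread, so the one-count-per-distinct-class list here is an assumption.

```python
from collections import Counter

# Labels as in the report: three classes encoded as 1, 2, 3.
labels = [1, 3, 1, 2, 2, 2, 1, 3, 3, 1]

# Assumption: one count per distinct class, giving a list of length 3.
counts = Counter(labels)
class_counts = [counts[c] for c in sorted(counts)]
num_samples = len(labels)

# Same shape as the computation shown in the traceback.
class_weights = [num_samples / class_counts[i] for i in range(len(class_counts))]

try:
    # Label 3 is used as an index into a list whose valid indices are 0..2.
    weights = [class_weights[labels[i]] for i in range(num_samples)]
except IndexError as exc:
    error = str(exc)
    print("IndexError:", error)
```

Note that labels 1 and 2 do not fail here, but they silently pick up the weight of the wrong class, which is arguably worse than the crash.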
In the PyTorch community, class labels are typically expected to start from 0. For instance, CrossEntropyLoss assumes that the range of given labels is [0, C) where C is the number of classes. Given this standard, I don't think there should be an issue.
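One way to bring labels such as 1, 2, 3 into the [0, C) range before building the dataset is sketched below with NumPy. The label values are illustrative, and the variable names are not part of the ts3l API.

```python
import numpy as np

# Illustrative labels in the style of the report: classes encoded as 1, 2, 3.
y_val = np.array([1, 3, 1, 2, 2, 2, 1, 3, 3, 1])

# `classes` holds the sorted original labels; `y_zero_based` holds, for each
# sample, the position of its label in `classes`, i.e. an index in [0, C).
classes, y_zero_based = np.unique(y_val, return_inverse=True)

print(y_zero_based.tolist())
print(classes[y_zero_based].tolist())  # maps indices back to the original labels
```

The remapped array can then be passed as Y. The same mapping should be used for the train, validation, and test splits; calling `np.unique` separately on each split only gives consistent indices if every split contains every class.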
I wasn't aware of that PyTorch convention. Thank you for your help!
The error that I have received is as follows:
I am using the abalone dataset from UCI to test the SwitchTab model. It would be really useful if you could release the code that was used to generate the results in the README file, where all the models are compared with each other. Better yet, it would be great to set up a docs website where we can help add documentation for all the APIs, especially the classes in utils.