awa121 opened 1 year ago
I'm using your code, and I think it would help to add a check that y and y_test contain the same number of classes when splitting the dataset. My dataset has many classes, so in some folds y and y_test end up with different numbers of classes after the split, which leads to a bug. Here is the corresponding line: https://github.com/claassenlab/pyPsupertime/blob/237c74f5058d4b59b02f4bf72f29b174b6e397f4/src/pypsupertime/model.py#L359C13-L359C26
To be more specific, my data is a long-tailed classification problem: some classes have only 2 samples.
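A minimal sketch of the check being suggested, assuming y_train and y_test are label arrays produced by the split. `check_split_classes` is a hypothetical helper for illustration, not part of pypsupertime; it compares the sets of classes on both sides of the split and fails loudly if they differ.

```python
import numpy as np


def check_split_classes(y_train, y_test):
    """Raise ValueError if the two label arrays do not share the same classes.

    Hypothetical helper for illustration; not part of pypsupertime.
    Returns the number of classes when the check passes.
    """
    train_classes = set(np.unique(y_train).tolist())
    test_classes = set(np.unique(y_test).tolist())
    if train_classes != test_classes:
        raise ValueError(
            "Class mismatch after split: "
            f"missing in test {sorted(train_classes - test_classes)}, "
            f"missing in train {sorted(test_classes - train_classes)}"
        )
    return len(train_classes)


# Usage: class 2 only appears in the test split, so the check fails
y_train = np.array([0, 0, 1, 1])
y_test = np.array([2, 2])
try:
    check_split_classes(y_train, y_test)
except ValueError as err:
    # prints: Class mismatch after split: missing in test [0, 1], missing in train [2]
    print(err)
```

Placed right after the split, a check like this turns a silent class mismatch into an explicit error that names the affected classes.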
Thanks for the feedback! Please note that this package is still in a beta phase and subject to frequent changes. (I should probably add a disclaimer about that...)
In the line you mention, the split tries to stratify by y. If there are too few samples to split, this can also break the CV parameter search (pypsupertime/parameter_search.py). I can't do much beyond raising an error or warning when the dataset cannot be stratified.
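Such an error or warning could look roughly like the sketch below. It assumes a class needs at least `n_splits` samples to be stratified (one per fold, or 2 for a single train/test split); `check_stratifiable` and its `strict` flag are hypothetical names, not pypsupertime's API.

```python
import warnings

import numpy as np


def check_stratifiable(y, n_splits=2, strict=False):
    """Warn (or raise, if strict) when some classes are too small to stratify.

    Hypothetical helper, not pypsupertime's API. A class needs at least
    `n_splits` samples so that each split can receive one of them
    (n_splits=2 covers a single train/test split).
    Returns a dict {class: count} of the classes that are too small.
    """
    classes, counts = np.unique(y, return_counts=True)
    too_small = {
        c: int(n) for c, n in zip(classes.tolist(), counts) if n < n_splits
    }
    if too_small:
        msg = (
            f"Cannot stratify with n_splits={n_splits}; "
            f"classes with too few samples (class: count): {too_small}"
        )
        if strict:
            raise ValueError(msg)
        warnings.warn(msg)
    return too_small


# Usage: class 2 has a single sample, so a warning is emitted
y = np.array([0, 0, 0, 1, 1, 2])
check_stratifiable(y, n_splits=2)
```

Running the check before splitting (and before the CV search) reports all offending classes at once instead of failing inside a later fold.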
In the meantime, you could try duplicating the samples of the classes that have only two samples and using weights (`sample_weight`) to balance the training. You could also use `estimator_params={"early_stopping": False}` to avoid splitting off an early-stopping set. If you do so, consider setting `max_iter` to a lower number to avoid long runtimes; SGD should converge rather quickly.
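One way to read the duplication-plus-weights suggestion, sketched with NumPy. It assumes "balance" means the copies should not make a rare class count more than its original samples, so each copy is down-weighted; `oversample_rare_classes` and `min_count` are hypothetical names. If you instead want rare classes up-weighted, scikit-learn's `compute_sample_weight("balanced", y)` is an alternative.

```python
import numpy as np


def oversample_rare_classes(X, y, min_count):
    """Duplicate samples of classes with fewer than `min_count` rows.

    Hypothetical helper. Returns X, y and sample weights; the weights are
    chosen so that each class keeps its original total weight, i.e. the
    copies do not make a rare class count more in the loss. Rows are
    regrouped by class in the output.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rows, weights = [], []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        n_new = max(len(members), min_count)
        # np.resize repeats the indices cyclically up to length n_new
        rows.append(np.resize(members, n_new))
        weights.append(np.full(n_new, len(members) / n_new))
    rows = np.concatenate(rows)
    return X[rows], y[rows], np.concatenate(weights)


# Usage: class 1 has only two samples and is expanded to four copies
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])
X_aug, y_aug, w = oversample_rare_classes(X, y, min_count=4)
print(y_aug)  # [0 0 0 0 1 1 1 1]
```

The weights are meant for a `sample_weight` argument, as the reply suggests; check that the estimator you use accepts it. Keep in mind that if copies of the same sample land on both sides of a CV split, validation scores will be optimistic. Assuming the underlying SGD estimator behaves like scikit-learn's `SGDClassifier`, `early_stopping=True` sets aside `validation_fraction` of the training data, so disabling it keeps every rare-class row in training.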
Thanks for your reply. Could you upload an example .h5ad file? I cannot find "/path/to/data_sce.h5ad" in this project. Thank you so much.
Thanks for your reply. There seems to be a bug: y_test and y_train are not defined here: https://github.com/claassenlab/pyPsupertime/blob/main/src/pypsupertime/plots.py#L104C48-L104C48