Closed: daMichaelB closed this issue 2 years ago
Hey @daMichaelB, thanks for reporting this! This is something we now support internally but don't yet expose to the user. All labels, the number of classes, etc. for classification problems are handled by a TargetFormatter object (see the API reference here: https://lightning-flash.readthedocs.io/en/latest/api/data.html#flash-core-data-utilities-classification ).
These objects are usually inferred from the training data, but in cases where that inference is not possible (e.g. where we can't efficiently get a list of all targets) we have begun to expose this object. So you could have, for example:
datamodule = ImageClassificationData.from_data_frame(
    ...,
    target_formatter=MultiLabelTargetFormatter(labels=["label_1", ..., "label_n"]),
)
Would this API work for you? If so, I can get to work on adding the target_formatter argument to all of our from_* methods :smiley:
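To illustrate why an explicit label list helps, here is a minimal plain-Python sketch of the idea. It is not Flash's TargetFormatter implementation; the helpers infer_labels and multi_hot and the sample data are hypothetical and exist only for illustration. It shows that labels inferred from the training split alone cannot encode a label that appears only in validation, while a user-supplied list can.

```python
from typing import Dict, List, Sequence


def infer_labels(targets: Sequence[Sequence[str]]) -> List[str]:
    """Collect the sorted set of labels that actually occur in `targets`."""
    return sorted({label for sample in targets for label in sample})


def multi_hot(sample: Sequence[str], labels: Sequence[str]) -> List[int]:
    """Encode one sample's labels as a multi-hot vector over `labels`."""
    index: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    encoded = [0] * len(labels)
    for label in sample:
        encoded[index[label]] = 1  # raises KeyError if the label is unknown
    return encoded


# Hypothetical data: "rare" only appears in the validation split.
train_targets = [["cat"], ["dog"], ["cat", "dog"]]
val_targets = [["cat"], ["rare"]]

# Labels inferred from training data alone miss the validation-only class.
inferred = infer_labels(train_targets)

# An explicit, user-supplied label list covers every class up front.
explicit = ["cat", "dog", "rare"]
val_encoded = [multi_hot(sample, explicit) for sample in val_targets]
```

With the inferred list, encoding the validation sample ["rare"] fails with a KeyError, which mirrors the kind of crash described in this issue; with the explicit list, every validation target has a slot.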
Hey @ethanwharris. This would solve a lot of trouble on my side! I think that would be a great feature for dealing with imbalanced datasets!
Thank you for the suggestion, and let me know if I can help with testing it!
Thank you for the great support and implementation!
❓ Questions and Help
What is your question?
I have a highly imbalanced dataset in which some minority classes are very rare. I put them ONLY into the validation set. I want to validate whether the model can classify them as not belonging to the majority class.
The Datamodule was created with:
As I understood it, I can create the ImageClassifier with the number of ALL classes:
However, my training crashes at the beginning, during "Validation sanity check".
Tracelog:
I found that label 14 is in the validation set but not in the training set.
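A mismatch like this can be confirmed by comparing the label sets of the two splits. The following is a small plain-Python sketch; train_targets, val_targets, and labels_missing_from_train are hypothetical names with made-up values, not data or code from this issue.

```python
from typing import Iterable, List


def labels_missing_from_train(
    train_targets: Iterable[int], val_targets: Iterable[int]
) -> List[int]:
    """Return the labels that occur in validation but never in training."""
    return sorted(set(val_targets) - set(train_targets))


# Hypothetical integer targets for each split.
train_targets = [0, 1, 2, 1, 0]
val_targets = [0, 1, 2, 14]

missing = labels_missing_from_train(train_targets, val_targets)
print(missing)  # prints [14]
```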
Question
Is there a way to train on a subset of the classes but validate on all classes ?
What have you tried?
I have no idea how to work around this...
What's your environment?