Closed halimkun closed 1 month ago
This is my data from the .csv file:
cat,anorexia,muntah,lemah,kurang respon,dehidrasi,demam,diare,hipersevalis,radang telinga,batuk,hidung meler,gatal,telinga keropeng,pilek,bersin2,mata berair,disease
k1,ya,ya,ya,ya,,,,,,,,,,,,,panleukopenia
k90,,,,,,,,,,,,ya,,,,,scabies
k224,,,,,,,ya,,,,,,,,,,enteritis
k235,,,,,,,,,,ya,,,,ya,ya,,fcv
Hey @halimkun, it sounds like you might need a custom transformation. You can create a separate category to represent the absence of a symptom (such as "no") and impute it wherever a feature value is empty. Since yes and no are a binary representation, you can use the integers 1 and 0 as their numeric representation. That said, why do you need the categories represented numerically?
https://docs.rubixml.com/latest/preprocessing.html#custom-transformations
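A minimal sketch of the 1/0 mapping suggested above, written in plain PHP with no Rubix ML calls. The function name `encodeSymptoms` is hypothetical, and it assumes each row has already been stripped of the `cat` id and the `disease` label, and that `ya` is the only marker for a present symptom. The same logic could be moved into a custom transformer as described in the linked docs.

```php
<?php

declare(strict_types=1);

/**
 * Map one row of symptom cells to integers: "ya" becomes 1,
 * anything else (including the empty string) becomes 0.
 *
 * @param string[] $cells symptom columns only (assumed layout)
 * @return int[]
 */
function encodeSymptoms(array $cells): array
{
    return array_map(
        static fn (string $cell): int => strtolower(trim($cell)) === 'ya' ? 1 : 0,
        $cells
    );
}

// Row k1 from the CSV: four "ya" cells followed by twelve empty cells
// (16 symptom columns according to the header).
$k1 = array_merge(array_fill(0, 4, 'ya'), array_fill(0, 12, ''));

// Prints four 1s followed by twelve 0s, comma-separated.
echo implode(',', encodeSymptoms($k1)), PHP_EOL;
```

Because every row maps to the same 16 columns, the number of features does not change after encoding, unlike with one-hot encoding.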
Hello, I have a few questions.
I have data that looks like this (cat disease data, used to identify a disease from its symptoms). Because there is too much data, I can only show part of it.
As you can see, some cells are empty, which means the cat does not have that symptom.
My question is how to convert yes and the empty cells to numeric values.
I tried NumericStringConverter(), but nothing happens (the data stays the same). With OneHotEncoder(), extra values are added to each sample: for example, the sample at index 0 originally had 17 values and now has 29.
Below is the data after apply() with OneHotEncoder().
I don't think that is a problem in itself. However, trying to predict on new data generates this error:
Fatal error: Uncaught Rubix\ML\Exceptions\IncorrectDatasetDimensionality: Dataset must contain samples with exactly 29 dimensions, 19 given
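One plausible reading of this error is that the new sample did not go through the same encoding as the training data, so its number of features differs from what the model was trained on. The sketch below is plain PHP and not part of the Rubix ML API. It builds a new sample as a fixed-length 0/1 vector from the symptom names in the CSV header, so every sample has the same dimensionality. The constant `SYMPTOMS` and the function `symptomVector` are hypothetical names, and the sketch assumes the model was trained on the 0/1 encoding rather than the one-hot output.

```php
<?php

declare(strict_types=1);

// Symptom columns in CSV header order ("cat" id and "disease" label excluded).
const SYMPTOMS = [
    'anorexia', 'muntah', 'lemah', 'kurang respon', 'dehidrasi', 'demam',
    'diare', 'hipersevalis', 'radang telinga', 'batuk', 'hidung meler',
    'gatal', 'telinga keropeng', 'pilek', 'bersin2', 'mata berair',
];

/**
 * Build a fixed-length 0/1 vector from the list of observed symptom names.
 *
 * @param string[] $observed
 * @return int[]
 */
function symptomVector(array $observed): array
{
    // Reject names that are not in the header, so typos fail loudly.
    $unknown = array_diff($observed, SYMPTOMS);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown symptom(s): ' . implode(', ', $unknown));
    }

    return array_map(
        static fn (string $name): int => in_array($name, $observed, true) ? 1 : 0,
        SYMPTOMS
    );
}

// Example: the symptoms that appear to be marked for row k235.
$sample = symptomVector(['batuk', 'pilek', 'bersin2']);

echo count($sample), PHP_EOL; // 16, one value per symptom column
```

Whatever encoding is used, the new sample has to be prepared exactly as the training samples were before it is passed to the estimator.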