dharaneishvc / ArogyaSathi

ArogyaSathi: Empowering Healthcare with Predictive Analytics

ValueError: Error: Input data is not in a valid format. Please confirm that the input data is scikit-learn compatible. For example, the features must be a 2-D array and target labels must be a 1-D array. #1

Open sAnju3888 opened 7 months ago

sAnju3888 commented 7 months ago

import numpy as np
from tpot import TPOTClassifier
from sklearn.preprocessing import LabelEncoder

def disease_prediction(X_train, y_train, X_test, y_test, X_Pred, multi_class=False):
    # Convert target labels to numeric values if needed
    label_encoder = LabelEncoder()
    y_train_encoded = label_encoder.fit_transform(y_train)
    y_test_encoded = label_encoder.transform(y_test)
    print("C1", y_train_encoded)
    print("C2", y_test_encoded)

    # Initialize TPOT with desired parameters
    tpot = TPOTClassifier(generations=2, population_size=30, verbosity=2, n_jobs=-1, random_state=42)
    # Fit TPOT on training data
    tpot.fit(X_train, y_train_encoded)

    # Predict probabilities
    if multi_class:
        y_pred_proba = tpot.predict_proba(X_test)
    else:
        y_pred_proba = tpot.predict_proba(X_test)[:, 1]
    # Print only after y_pred_proba has been assigned
    print(y_pred_proba)

    # Optionally, you can also access the best pipeline found by TPOT
    print("Best pipeline:", tpot.fitted_pipeline_)
    print("Best score:", tpot.score(X_test, y_test_encoded))

    # Predict Input and print report
    pred_report(np.unique(y_train).tolist(), tpot.predict_proba(X_Pred)[0])

    # Print classification report, ExAI prediction
    display_classification_metrics(y_test, y_pred_proba, multi_class=multi_class)
    explain_model(tpot, X_Pred, X_train, X_train.columns.tolist(), np.unique(y_train).tolist(), multi_class=multi_class)

augmented_df = augment_data(resampled)
data['dataset'] = augmented_df
augmented_df = augmented_df.drop(columns=[label_column])

# Split into train and test sets to compute metrics, then predict on the input as well and give the ExAI explanation for it
X_train, X_test, y_train, y_test = train_test_split(augmented_df, data['dataset'][label_column])
disease_prediction(X_train, y_train, X_test, y_test, X_Pred, multi_class=data['multi_class'])


Full code: https://github.com/dharaneishvc/ArogyaSathi/blob/main/final_code_new.ipynb
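The error message in the title is generic and does not say which check failed. One way to get a more specific message, assuming the problem is something scikit-learn's own input validation would catch, is to run `sklearn.utils.check_X_y` on the training data before calling `tpot.fit`. The helper name `diagnose_inputs` and the sample arrays below are made up for illustration; this is a minimal sketch, not part of the notebook.

```python
import numpy as np
from sklearn.utils import check_X_y


def diagnose_inputs(X, y):
    """Return scikit-learn's validation message for (X, y), or None if they pass.

    Hypothetical helper: it only surfaces what check_X_y reports
    (wrong dimensions, NaN/inf values, mismatched lengths, ...).
    """
    try:
        check_X_y(X, y)
    except ValueError as exc:
        return str(exc)
    return None


# Illustrative data only: the second feature matrix contains an infinite value.
X_ok = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X_bad = np.array([[0.0, np.inf], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1])

print("clean data:", diagnose_inputs(X_ok, y))
print("bad data:  ", diagnose_inputs(X_bad, y))
```

In the function from this issue, the equivalent call would be `diagnose_inputs(X_train, y_train_encoded)` just before `tpot.fit(...)`.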

sAnju3888 commented 7 months ago

Fixed it :) The problem was in pre-processing: a categorical column contained values other than the allowed ones. Where every value in the column should have been 0 or 1, a value of 0.2 was found.
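A check along these lines could catch such stray values before training. It is only a sketch: the function name `find_unexpected_values`, the column names, and the sample data are hypothetical and do not come from the notebook.

```python
import pandas as pd


def find_unexpected_values(df, allowed):
    """Map each column to the list of values outside its allowed set.

    `allowed` maps a column name to the values permitted in that column.
    Columns with no unexpected values are left out of the result.
    """
    unexpected = {}
    for column, permitted in allowed.items():
        values = df[column].dropna()
        bad = values[~values.isin(list(permitted))].unique().tolist()
        if bad:
            unexpected[column] = bad
    return unexpected


# Hypothetical data: "smoker" should be binary, but one row holds 0.2.
df = pd.DataFrame({"smoker": [0, 1, 0.2, 1], "fever": [1, 0, 1, 1]})
print(find_unexpected_values(df, {"smoker": {0, 1}, "fever": {0, 1}}))
# → {'smoker': [0.2]}
```

Running this on the pre-processed frame right before `train_test_split` would point at the offending column instead of leaving it to surface as a generic error during fitting.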