VincentAlcazer / AIPAL

Artificial Intelligence-based Prediction of Acute Leukemia: a free and open-source software package built in R, with a user-friendly interface provided via Shiny, that enables clinical hematologists and biologists to diagnose the three main subtypes of acute leukemia based solely on 10 routine biological parameters.
https://alcazerv.shinyapps.io/AIPAL/

Validation Pipeline #1

Open Wizzzard93 opened 7 months ago

Wizzzard93 commented 7 months ago

Hi,

I want to validate your model with in-house data. I tried to port your model to Python, but I got slightly different risk scores. Can you provide a validation pipeline in R?

Alternatively, am I missing a preprocessing step?

```python
import xgboost as xgb
import numpy as np

age = 53
MCV_fL = 88
MCHC_g_L = 330
PT = 50
WBC_G_L = 10
Lymphocytes_G_L = 3
Monocytes_G_L = 6
Platelets_G_L = 6
fibri_gL = 6
LDH_UI_L = 250

mono_percent = (Monocytes_G_L * 100) / WBC_G_L

# Sample data with 10 features (example data)
sample_data = np.array([[
    fibri_gL, MCV_fL, mono_percent, LDH_UI_L, PT,
    MCHC_g_L, Lymphocytes_G_L, age, Monocytes_G_L, Platelets_G_L,
]])

# Convert the sample data to DMatrix
dtest = xgb.DMatrix(sample_data)

# Make the prediction with probability estimates
# (`model` is the ported XGBoost booster; loading it is not shown here)
prediction = model.predict(dtest)
```

BR Merlin
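
One thing worth checking when a port gives slightly different scores is the column order of the input row, because a positional array carries no feature names. The sketch below builds the row by name so that a missing or misplaced feature is caught early. The order in `FEATURE_ORDER` is copied from the snippet above; whether it matches the order used when the R model was trained is an assumption to verify against the repository. `build_row` is a hypothetical helper, not part of AIPAL or xgboost.

```python
# Column order copied from the snippet above. Whether it matches the order
# used to train the R model is an assumption that needs to be verified.
FEATURE_ORDER = [
    "fibri_gL", "MCV_fL", "mono_percent", "LDH_UI_L", "PT",
    "MCHC_g_L", "Lymphocytes_G_L", "age", "Monocytes_G_L", "Platelets_G_L",
]


def build_row(values, order=FEATURE_ORDER):
    """Return the feature values as a list of floats in the given order."""
    missing = [name for name in order if name not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(values[name]) for name in order]


# Same example values as in the snippet above.
patient = {
    "age": 53, "MCV_fL": 88, "MCHC_g_L": 330, "PT": 50, "WBC_G_L": 10,
    "Lymphocytes_G_L": 3, "Monocytes_G_L": 6, "Platelets_G_L": 6,
    "fibri_gL": 6, "LDH_UI_L": 250,
}
# Derived feature: monocytes as a percentage of the white blood cell count.
patient["mono_percent"] = patient["Monocytes_G_L"] * 100 / patient["WBC_G_L"]

row = build_row(patient)
```

The resulting `row` can be wrapped as `np.array([row])` and passed on exactly as `sample_data` is above.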

VincentAlcazer commented 7 months ago

Hey,

Thank you for your interest in our work.

The model is fully available in R on the repository. In addition to the raw prediction scores, optimal and confident cutoffs were set to guide clinical decisions.

If you want to learn more about how these cutoffs were set, the article is currently in press and should be available soon.

Best,

Vincent Alcazer

Wizzzard93 commented 7 months ago

Hi,

Thanks for the response; I am excited to read the article. Would you mind sharing your validation pipeline in R? I have a dataset prepared, and I would like to see if I can achieve similar performance :)

BR Merlin

VincentAlcazer commented 5 months ago

Dear Merlin,

The paper is now available at https://www.thelancet.com/journals/landig/article/PIIS2589-7500(24)00044-X/fulltext. The full R pipeline, including the cutoffs used, is available in the GitHub repository.

Please let me know if you have any issues running it. I would be very interested to see the results on your cohort.

Best,

Vincent
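
For anyone comparing a Python port against the R pipeline, a minimal sketch of a score-by-score check follows. It assumes the scores from both runs have been exported as plain lists in the same sample order. `compare_scores` and the tolerance are hypothetical choices, and the numbers are made up for illustration; they are not AIPAL outputs.

```python
import math


def compare_scores(r_scores, py_scores, abs_tol=1e-6):
    """Return (index, r, py, diff) for each pair differing by more than abs_tol."""
    if len(r_scores) != len(py_scores):
        raise ValueError("score lists must have the same length")
    mismatches = []
    for i, (r, p) in enumerate(zip(r_scores, py_scores)):
        # Absolute tolerance only, since the scores are probabilities in [0, 1].
        if not math.isclose(r, p, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append((i, r, p, abs(r - p)))
    return mismatches


# Made-up numbers for illustration only.
r_scores = [0.912, 0.034, 0.501]
py_scores = [0.912, 0.034, 0.497]

mismatches = compare_scores(r_scores, py_scores, abs_tol=1e-3)
for index, r, p, diff in mismatches:
    print(f"sample {index}: R={r} Python={p} diff={diff:.4f}")
```

Listing the samples that disagree, rather than only a summary statistic, makes it easier to see whether a discrepancy is systematic (for example, a preprocessing difference) or limited to a few inputs.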