Lung cancer is one of the most aggressive and fatal cancers globally, with non-small cell lung cancer (NSCLC) being the most common type. Over the years, treatment strategies have evolved significantly. Keytruda (Pembrolizumab) is an immunotherapy drug designed to target the PD-1/PD-L1 pathway, a mechanism that cancer cells exploit to evade detection by the immune system. By blocking this pathway, Keytruda enables the immune system to recognize and attack cancer cells more effectively.
Keytruda has shown great promise in treating various cancers, especially in cases where high levels of PD-L1 expression are present. However, it is not always clear which patients will benefit the most from this treatment, because response depends on a combination of biomarkers and clinical factors.
The goal of this project is to assist doctors in making more informed decisions on whether Keytruda is a suitable treatment option for a given lung cancer patient. By using machine learning, we aim to build a predictive model that will analyze patient data and assess their likelihood of responding positively to Keytruda treatment.
This decision support system will help doctors make quicker, data-driven decisions and support more personalized treatment plans for patients, with the aim of improving outcomes and reducing unnecessary treatments.
This project will follow a structured methodology to develop a robust machine learning model and deploy it for real-world use:
The dataset contains detailed information about 3,000 lung cancer patients, including their age, gender, cancer stage, tumor size, genetic mutations, smoking history, and their response to previous treatments.
Shape of Dataset:
Columns:
Patient_ID: Unique identifier for each patient.
Age: Age of the patient.
Gender: Male/Female.
Cancer_Stage: Stage of lung cancer (Stage II, Stage III, Stage IV).
PD-L1_Level: Percentage of PD-L1 expression in the tumor cells.
EGFR_Mutation_Count: Number of mutations in the EGFR gene.
Tumor_Size: Size of the tumor in centimeters.
Treatment_Type: Type of treatment received (Chemotherapy or Immunotherapy).
Smoking_Status: Smoking history of the patient (Never, Former, Current).
Response: Patient's response to treatment (Responder/Non-Responder).
We will start by loading the dataset, either from a database system such as MySQL or from a .csv file. A database helps in managing larger datasets and efficiently querying patient records.
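As a rough illustration of the loading step, the sketch below reads the same table once from CSV text and once through a SQL query. It makes several assumptions that are not in the project description: the two sample rows and their Patient_ID values are made up, the table name `patients` and both helper functions are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs without a server.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical two-row sample using the column names listed above.
# The values are invented for illustration and are not real patient data.
SAMPLE_CSV = """Patient_ID,Age,Gender,Cancer_Stage,PD-L1_Level,EGFR_Mutation_Count,Tumor_Size,Treatment_Type,Smoking_Status,Response
P001,64,Male,Stage III,55.0,1,3.2,Immunotherapy,Former,Responder
P002,71,Female,Stage IV,10.0,0,5.1,Chemotherapy,Current,Non-Responder
"""


def load_from_csv(source):
    """Read patient records from a CSV path or file-like object."""
    return pd.read_csv(source)


def load_from_sql(connection, table="patients"):
    """Read the whole patient table through an open database connection."""
    return pd.read_sql(f'SELECT * FROM "{table}"', connection)


# CSV route: io.StringIO stands in for a file on disk.
csv_df = load_from_csv(io.StringIO(SAMPLE_CSV))

# Database route: SQLite in memory stands in for MySQL here.
conn = sqlite3.connect(":memory:")
csv_df.to_sql("patients", conn, index=False)
sql_df = load_from_sql(conn)
conn.close()

print(csv_df.shape)
print(sql_df.dtypes)
```

For a real MySQL instance, the same `pd.read_sql` call would typically receive a SQLAlchemy engine built with `sqlalchemy.create_engine` instead of the SQLite connection; the connection URL depends on the deployment and is not specified here.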
pandas and sqlalchemy for data manipulation and database connection.
Once the data is loaded, we will perform several data cleaning steps:
Smoking_Status, Gender, and Tumor_Size.
Age and Tumor_Size.
Treatment_Type and Smoking_Status.
In this step, we will gain insights into the dataset by visualizing:
We'll create new meaningful features by combining existing columns or performing transformations. Examples include:
Response (0 for Non-Responder, 1 for Responder).
Age and Tumor_Size into categories (e.g., small/medium/large tumor).
We will experiment with several machine learning models to predict treatment response, including:
Each model will be trained on the patient data, and hyperparameters will be tuned to optimize performance. We will use cross-validation techniques to evaluate model accuracy, precision, recall, and F1 score.
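The training and cross-validation step might look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame is random synthetic data rather than the project's dataset, the tumor-size cut points are arbitrary, and logistic regression is used as a placeholder classifier because the text does not fix a single model. The 0/1 encoding of Response and the small/medium/large binning follow the feature ideas described earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the patient table; values are random, not real data.
df = pd.DataFrame({
    "Age": rng.integers(40, 85, size=n),
    "PD-L1_Level": rng.uniform(0, 100, size=n),
    "Tumor_Size": rng.uniform(0.5, 8.0, size=n),
    "Response": rng.choice(["Responder", "Non-Responder"], size=n),
})

# Binary target: 0 for Non-Responder, 1 for Responder.
y = df["Response"].map({"Non-Responder": 0, "Responder": 1})

# Bin tumor size into three categories; the cut points are illustrative only.
df["Tumor_Size_Cat"] = pd.cut(
    df["Tumor_Size"], bins=[0, 3, 5, np.inf], labels=["small", "medium", "large"]
)

# One-hot encode the binned column so every feature is numeric.
X = pd.get_dummies(
    df[["Age", "PD-L1_Level", "Tumor_Size_Cat"]],
    columns=["Tumor_Size_Cat"],
    dtype=float,
)

# Placeholder model: scaling followed by logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the responder ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric_names = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(model, X, y, cv=cv, scoring=metric_names)

for name in metric_names:
    print(name, round(scores[f"test_{name}"].mean(), 3))
```

Because the labels here are random, the scores should hover around chance level; the point is the shape of the workflow, not the numbers. Hyperparameter tuning could wrap the same pipeline in `GridSearchCV` with the same fold object.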
After training and tuning, the model's performance will be evaluated using standard metrics:
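To make the four metrics named earlier (accuracy, precision, recall, F1 score) concrete, here is a small dependency-free sketch that computes them from true and predicted labels. The function name and the example label vectors are made up for illustration; in practice a library routine such as those in scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = Responder, 0 = Non-Responder.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75. In a treatment-selection setting, precision and recall are worth reading separately, since a false positive (recommending Keytruda to a non-responder) and a false negative (withholding it from a responder) carry different costs.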
Finally, we will deploy this predictive model as a web application using Flask, allowing doctors to enter patient details and receive a prediction of whether Keytruda is likely to be an effective treatment.
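A minimal sketch of such a Flask service is shown below. Everything specific in it is an assumption made for illustration: the `/predict` route, the `responder_probability` key, the choice of three input fields, and the `PlaceholderModel` class, which always returns 0.5 and merely marks where the trained model would be loaded.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative subset of the dataset columns expected in the request body.
REQUIRED_FIELDS = ["Age", "PD-L1_Level", "Tumor_Size"]


class PlaceholderModel:
    """Stands in for the trained classifier; always returns 0.5."""

    def predict_proba(self, rows):
        return [[0.5, 0.5] for _ in rows]


# In a real deployment the trained model would be loaded here instead.
model = PlaceholderModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}

    # Reject requests that omit any expected field.
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400

    # Reject values that cannot be read as numbers.
    try:
        row = [float(payload[f]) for f in REQUIRED_FIELDS]
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400

    # Column 1 of predict_proba is the probability of the Responder class.
    probability = model.predict_proba([row])[0][1]
    return jsonify({"responder_probability": probability})
```

Saved as a module, the app can be started with Flask's development server (`flask --app <module> run`) and queried by POSTing a JSON body such as `{"Age": 64, "PD-L1_Level": 55, "Tumor_Size": 3.2}` to `/predict`. Input validation matters here because the form is filled in by hand and a silent type error would otherwise surface as a server fault.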
This project combines the power of data science with the growing field of precision medicine to assist in the treatment of lung cancer. By building a predictive model based on patient data, we aim to enhance clinical decision-making and improve patient care in cancer treatment.