ryukinix closed 4 months ago
This image compares 56 cancer datasets, available at https://lce.biohpc.swmed.edu/lungcancer/dataset.php. The largest dataset has 576 entries. Regarding integration, besides sex and age, only the smoking-habit question recurs across them. I imagine that is to map the type of tissue damage
What an interesting image, @helen0l!
Interesting, but I found the features in these datasets unusual. I'm not sure they will serve the client, who is a general practitioner. For example: EGFR Status, KRAS Status, R Stage, T Stage, etc. I don't think those last features will be of much use.
Exactly. My conclusion is that the datasets we are already using are the ones that best fit the proposed problem in terms of features. I searched quite a bit too. Moving to more specific datasets, we hit the scenario above: datasets with few entries and with specific, non-integrable parameters. I thought of a strategy for data integration.
My proposal is to integrate with COVID data. There are databases with similar questions. We make the assumption that COVID and cancer are mutually exclusive. Then we can oversample the healthy people.
https://www.kaggle.com/datasets/iamhungundji/covid19-symptoms-checker
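A minimal sketch of the proposal above, under stated assumptions: the column names (`fatigue`, `cough`, `status`), the three-way `status` label, and the toy rows are all hypothetical, and "healthy" is read here as "respondents with neither COVID nor cancer". It tags every COVID row as non-cancer (the mutual-exclusivity assumption) and then randomly duplicates healthy rows until that class matches the largest one.

```python
import random
from collections import Counter


def label_covid_rows(covid_rows):
    """Tag every COVID respondent as non-cancer (the thread's assumption)."""
    return [{**row, "status": "covid"} for row in covid_rows]


def oversample(rows, key, value, target_count, seed=0):
    """Randomly duplicate rows where row[key] == value up to target_count."""
    rng = random.Random(seed)
    pool = [row for row in rows if row[key] == value]
    if not pool:
        raise ValueError(f"no rows with {key}={value!r} to sample from")
    missing = max(0, target_count - len(pool))
    return rows + [dict(rng.choice(pool)) for _ in range(missing)]


# Toy rows with hypothetical columns; real rows would come from the datasets.
cancer_rows = [
    {"fatigue": 1, "cough": 1, "status": "cancer"},
    {"fatigue": 1, "cough": 0, "status": "cancer"},
    {"fatigue": 0, "cough": 1, "status": "cancer"},
    {"fatigue": 0, "cough": 0, "status": "healthy"},
]
covid_rows = [{"fatigue": 1, "cough": 1}, {"fatigue": 0, "cough": 1}]

merged = cancer_rows + label_covid_rows(covid_rows)
counts = Counter(row["status"] for row in merged)

# Bring the healthy class up to the size of the largest class.
balanced = oversample(merged, "status", "healthy", max(counts.values()))
balanced_counts = Counter(row["status"] for row in balanced)
print(dict(balanced_counts))
```

Random duplication is only the simplest form of oversampling; it adds no new information, so any evaluation split should be made before the duplication step.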
https://datascience.stackexchange.com/questions/52627/why-class-weight-is-outperforming-oversampling
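The linked question is about class weighting as an alternative to oversampling. As a rough illustration only, the sketch below computes per-class weights with the "balanced" heuristic `n_samples / (n_classes * class_count)`, the same formula scikit-learn documents for `class_weight="balanced"`. The label values and class sizes are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical class sizes, chosen only to show the effect of imbalance.
labels = ["cancer"] * 3 + ["covid"] * 6 + ["healthy"] * 1
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

With weights like these passed to a model that supports them, rare classes count more in the loss without any rows being duplicated.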
With the inclusion of the 3rd dataset, the questions are as follows:

Dataset 1:
1. What is your sex?
2. What is your age?
3. Do you smoke? (Options: Yes or No)
4. Do you have yellow fingers? (Options: Yes or No)
5. Do you experience anxiety? (Options: Yes or No)
6. Are you influenced by peer pressure to smoke? (Options: Yes or No)
7. Do you have any chronic diseases? (Options: Yes or No)
8. Do you experience fatigue? (Options: Yes or No)
9. Do you have any allergies? (Options: Yes or No)
10. Do you experience wheezing? (Options: Yes or No)
11. Do you consume alcohol? (Options: Yes or No)
12. Do you experience coughing? (Options: Yes or No)
13. Do you experience shortness of breath? (Options: Yes or No)
14. Do you have difficulty swallowing? (Options: Yes or No)
15. Do you experience chest pain? (Options: Yes or No)
16. Have you been diagnosed with lung cancer? (Options: Yes or No)

Dataset 2:
1. What is your age?
2. What is your gender? (Options: Male, Female)
3. What is the level of air pollution exposure you experience? (Options: 1 (Low) - 8 (High))
4. What is your level of alcohol use? (Options: 1 (None) - 8 (High))
5. What is the level of dust allergy you have? (Options: 1 (None) - 8 (High))
6. What is the level of occupational hazards you are exposed to? (Options: 1 (None) - 8 (High))
7. What is your level of genetic risk? (Options: 1 (None) - 8 (High))
8. What is your level of chronic lung disease? (Options: 1 (None) - 8 (High))
9. What is your level of balanced diet? (Options: 1 (None) - 8 (High))
10. What is your level of obesity? (Options: 1 (None) - 8 (High))
11. What is your level of smoking? (Options: 1 (None) - 8 (High))
12. What is your level of exposure to passive smoking? (Options: 1 (None) - 8 (High))
13. What is the level of chest pain you experience? (Options: 1 (None) - 8 (High))
14. What is the level of coughing of blood you experience? (Options: 1 (None) - 8 (High))
15. What is the level of fatigue you experience? (Options: 1 (None) - 8 (High))
16. What is the level of weight loss you experience? (Options: 1 (None) - 8 (High))
17. What is the level of shortness of breath you experience? (Options: 1 (None) - 8 (High))
18. What is the level of wheezing you experience? (Options: 1 (None) - 8 (High))
19. What is the level of swallowing difficulty you experience? (Options: 1 (None) - 8 (High))
20. What is the level of clubbing of finger nails you experience? (Options: 1 (None) - 8 (High))

Dataset 3:
1. Do you currently have a fever?
2. Are you experiencing any unusual tiredness or fatigue?
3. Have you been coughing lately? If so, is it dry or productive?
4. Are you having difficulty breathing?
5. Do you have a sore throat?
6. Have you experienced any pains or aches recently?
7. Are you experiencing nasal congestion?
8. Do you have a runny nose?
9. Have you experienced diarrhea recently?
10. Are you currently experiencing none of the symptoms mentioned?
11. How old are you? (Select one: 0-9 years old, 10-19 years old, 20-24 years old, 25-59 years old, 60+ years old)
12. What is your gender? (Select one: Female, Male, Transgender)
13. How would you describe the severity of your symptoms? (Select one: Mild, Moderate, Severe, None)
14. Have you had any contact with individuals who have symptoms similar to yours? (Select one: Yes, No, Don't know)
15. In which country are you currently located?
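Age, gender, fatigue, coughing and breathing difficulty appear in all three questionnaires, but with different encodings (Yes/No, a 1 to 8 level, age brackets). The sketch below shows one possible way to map them onto a shared schema. Every column name and raw value format here is an assumption made for illustration, not the actual layout of the files, and treating any level above 1 as "present" is an arbitrary threshold.

```python
# Age brackets as offered by the third questionnaire.
AGE_BRACKETS = [(0, 9, "0-9"), (10, 19, "10-19"), (20, 24, "20-24"), (25, 59, "25-59")]


def age_to_bracket(age):
    """Map a numeric age onto the third dataset's age brackets."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, name in AGE_BRACKETS:
        if low <= age <= high:
            return name
    return "60+"


def yes_no(value):
    """Encode a Yes/No answer as 1/0."""
    return 1 if str(value).strip().lower() == "yes" else 0


def level_to_flag(level):
    """Encode a 1 (None) to 8 (High) level as 1/0; threshold is an assumption."""
    return 1 if int(level) > 1 else 0


def normalize_gender(value):
    """Map M/Male and F/Female to one spelling; keep other answers as given."""
    text = str(value).strip().lower()
    return {"m": "male", "f": "female"}.get(text, text)


def from_dataset_1(row):
    return {
        "gender": normalize_gender(row["sex"]),
        "age_bracket": age_to_bracket(row["age"]),
        "fatigue": yes_no(row["fatigue"]),
        "cough": yes_no(row["coughing"]),
        "breathing_difficulty": yes_no(row["shortness_of_breath"]),
    }


def from_dataset_2(row):
    return {
        "gender": normalize_gender(row["gender"]),
        "age_bracket": age_to_bracket(row["age"]),
        "fatigue": level_to_flag(row["fatigue"]),
        "cough": level_to_flag(row["coughing_of_blood"]),
        "breathing_difficulty": level_to_flag(row["shortness_of_breath"]),
    }


def from_dataset_3(row):
    return {
        "gender": normalize_gender(row["gender"]),
        "age_bracket": row["age"].replace(" years old", ""),
        "fatigue": yes_no(row["tiredness"]),
        "cough": yes_no(row["cough"]),
        "breathing_difficulty": yes_no(row["difficulty_breathing"]),
    }


# Hypothetical sample rows, one per questionnaire.
harmonized = [
    from_dataset_1({"sex": "M", "age": 62, "fatigue": "Yes",
                    "coughing": "No", "shortness_of_breath": "Yes"}),
    from_dataset_2({"gender": "Female", "age": 22, "fatigue": 4,
                    "coughing_of_blood": 1, "shortness_of_breath": 2}),
    from_dataset_3({"gender": "Male", "age": "25-59 years old", "tiredness": "No",
                    "cough": "Yes", "difficulty_breathing": "No"}),
]
```

Note that dataset 2 only asks about coughing of blood, so mapping it to the generic `cough` column is a lossy choice that would need to be agreed on before integrating.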
Grouping 1
Grouping 2
Grouping 3
The third dataset has 316,800 entries.
Over 300 thousand? My goodness!
https://drive.google.com/file/d/1e7k2sP4SdFKb-dV1GFPBRCCmp3aH222S/view?usp=drivesdk
The integrated dataset, for easy access.
Definition of done:
analysis, like the others (aim to normalize access to datasets with dvc using the lib cancer_estimator_model.datasets)