Fuenfgeld / ATeamDatenmanagementUndArchivierung

MIT License
7 stars 0 forks source link

SQL Sum coloums skip NULL #60

Closed fphans closed 3 years ago

fphans commented 3 years ago

`pat_exp = pd.read_sql_query(""" SELECT BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, count_pat_immunizations.IMMUNIZATIONS, (CASE WHEN encounterclass LIKE '%ambulatory%' THEN 0 WHEN encounterclass LIKE '%wellness%' THEN 0 WHEN encounterclass LIKE '%emergency%' THEN 1 WHEN encounterclass LIKE '%inpatient%' THEN 0 WHEN encounterclass LIKE '%outpatient%' THEN 0 WHEN encounterclass LIKE '%urgentcare%' THEN 1
ELSE 0 END) AS ML_encounter_code, SUM ( CASE WHEAN BASE_ENCOUNTER_COST, 0) + COALESCE(BASE_PROCEDURE_COST, 0) + COALESCE(BASE_MEDICATION_COST, 0) ) , 0) AS total_cost, patients.Id FROM patients JOIN (SELECT patient_id, encounterclass, BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, COUNT(DISTINCT id || dataset_origin || immunization_date || immunization_code) AS immunizations FROM encounters GROUP BY patient_id) AS count_pat_immunizations ON patients.id = count_pat_immunizations.patient_id """, conn_new) pat_exp.info()

bin immunization count to groups instead of dividing data into 'vaccinated' and 'not-vaccinated'

from sklearn.preprocessing import KBinsDiscretizer

extract values of 'immunizations' column

X = pat_exp['immunizations'].values

make sure that X is an array of shape [len(X)<, 1]

X = X.reshape(-1, 1)

set the parameters for the algorithm (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer for more info)

imm_binning = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile')

apply the binning algorithm on the data and print the edges

X_binned = imm_binning.fit_transform(X) print("immunization_count bin_edges: ", imm_binning.binedges[0])

replace data in the dataframe (to have this colum ready for Machine Learning) and rename colums to unambiguous name

pat_exp['immunizations'] = X_binned pat_exp = pat_exp.rename(columns={'immunizations': 'imm_group'})`

@lknegendorf , kannst Du mir für diese Summe Hilfestellung geben, dann könnten wir für ypothese 2 zumindest ne correlation matrix bauen und ggf nochmal nach deinen immunization bins aufclustern

Grazie mille und gute nacht

lknegendorf commented 3 years ago

`pat_exp = pd.read_sql_query(""" SELECT BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, count_pat_immunizations.IMMUNIZATIONS, (CASE WHEN encounterclass LIKE '%ambulatory%' THEN 0 WHEN encounterclass LIKE '%wellness%' THEN 0 WHEN encounterclass LIKE '%emergency%' THEN 1 WHEN encounterclass LIKE '%inpatient%' THEN 0 WHEN encounterclass LIKE '%outpatient%' THEN 0 WHEN encounterclass LIKE '%urgentcare%' THEN 1 ELSE 0 END) AS ML_encounter_code, SUM ( CASE WHEAN BASE_ENCOUNTER_COST, 0) + COALESCE(BASE_PROCEDURE_COST, 0) + COALESCE(BASE_MEDICATION_COST, 0) ) , 0) AS total_cost, patients.Id FROM patients JOIN (SELECT patient_id, encounterclass, BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, COUNT(DISTINCT id || dataset_origin || immunization_date || immunization_code) AS immunizations FROM encounters GROUP BY patient_id) AS count_pat_immunizations ON patients.id = count_pat_immunizations.patient_id """, conn_new) pat_exp.info()

bin immunization count to groups instead of dividing data into 'vaccinated' and 'not-vaccinated'

from sklearn.preprocessing import KBinsDiscretizer

extract values of 'immunizations' column

X = pat_exp['immunizations'].values

make sure that X is an array of shape [len(X)<, 1]

X = X.reshape(-1, 1)

set the parameters for the algorithm (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer for more info)

imm_binning = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile')

apply the binning algorithm on the data and print the edges

X_binned = imm_binning.fit_transform(X) print("immunization_count bin_edges: ", imm_binning.binedges[0])

replace data in the dataframe (to have this colum ready for Machine Learning) and rename colums to unambiguous name

pat_exp['immunizations'] = X_binned pat_exp = pat_exp.rename(columns={'immunizations': 'imm_group'})`

@lknegendorf , kannst Du mir für diese Summe Hilfestellung geben, dann könnten wir für ypothese 2 zumindest ne correlation matrix bauen und ggf nochmal nach deinen immunization bins aufclustern

Grazie mille und gute nacht

Moin Felix, ich würde für die Fragestellung folgendes SQL-Statement entwerfen (kannst ja mal schauen ob du sowas meintest oder was noch dazu sollte):

SELECT imm_cost.patient_id, imm_cost.cost_immunization, em_cost.cost_emergency FROM (SELECT patient_id, SUM(base_immunization_cost) AS COST_IMMUNIZATION FROM (SELECT patient_id, id, dataset_origin, immunization_date, immunization_code, Max(base_immunization_cost) AS base_immunization_cost FROM encounters GROUP BY patient_id, id, dataset_origin, immunization_date, immunization_code) GROUP BY patient_id) imm_cost, (SELECT patient_id, Coalesce(SUM(base_encounter_cost), 0)

fphans commented 3 years ago

ich bekomme diesen join nicht zum laufen, könntest duda nochmal drüber gucken?

fphans commented 3 years ago

oder wir nehmen das raus und sagen wie schon im wiki ehrlich bemerkt--> unbearbeitet. der guten form halber mache ich das jetzt zu dann geht die neue woche mit 0 los