fphans commented 3 years ago

`pat_exp = pd.read_sql_query(""" SELECT BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, count_pat_immunizations.IMMUNIZATIONS, (CASE WHEN encounterclass LIKE '%ambulatory%' THEN 0 WHEN encounterclass LIKE '%wellness%' THEN 0 WHEN encounterclass LIKE '%emergency%' THEN 1 WHEN encounterclass LIKE '%inpatient%' THEN 0 WHEN encounterclass LIKE '%outpatient%' THEN 0 WHEN encounterclass LIKE '%urgentcare%' THEN 1
ELSE 0 END) AS ML_encounter_code, SUM ( CASE WHEAN BASE_ENCOUNTER_COST, 0) + COALESCE(BASE_PROCEDURE_COST, 0) + COALESCE(BASE_MEDICATION_COST, 0) ) , 0) AS total_cost, patients.Id FROM patients JOIN (SELECT patient_id, encounterclass, BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, COUNT(DISTINCT id || dataset_origin || immunization_date || immunization_code) AS immunizations FROM encounters GROUP BY patient_id) AS count_pat_immunizations ON patients.id = count_pat_immunizations.patient_id """, conn_new) pat_exp.info()

bin immunization count to groups instead of dividing data into 'vaccinated' and 'not-vaccinated'

from sklearn.preprocessing import KBinsDiscretizer

extract values of 'immunizations' column

X = pat_exp['immunizations'].values

make sure that X is an array of shape [len(X)<, 1]

X = X.reshape(-1, 1)

set the parameters for the algorithm (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer for more info)

imm_binning = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile')

apply the binning algorithm on the data and print the edges

X_binned = imm_binning.fit_transform(X) print("immunization_count bin_edges: ", imm_binning.binedges[0])

replace data in the dataframe (to have this colum ready for Machine Learning) and rename colums to unambiguous name

pat_exp['immunizations'] = X_binned pat_exp = pat_exp.rename(columns={'immunizations': 'imm_group'})`

@lknegendorf , kannst Du mir für diese Summe Hilfestellung geben, dann könnten wir für ypothese 2 zumindest ne correlation matrix bauen und ggf nochmal nach deinen immunization bins aufclustern

Grazie mille und gute nacht

lknegendorf commented 3 years ago

`pat_exp = pd.read_sql_query(""" SELECT BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, count_pat_immunizations.IMMUNIZATIONS, (CASE WHEN encounterclass LIKE '%ambulatory%' THEN 0 WHEN encounterclass LIKE '%wellness%' THEN 0 WHEN encounterclass LIKE '%emergency%' THEN 1 WHEN encounterclass LIKE '%inpatient%' THEN 0 WHEN encounterclass LIKE '%outpatient%' THEN 0 WHEN encounterclass LIKE '%urgentcare%' THEN 1 ELSE 0 END) AS ML_encounter_code, SUM ( CASE WHEAN BASE_ENCOUNTER_COST, 0) + COALESCE(BASE_PROCEDURE_COST, 0) + COALESCE(BASE_MEDICATION_COST, 0) ) , 0) AS total_cost, patients.Id FROM patients JOIN (SELECT patient_id, encounterclass, BASE_ENCOUNTER_COST, BASE_IMMUNIZATION_COST, BASE_PROCEDURE_COST, BASE_MEDICATION_COST, COUNT(DISTINCT id || dataset_origin || immunization_date || immunization_code) AS immunizations FROM encounters GROUP BY patient_id) AS count_pat_immunizations ON patients.id = count_pat_immunizations.patient_id """, conn_new) pat_exp.info()

bin immunization count to groups instead of dividing data into 'vaccinated' and 'not-vaccinated'

from sklearn.preprocessing import KBinsDiscretizer

extract values of 'immunizations' column

X = pat_exp['immunizations'].values

make sure that X is an array of shape [len(X)<, 1]

X = X.reshape(-1, 1)

set the parameters for the algorithm (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer for more info)

imm_binning = KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='quantile')

apply the binning algorithm on the data and print the edges

X_binned = imm_binning.fit_transform(X) print("immunization_count bin_edges: ", imm_binning.binedges[0])

replace data in the dataframe (to have this colum ready for Machine Learning) and rename colums to unambiguous name

pat_exp['immunizations'] = X_binned pat_exp = pat_exp.rename(columns={'immunizations': 'imm_group'})`

@lknegendorf , kannst Du mir für diese Summe Hilfestellung geben, dann könnten wir für ypothese 2 zumindest ne correlation matrix bauen und ggf nochmal nach deinen immunization bins aufclustern

Grazie mille und gute nacht

Moin Felix, ich würde für die Fragestellung folgendes SQL-Statement entwerfen (kannst ja mal schauen ob du sowas meintest oder was noch dazu sollte):

SELECT imm_cost.patient_id, imm_cost.cost_immunization, em_cost.cost_emergency FROM (SELECT patient_id, SUM(base_immunization_cost) AS COST_IMMUNIZATION FROM (SELECT patient_id, id, dataset_origin, immunization_date, immunization_code, Max(base_immunization_cost) AS base_immunization_cost FROM encounters GROUP BY patient_id, id, dataset_origin, immunization_date, immunization_code) GROUP BY patient_id) imm_cost, (SELECT patient_id, Coalesce(SUM(base_encounter_cost), 0)

Coalesce(SUM(base_procedure_cost), 0)
Coalesce(SUM(medication_totalcost), 0) AS COST_EMERGENCY FROM (SELECT patient_id, id, dataset_origin, procedure_date, procedure_code, medication_start, medication_stop, medication_code, Max(base_encounter_cost) AS base_encounter_cost, Max(base_procedure_cost) AS base_procedure_cost, Max(medication_totalcost) AS medication_totalcost FROM encounters WHERE encounterclass IN ( 'emergency', 'urgentcare' ) GROUP BY patient_id, id, dataset_origin, procedure_date, procedure_code, medication_start, medication_stop, medication_code) GROUP BY patient_id) em_cost WHERE imm_cost.patient_id = em_cost.patient_id

fphans commented 3 years ago

ich bekomme diesen join nicht zum laufen, könntest duda nochmal drüber gucken?

fphans commented 3 years ago

oder wir nehmen das raus und sagen wie schon im wiki ehrlich bemerkt--> unbearbeitet. der guten form halber mache ich das jetzt zu dann geht die neue woche mit 0 los

Fuenfgeld / ATeamDatenmanagementUndArchivierung

SQL Sum coloums skip NULL #60

bin immunization count to groups instead of dividing data into 'vaccinated' and 'not-vaccinated'

extract values of 'immunizations' column

make sure that X is an array of shape [len(X)<, 1]

set the parameters for the algorithm (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer for more info)

apply the binning algorithm on the data and print the edges

replace data in the dataframe (to have this colum ready for Machine Learning) and rename colums to unambiguous name

bin immunization count to groups instead of dividing data into 'vaccinated' and 'not-vaccinated'

extract values of 'immunizations' column

make sure that X is an array of shape [len(X)<, 1]

set the parameters for the algorithm (see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer for more info)

apply the binning algorithm on the data and print the edges

replace data in the dataframe (to have this colum ready for Machine Learning) and rename colums to unambiguous name