aphp / eds-scikit

eds-scikit is a Python library providing tools to process and analyse OMOP data
https://aphp.github.io/eds-scikit
BSD 3-Clause "New" or "Revised" License
35 stars 5 forks

fix: remove OMOP `<>_date` columns #59

Closed. Thomzoy closed this pull request 3 months ago

Thomzoy commented 5 months ago

Description

Checklist

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.25%. Comparing base (b7c19e6) to head (03a9b6b). Report is 5 commits behind head on main.

❗ Current head 03a9b6b differs from pull request most recent head 6a09ff4

Please upload reports for the commit 6a09ff4 to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
+ Coverage   84.23%   84.25%   +0.01%
==========================================
  Files          86       86
  Lines        2550     2553       +3
==========================================
+ Hits         2148     2151       +3
  Misses        402      402
```
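The percentages in the coverage diff follow directly from the hits and line counts. A minimal sketch of the arithmetic is shown below; the helper name `coverage_pct` is made up for illustration, and truncating to two decimals (rather than rounding) is only an assumption that happens to reproduce the figures in this report, not a statement about how Codecov computes them.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Covered lines as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


# Figures taken from the coverage diff in the report.
base = coverage_pct(2148, 2550)  # main
head = coverage_pct(2151, 2553)  # this pull request

# Change in coverage, truncated the same way (assumption, see lead-in).
delta = math.floor((2151 / 2553 - 2148 / 2550) * 10_000) / 100

# The 3 added lines are all hit, so the number of misses stays the same.
base_misses = 2550 - 2148
head_misses = 2553 - 2151

print(f"base={base}% head={head}% delta=+{delta}% misses={base_misses}->{head_misses}")
```

With plain rounding the base value would come out as 84.24% instead of 84.23%, which is why the sketch truncates.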


svittoz commented 5 months ago

Nice fix! However, it wouldn't resolve the issue for tables without datetime columns, such as the CONCEPT table.

Is there a way to log a message indicating that the date column can be dropped to solve the issue?
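A minimal sketch of what such a message could look like, not the change made in this pull request. It assumes the tables are plain pandas DataFrames, that a `<x>_date` column is redundant whenever a `<x>_datetime` column exists, and it uses the standard `logging` module; the function name `drop_date_columns` and the example column values are hypothetical.

```python
import logging
from typing import List, Tuple

import pandas as pd

logger = logging.getLogger(__name__)


def drop_date_columns(df: pd.DataFrame, table_name: str) -> Tuple[pd.DataFrame, List[str]]:
    """Drop `<x>_date` columns that have a `<x>_datetime` twin and warn about the rest.

    Hypothetical helper: returns the cleaned table and the list of
    `<x>_date` columns that were kept because no datetime twin exists.
    """
    date_cols = [c for c in df.columns if c.endswith("_date")]
    # "visit_start_date" + "time" -> "visit_start_datetime"
    redundant = [c for c in date_cols if f"{c}time" in df.columns]
    kept = [c for c in date_cols if c not in redundant]
    for col in kept:
        logger.warning(
            "Table %s: column %s has no matching datetime column and was kept; "
            "it can be dropped manually if it causes problems.",
            table_name,
            col,
        )
    return df.drop(columns=redundant), kept


# A table with both columns: the date column is dropped silently.
visit = pd.DataFrame(
    {
        "visit_start_date": ["2020-01-01"],
        "visit_start_datetime": pd.to_datetime(["2020-01-01 08:00"]),
    }
)
visit_clean, visit_kept = drop_date_columns(visit, "visit_occurrence")

# A CONCEPT-like table with only a date column: it is kept and a warning is logged.
concept = pd.DataFrame({"concept_id": [1], "valid_start_date": ["1970-01-01"]})
concept_clean, concept_kept = drop_date_columns(concept, "concept")
```

Returning the list of kept columns alongside the table lets the caller decide whether to drop them, instead of doing so implicitly.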

github-actions[bot] commented 3 months ago

Coverage Report

Name | Stmts | Miss | ∆ Miss | Cover
TOTAL | 2281 | 153 | 0 | 93%
Files without new missing coverage
Name | Stmts | Miss | ∆ Miss | Cover
eds_scikit/utils/test_utils.py

Was already missing at line 50

 def date(s):
-     return dt.strptime(s, "%Y-%m-%d")
Was already missing at lines 88-90
         args = tuple(args)
-     elif type(index_or_key) == str:
-         kwargs[index_or_key] = inputs
Was already missing at lines 114-116
     else:
-         normalized_sum_sq_diff = sum_sq_diff / np.sqrt(sum_sq_diff)
-         assert normalized_sum_sq_diff < 0.001

Stmts 54 | Miss 5 | ∆ Miss 0 | Cover 91%
eds_scikit/utils/flowchart/flowchart.py

Was already missing at line 152

     def __str__(self) -> str:
-         return self.__repr__()

Stmts 131 | Miss 1 | ∆ Miss 0 | Cover 99%
eds_scikit/utils/custom_implem/custom_implem.py

Was already missing at line 54

         """
-         return cut(
             x,

Stmts 22 | Miss 1 | ∆ Miss 0 | Cover 95%
eds_scikit/utils/checks.py

Was already missing at line 127

         if return_index_or_key:
-             return kwargs[argname], argname
         return kwargs[argname]
Was already missing at line 149
         else:
-             to_display_per_concept = [f"- {concept}" for concept in required_concepts]
         str_to_display = "\n".join(to_display_per_concept)
Was already missing at lines 172-189

-         if all(isinstance(table, tuple) for table in required_tables):
  ...
-         super().__init__(message)

Stmts 71 | Miss 10 | ∆ Miss 0 | Cover 86%
eds_scikit/utils/bunch.py

Was already missing at line 32

     def __setattr__(self, key, value):
-         self[key] = value
Was already missing at line 35
     def __dir__(self):
-         return self.keys()
Was already missing at lines 38-41
     def __getattr__(self, key):
-         try:
-             return self[key]
-         except KeyError:
             raise AttributeError(key)

Stmts 11 | Miss 5 | ∆ Miss 0 | Cover 55%
eds_scikit/resources/utils.py

Was already missing at line 19

     if len(splited) == 1:
-         return None
     return splited[-1]

Stmts 6 | Miss 1 | ∆ Miss 0 | Cover 83%
eds_scikit/resources/reg.py

Was already missing at lines 50-78

             # Looking for a match excluding version string
-             candidates = [
  ...
-             func = r.get(candidates[0])
         return func

Stmts 16 | Miss 4 | ∆ Miss 0 | Cover 75%
eds_scikit/period/tagging_functions.py

Was already missing at lines 60-63

         # TODO: is this necessary ?
-         logger.warning("No matching were found between the 2 DataFrames")
- 
-         return framework.DataFrame(
             columns=["person_id", "t_start", "t_end", "concept", "value"]
Was already missing at lines 119-123
         return (B_start >= A_start) & (B_end <= A_end)
-     elif algo == interval_algos.from_before_to:
-         return B_end <= A_start
-     elif algo == interval_algos.to_before_from:
-         return A_end <= B_start
     else:

Stmts 36 | Miss 6 | ∆ Miss 0 | Cover 83%
eds_scikit/period/stays.py

Was already missing at line 409

         if open_stay_end_datetime is None:
-             open_stay_end_datetime = datetime.now()
         vo["visit_end_datetime_calc"] = open_stay_end_datetime

Stmts 86 | Miss 1 | ∆ Miss 0 | Cover 99%
eds_scikit/io/i2b2_mapping.py

Was already missing at lines 38-211


-     i2b2_table_name = i2b2_tables[db_source][table]
  ...
-     return df
Was already missing at lines 230-234

-     def f(x):
-         return mapping.get(x, default)
- 
-     return F.udf(f)

Stmts 79 | Miss 69 | ∆ Miss 0 | Cover 13%
eds_scikit/io/base.py

Was already missing at line 13

     def __str__(self):
-         return self.__repr__()

Stmts 9 | Miss 1 | ∆ Miss 0 | Cover 89%
eds_scikit/event/from_code.py

Was already missing at lines 108-111

     else:
-         event.loc[:, "t_start"] = event.loc[:, columns["code_start_datetime"]]
  ...
-         event = event.drop(
             columns=[columns["code_start_datetime"], columns["code_end_datetime"]]

Stmts 42 | Miss 3 | ∆ Miss 0 | Cover 93%
eds_scikit/event/diabetes.py

Was already missing at lines 88-102

     """
-     diabetes = conditions_from_icd10(
  ...
- 
-     return diabetes

Stmts 10 | Miss 4 | ∆ Miss 0 | Cover 60%
eds_scikit/event/consultations.py

Was already missing at line 68

     if type(algo) == str:
-         algo = [algo]

Stmts 61 | Miss 1 | ∆ Miss 0 | Cover 98%
eds_scikit/emergency/emergency_care_site.py

Was already missing at line 54

     if algo == "from_regex_on_parent_UF":
-         return from_regex_on_parent_UF(care_site)
     elif algo == "from_regex_on_care_site_description":
Was already missing at line 166
     """
-     return attributes.get_parent_attributes(
         care_site,

Stmts 31 | Miss 2 | ∆ Miss 0 | Cover 94%
eds_scikit/datasets/synthetic/biology.py

Was already missing at lines 37-44

     def reset_to_pandas(self):
-         if self.module == "koalas":
  ...
-             self.module = "pandas"

Stmts 132 | Miss 7 | ∆ Miss 0 | Cover 95%
eds_scikit/datasets/__init__.py

Was already missing at line 38

 def __dir__():
-     return known_datasets + [func.__name__ for func in __all__]
Was already missing at lines 52-56
 def add_dataset(table: pd.DataFrame, name: str):
-     dataset_path = os.path.abspath(
-         os.path.join(os.path.dirname(__file__), name + ".csv")
-     )
-     table.to_csv(dataset_path, index=False)
Was already missing at line 67
     """
-     return [func.__name__ for func in __all__]

Stmts 26 | Miss 4 | ∆ Miss 0 | Cover 85%
eds_scikit/biology/viz/plot.py

Was already missing at line 72

     else:
-         logger.error(
             "The folder {} has not been found",
Was already missing at lines 718-720
     else:
-         terminologies_hist = alt.Chart().mark_text()
-         terminologies_time_series = (
             alt.Chart(measurement)

Stmts 130 | Miss 3 | ∆ Miss 0 | Cover 98%
eds_scikit/biology/viz/aggregate.py

Was already missing at line 83

     if stats_only:
-         return {"measurement_stats": measurement_stats}
Was already missing at line 208
     if overall_only:
-         return measurement_stats_overall

Stmts 97 | Miss 2 | ∆ Miss 0 | Cover 98%
eds_scikit/biology/utils/config.py

Was already missing at lines 30-66

     """
-     my_custom_config = pd.DataFrame()
  ...
-     register_configs()
Was already missing at lines 73-75
     for config in glob.glob(os.path.join(CONFIGS_PATH, "*.csv")):
-         config_name = Path(config).stem
-         registry.data.register(
             f"get_biology_config.{config_name}",
Was already missing at lines 89-94
     """
-     registered = list(registry.data.get_all().keys())
-     configs = [
-         r.split(".")[-1] for r in registered if r.startswith("get_biology_config")
-     ]
-     return configs

Stmts 35 | Miss 22 | ∆ Miss 0 | Cover 37%
eds_scikit/biology/cleaning/cohort.py

Was already missing at line 28

     if isinstance(studied_pop, DataFrame.__args__):
-         filtered_measures = measurement.merge(
             studied_pop,

Stmts 9 | Miss 1 | ∆ Miss 0 | Cover 89%

59 files skipped due to complete coverage.

Coverage success: total of 93% is above 93% 🎉