Closed TheooJ closed 1 year ago
Patch and project coverage have no change.
Comparison is base (001fe9b) 83.84% compared to head (62f463c) 83.84%.
Description
When using `HiveData` with the `person_ids` argument, accessing a table (e.g. `data.visit_occurrence`) results in an error: `AttributeError: 'set' object has no attribute 'columns'`.
This is because we're attempting to merge `df` (the table) on `self.person_ids`, which is a set, and not on `self.person_ids_df`, which is the DataFrame of the `person_ids` to keep.
Description
https://github.com/aphp/eds-scikit/blob/001fe9bd139fdee10ffc78129bdafcfd9fcfbad8/eds_scikit/io/hive.py#L228 Replace `self.person_ids` with `self.person_ids_df`.
https://github.com/aphp/eds-scikit/blob/001fe9bd139fdee10ffc78129bdafcfd9fcfbad8/eds_scikit/io/hive.py#L226 Should come after `df = df.join`, otherwise we're joining a Spark DataFrame with a Koalas DataFrame.
How to reproduce the bug before fix
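A minimal, library-free sketch of the failure mode, not the eds-scikit code itself. The `Table` class below is a hypothetical stand-in for a DataFrame-like table whose `merge` inspects the other operand's `columns`; the names `visits`, `person_ids` and `person_ids_df` are illustrative only. It assumes nothing about Spark or Koalas beyond the idea that a join needs a table-like object on both sides.

```python
class Table:
    """Hypothetical stand-in for a DataFrame-like table (not an eds-scikit class)."""

    def __init__(self, rows):
        self.rows = list(rows)
        # Column names are taken from the first row, if there is one.
        self.columns = list(self.rows[0]) if self.rows else []

    def merge(self, other, on):
        # Like a DataFrame join, this reads the other operand's columns,
        # so passing a plain set fails on the attribute access below.
        if on not in other.columns:
            raise KeyError(on)
        keep = {row[on] for row in other.rows}
        return Table(row for row in self.rows if row[on] in keep)


visits = Table(
    [
        {"person_id": 1, "visit_id": 10},
        {"person_id": 2, "visit_id": 11},
        {"person_id": 3, "visit_id": 12},
    ]
)
person_ids = {1, 3}  # the raw set of ids to keep
person_ids_df = Table({"person_id": pid} for pid in sorted(person_ids))

# Before the fix: merging on the set raises AttributeError.
try:
    visits.merge(person_ids, on="person_id")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# After the fix: merging on the table built from the set filters the rows.
filtered = visits.merge(person_ids_df, on="person_id")
kept_visits = [row["visit_id"] for row in filtered.rows]
```

In this sketch the set is only used to build `person_ids_df`; the join itself always receives a table, which mirrors the change proposed above.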
Error log
Checklist