Obtaining all tables from Athena is not a good idea

WadhwaniAI / covid-modelling

Repo for modelling the spread of novel Covid-19

MIT License

5 stars 0 forks source link

Obtaining all tables from Athena is not a good idea #35

Closed jerome-ai closed 3 years ago

jerome-ai commented 4 years ago

https://github.com/WadhwaniAI/covid-modelling/blob/59b44e87e28808ee9936fab3cb56798e9fa51969/data/dataloader/athena.py#L77

The AWS pipeline produces lots of tables that are superfluous; to the extent that some are just not even consumable. I'm not sure why this is the case, but for now it's the reality.

The more stable way of interacting with the database is thus to maintain a list of valid tables names (perhaps in code, perhaps in a config file) based on the documentation.

sansiddh commented 4 years ago

https://github.com/WadhwaniAI/covid-modelling/blob/master/data/dataloader/athena.py#L78 In the lastest commit the tables are hardcoded as a list. Is there a better way in which we can do this?

sansiddh commented 4 years ago

Can I close this issue?

jerome-ai commented 4 years ago

In the lastest commit the tables are hardcoded as a list. Is there a better way in which we can do this? Perhaps make them class attributes instead of inlined in the method:
class AthenaLoader(BaseLoader):
_tables = (
'covid_case_summary',
'new_covid_case_summary',
'testing_summary',
)
...
Can I close this issue?

That's up to you (and maybe Nayana) as to whether you're "happy" with the fix; where happy at the very least means things are working, and at the other end means the solution is ideal (which is a matter of taste). If things are working, I'd say you can close it.