backtrackbaba / covid-api

API's to interact with COVID19 dataset by John Hopkins University
https://covidapi.info/
MIT License
77 stars 19 forks source link

Current database design could be improved (i.e for adding things like pagination) #8

Open alessandrojcm opened 4 years ago

alessandrojcm commented 4 years ago

Hi there!

Good job on this project, a data source conforming to ISO 3166 standard is just what I needed for an app a friend and I are making.

The only thing missing was a /countries endpoint, which would list all the countries with their respecting the latest data. I went to add that myself, but I found that the current data model design makes this difficult.

So I propose changing it for something like this (the relationship is one-to-many): Untitled

That way it could be trivial to add endpoints like the one I'm trying to add (i.e we would just need to join and fetch the latest record for each). I also think that the current endpoints would be simplified (for example, the global count would be a simple sum on the database side).

I don't think it would be too difficult to implement this; I can do it myself but I wanted to consult first since the current logic will be affected. What do you think?

backtrackbaba commented 4 years ago

Hi @alessandrojcm , thank you for the interest in the project!

The only thing missing was a /countries endpoint, which would list all the countries with their respecting the latest data. I went to add that myself, but I found that the current data model design makes this difficult.

This endpoint was implemented just yesterday and can be found at /api/v1/global/latest. This has not yet been documented. I'll add it to the Postman collection right away.

There were a few issues with how Johns Hopkins maintains the records that's why I had kept all the records in a single table and worked with it. Give me some time, I'll have the docs updated with the issues and how the flow works and the reasons behind them.

I'm currently in the process of establishing the base for v2 of the API's which can be found in the v2 branch.

Let me quickly get the design calls that I had taken and then we'll take this conversation further.

Also, let me know if anything needs to be tweaked in this endpoint - /api/v1/global/latest

alessandrojcm commented 4 years ago

This endpoint was implemented just yesterday and can be found at /api/v1/global/latest. This has not yet been documented. I'll add it to the Postman collection right away. Also, let me know if anything needs to be tweaked in this endpoint - /api/v1/global/latest

Silly me, I was reading through the codebase and I missed that; I'll check it out.

Let me quickly get the design calls that I had taken and then we'll take this conversation further.

Cool! I'm looking forward to it.

alessandrojcm commented 4 years ago

Ah sorry @backtrackbaba, taking a look at the endpoint I realized I misspelled my original post. I meant a list of all the countries with their respecting global count. On top of that, it also could have pagination, filters, ordering and such. Also, I realized the India ISO code is hardcoded?

backtrackbaba commented 4 years ago

Hey, sorry @alessandrojcm I couldn't get back to you earlier with the design calls that I had taken.

As for the ISO code, I was having issues in getting the latest date available in the record, hardcoding with one on the country's ISO code was a workaround to get that date by getting all the records of that country and ordering by date and getting the first record.

A very bad hack, but it did the trick for me at that time.

For the list of countries, I have an endpoint locally /api/v1/country/list which gives me only the list of all the country ISO code and the names.

Yup, sorting, pagination and such would be helpful. However, I can't pagination with the current caching mechanism that we have. Currently, the request comes to the flask route and since we have heavily cached all the routes, it is directly served from the Redis cache by generating a key from the parameters that we pass on to the route. That won't be possible with the case of any query params so that is something that would need to be taken care of.

I meant a list of all the countries with their respecting global count

I didn't get this point in the previous comment. Isn't it the same as what we have at /api/v1/global/count

backtrackbaba commented 4 years ago

And the DB was kept very simple in a single table was due to the dynamic nature of the data sent over by JHU. Their naming conventions aren't standard that's why instead of using any library like pycountry, I had to go with a JSON file with the mapping of JHU's country names and their respective ISO codes.

To "update" the DB, I have an endpoint at /protected/update-db which essentially does things in this sequence:

Download the latest dataset -> Load the country name to ISO mapping JSON -> Drop the DB -> Loop through the dataset and enter all the records -> then clear Redis

This helps me in having the system running with no downtime as while the DB is being "updated", the application simply serves data from Redis.

This was another call that I had taken. Let me know if you have a better solution to this, would be glad to discuss it with you!

alessandrojcm commented 4 years ago

However, I can't pagination with the current caching mechanism that we have.

Yes, that happened to me locally. There must be a way around it t, maybe the queries can be cached? Like when you memoize a function.

I didn't get this point in the previous comment. Isn't it the same as what we have at /api/v1/global/count

No no, that's the global count. I mean an endpoint like that but per country. Something like this: https://corona.lmao.ninja/countries

This was another call that I had taken. Let me know if you have a better solution to this, would be glad to discuss it with you!

Actually I think I do, that's why I think would be helpful to split the countries of the records. The list of countries will not change, but the data will. So, every time you fetch the data you can:

1) Parse the data. 2) Make the mapping of the country name -> ISO code 3) Cut the data: fetch every country by ISO code from the countries table (since they are unique), fetch the latest date for everyone and then dispose of all the data older than that (since those records are already in the database). 4) Update the records table: i.e insert the new records by ISO code.

What do you think?

backtrackbaba commented 4 years ago

I'm sorry I meant this parameter for the latest county of all the individual countries /api/v1/global/latest

Maybe you could try out the approach locally. From what I've seen is the countries changes anytime,

Example: The West Bank and Gaza is a country as well as West Bank and Gaza Kosovo which doesn't have an ISO code of its own, was indicated as a different country for a few days, then one fine day they added up the tally of Kosovo with Serbia.

This was one major trend that I saw with their data as even the past data changes in almost all updates as I can see the number of countries increasing and decreasing randomly with updates. Hence, I stopped directly updating on prod and I replicate the same locally before updating it in prod and also drop the historic data and add it again.

Check out locally with say a week of data which you can find within Pomber's data folder if your plan works.

I'll add an example .env file which would help you in setting up the code locally.

alessandrojcm commented 4 years ago

Still, you'll only have to run the mappings with the JSON and then check if that country already exists; if so you update it; if not you add it. I'll test it out locally as you said, I already got it working with a docker-compose file :wink:

backtrackbaba commented 4 years ago

Awesome, I've just updated the README with the updated local setup steps. Let me know if you have any queries.

I might not be around for long as it's night here. Please feel free to drop your message here, I'll get back to it ASAP.

alessandrojcm commented 4 years ago

Cool, thanks!