COVID-19-electronic-health-system / Corona-tracker

An easy-to-use PWA to monitor the user's wellness and learn about COVID-19.
https://coronatracker.me/
MIT License
236 stars, 101 forks

[FEAT] Python scripting: format GSheet csv to json via Google API data pull #665

Open ngiangre opened 4 years ago

ngiangre commented 4 years ago

Summary

This is a modular issue sprouting from #643

All the translations are now in the correct format. Next, we need to pull the sheets (English, Spanish, French, Italian, Dutch (Netherlands), Russian) via the Google API into the repository so that front-end developers can reference strings by their JSON keys.

The Python script src/python/pull_gsheet_data.py contains a link for creating your own Google API key. The script is just a start and needs more development.

Each translation sheet has parentKey, childKey, fieldKey, value, and translatedValue columns, followed by other columns. We need a structure of { 'parentKey' : { 'childKey' : { 'fieldKey' : { 'value' : '', 'translatedValue' : '', ... } } } }. In the case of education, the child key maps to an array of entries instead: { 'parentKey' : { 'childKey' : [ {'value': '', ... }, {'value' : '', ... }, ... ] } }

The Date attributes are not needed - filter those out.
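To make the target shape concrete, here is a minimal sketch of the nesting step. It is not the code in pull_gsheet_data.py. The helper names (`is_date_column`, `nest_rows`), the sample rows, the "Date Added" header, and the assumption that the education rows use the literal parentKey "education" are all hypothetical.

```python
import csv
import io


def is_date_column(name):
    # Assumption: date attributes are the columns whose header contains "date".
    return "date" in name.lower()


def nest_rows(rows, list_parents=("education",)):
    """Nest flat rows as {parentKey: {childKey: {fieldKey: {...}}}}.

    Parents named in list_parents get {parentKey: {childKey: [{...}, ...]}}.
    """
    key_columns = ("parentKey", "childKey", "fieldKey")
    nested = {}
    for row in rows:
        parent, child, field = (row[k] for k in key_columns)
        # Keep value, translatedValue and any other column except keys and dates.
        payload = {
            name: cell
            for name, cell in row.items()
            if name not in key_columns and not is_date_column(name)
        }
        children = nested.setdefault(parent, {})
        if parent in list_parents:
            children.setdefault(child, []).append(payload)
        else:
            children.setdefault(child, {})[field] = payload
    return nested


# Hypothetical sample data standing in for one pulled translation sheet.
sample_csv = """parentKey,childKey,fieldKey,value,translatedValue,Date Added
onboarding,welcome,title,Welcome,Benvenuto,2020-04-20
education,facts,fact1,Wash your hands,Lavati le mani,2020-04-21
education,facts,fact2,Stay home,Resta a casa,2020-04-21
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
nested = nest_rows(rows)
```

With the sample rows, `nested["onboarding"]["welcome"]["title"]` holds the value and translatedValue pair, while `nested["education"]["facts"]` is a list with one entry per row.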

The resulting JSON files should go into public/locales/, though for now I put them in docs/content so as not to mess things up.
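A rough sketch of the write step, in case it helps. The `<out_dir>/<language code>/translation.json` layout and the `write_translation` helper are assumptions for illustration, not something the issue specifies; the example writes into a temporary directory rather than public/locales/ or docs/content.

```python
import json
import tempfile
from pathlib import Path


def write_translation(nested, out_dir, code, filename="translation.json"):
    # Assumed layout: <out_dir>/<language code>/<filename>
    target = Path(out_dir) / code / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-Latin text (e.g. Russian) readable in the file.
        json.dump(nested, handle, ensure_ascii=False, indent=2, sort_keys=True)
    return target


# Hypothetical content for a Russian translation file.
example = {"onboarding": {"welcome": {"title": {"value": "Hello", "translatedValue": "Привет"}}}}

with tempfile.TemporaryDirectory() as tmp:
    path = write_translation(example, tmp, "ru")
    relative_path = path.relative_to(tmp).as_posix()
    raw_text = path.read_text(encoding="utf-8")
    round_trip = json.loads(raw_text)
```

Swapping the temporary directory for the real output directory is the only change needed once the destination is agreed on.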

Here's some extra code I already started using that might be helpful:

```python
# Init sheet names and output dir
data_model_sheet_name = "Data Model"
education_sheet_name = "Education"
health_sheet_name = "Health"
translation_sheets_regex = " - Master Sheet"
translation_sheets_not_regex = "OLD"
languages_to_pull = ['English', 'Dutch (Netherlands)', 'Spanish', 'Italian', 'French', 'Russian']
out_dir = "../../docs/content/"

# Set dictionary to connect languages to two-letter abbreviations
# https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
language_letters_dict = {
    'English': 'en',
    'Dutch': 'nl',
    'Spanish': 'es',
    'Italian': 'it',
    'French': 'fr',
    'Russian': 'ru',
}
```
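One way these settings might be used to pick the sheets to pull is sketched below. It assumes sheet titles look like "<Language> - Master Sheet", which the issue does not confirm, and it strips a trailing parenthetical so that 'Dutch (Netherlands)' can be looked up under the 'Dutch' key. The helper names and the sample titles are hypothetical.

```python
import re

translation_sheets_regex = " - Master Sheet"
translation_sheets_not_regex = "OLD"
languages_to_pull = ['English', 'Dutch (Netherlands)', 'Spanish', 'Italian', 'French', 'Russian']
language_letters_dict = {'English': 'en', 'Dutch': 'nl', 'Spanish': 'es',
                         'Italian': 'it', 'French': 'fr', 'Russian': 'ru'}


def language_code(language):
    # Assumption: drop a trailing region such as " (Netherlands)" before the lookup.
    base = re.sub(r"\s*\(.*\)$", "", language)
    return language_letters_dict[base]


def select_translation_sheets(titles):
    """Map two-letter language codes to the sheet titles that should be pulled."""
    selected = {}
    for title in titles:
        if not re.search(translation_sheets_regex, title):
            continue  # not a translation master sheet
        if re.search(translation_sheets_not_regex, title):
            continue  # superseded sheet
        language = title.replace(translation_sheets_regex, "").strip()
        if language in languages_to_pull:
            selected[language_code(language)] = title
    return selected


# Hypothetical sheet titles as they might come back from the spreadsheet.
titles = [
    "Data Model",
    "English - Master Sheet",
    "Dutch (Netherlands) - Master Sheet",
    "OLD Spanish - Master Sheet",
    "German - Master Sheet",
]
selected = select_translation_sheets(titles)
```

Here only the English and Dutch sheets are selected: "Data Model" is not a master sheet, the Spanish one is marked OLD, and German is not in `languages_to_pull`.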

Motivation

We need to have translations.json files in locales to support other languages

Possible Alternatives

Support only English and hard-code the strings.

Additional Context

Please comment here for more detail or to work through fixing the issue. You can ask @ngiangre for assistance with Python scripting.

pavel-ilin commented 4 years ago

I'm happy to remind myself how to work with python!

ngiangre commented 4 years ago

go ahead @pavel-ilin!

ngiangre commented 4 years ago

This still needs work!!

SomeMoosery commented 4 years ago

My bad - didn't think about how the "fixes" keyword would close this!

ngiangre commented 4 years ago

no worries haha I’m barely keeping up. ISSUE IS BACK OPEN! We need to create nested JSON from the CSV files pulled via the Google API!

ngiangre commented 4 years ago

Some progress on this. Here's a preview

[Screenshot: Screen Shot 2020-05-01 at 12 19 43 AM]

and an example where an array of values would be favorable:

[Screenshot: Screen Shot 2020-05-01 at 12 30 34 AM]

I posted a translation.json in the #engineering channel on Discord if y'all want to see the full JSON.

Let me know if this looks good and would be workable! @AdhamAH @pavel-ilin @SomeMoosery

ngiangre commented 4 years ago

There has been tremendous work done on this by @KristianR on Discord - thank you!!

We have one more step: converting key strings with >20 characters into shorter strings using common NLP filters (stemming, removing stop words, etc.).

The goal is to make a short, representative key string for the education facts and quizzes. This would be a medium-priority issue and should be easy to integrate into the current algorithm that @NickG and @KristianR have on Discord.
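As a starting point, the key shortening could look something like the sketch below. It is a hand-rolled stand-in, not the algorithm shared on Discord: the stop-word list is a small illustrative set, `crude_stem` is naive suffix stripping rather than a real stemmer (a library such as NLTK would be the more robust choice), and joining words with underscores is an assumption.

```python
import re

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "or", "in",
              "on", "for", "your", "you", "with", "can", "it", "be"}
MAX_KEY_LENGTH = 20


def crude_stem(word):
    # Naive suffix stripping, standing in for a proper stemming algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def shorten_key(text, max_length=MAX_KEY_LENGTH):
    """Return text unchanged if short enough, else a compact key built from it."""
    if len(text) <= max_length:
        return text
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [crude_stem(word) for word in words if word not in STOP_WORDS]
    key = ""
    for word in kept:
        candidate = word if not key else key + "_" + word
        if len(candidate) > max_length:
            break  # adding this word would exceed the limit
        key = candidate
    if not key:
        # Even the first word is too long (or nothing survived): truncate.
        key = (kept[0] if kept else text.lower())[:max_length]
    return key


long_key = shorten_key("Wash your hands often with soap and water")
short_key = shorten_key("Stay home")
```

Two different sentences can collapse to the same short key, so whatever integrates this would still need a uniqueness check (for example, appending a counter on collision).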

Can someone work on this?