mamerisawesome opened this issue 4 years ago
Thanks for this @mamerisawesome! I like the vision :) One perspective I have is that we can organize these around specific applications of the datasets.
E.g. network analysis (which @andrewnyu is working on), simulation modelling, potentially even applying NLP to document data from social media.
For Twitter, I imagine a `get_tweets` function that returns a corpus of tweets with COVID-related hashtags / keywords. Then we can apply more modelling on top of this :)
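A rough sketch of what `get_tweets` could look like, assuming tweets have already been fetched as dicts with a `"text"` key (pulling them from the Twitter API is left out). The signature and the `COVID_KEYWORDS` set are hypothetical, not anything the library defines yet:

```python
import re
from collections.abc import Iterable

# Hypothetical keyword set; the thread only says "covid related hashtags / keywords".
COVID_KEYWORDS: frozenset[str] = frozenset({"covid", "covid19", "coronavirus", "sarscov2"})


def get_tweets(tweets: Iterable[dict], keywords: frozenset[str] = COVID_KEYWORDS) -> list[str]:
    """Return the text of tweets that mention any keyword or #keyword (case-insensitive).

    Assumes each tweet is a dict with a "text" key; tweets without one are skipped.
    """
    corpus = []
    for tweet in tweets:
        text = tweet.get("text", "")
        # Split into word tokens (keeping a leading '#'), then drop the '#'
        # so "#COVID19" and "covid19" both match the same keyword.
        tokens = {tok.lstrip("#").lower() for tok in re.findall(r"#?\w+", text)}
        if tokens & keywords:
            corpus.append(text)
    return corpus
```

For example, `get_tweets([{"text": "Stay safe #COVID19 everyone"}, {"text": "Nice weather today"}])` keeps only the first tweet, and the returned list of strings is what further NLP modelling would consume.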
I'd like to add that if we're going the parallel route, this Python module may be useful for running asynchronous Python code. The link is here; it comes from a PEP, so it's the approach Python itself recommends.
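Assuming the module linked above is `asyncio` (the link itself isn't preserved here), concurrent extraction might look roughly like this. `extract` and `extract_all` are hypothetical placeholders; `asyncio.sleep` stands in for real non-blocking downloads:

```python
import asyncio


# Hypothetical extractor: a real one would download and parse a single dataset.
async def extract(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for non-blocking network I/O
    return f"{name}: done"


async def extract_all(sources: dict[str, float]) -> list[str]:
    # gather() runs all extractions concurrently and returns their results
    # in the same order as the coroutines were passed in.
    return await asyncio.gather(*(extract(name, delay) for name, delay in sources.items()))


if __name__ == "__main__":
    print(asyncio.run(extract_all({"cases": 0.2, "tweets": 0.1})))
```

Note that `asyncio` gives concurrency on a single thread, so it helps with I/O-bound work like downloads but won't speed up CPU-heavy parsing.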
With the intent of making the library a central hub for various datasets, we might need a way to reduce extraction overhead. One idea I have is to run extractions in parallel.
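If the existing extractors are ordinary blocking functions, one option that avoids rewriting them as coroutines is a thread pool from the standard library's `concurrent.futures`. This is a minimal sketch; `run_extractions` and the shape of the `extractors` mapping are assumptions, not part of the current library:

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def run_extractions(
    extractors: dict[str, Callable[[], object]], max_workers: int = 4
) -> dict[str, object]:
    """Run blocking extractor functions in parallel threads and return results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every extractor up front so they all run at the same time.
        futures = {name: pool.submit(fn) for name, fn in extractors.items()}
        # result() waits for each extractor and re-raises its exception, if any.
        return {name: fut.result() for name, fut in futures.items()}
```

Usage: `run_extractions({"cases": load_cases, "tweets": load_tweets})`, where `load_cases` and `load_tweets` are hypothetical zero-argument loaders. Threads suit I/O-bound downloads; for CPU-bound parsing, `ProcessPoolExecutor` has the same interface but needs picklable, module-level functions.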
Issues and / or Suggestions
e.g. When a programmer only needs the following columns:
`["case_no", "lat", "long"]`
the library could hint at which datasets to install / download to get those columns.

Note
These may be issues we don't need to address right now, since current processes do not hinder data analysis.
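The column-hint suggestion could be prototyped with a registry mapping each dataset to the columns it provides, plus a greedy set-cover heuristic that picks datasets until the requested columns are covered. The registry contents and `suggest_datasets` are hypothetical; greedy cover won't always find the smallest set of datasets, but it's simple and usually close:

```python
# Hypothetical registry: dataset name -> columns it provides.
DATASET_COLUMNS: dict[str, set[str]] = {
    "cases": {"case_no", "age", "sex", "region"},
    "case_locations": {"case_no", "lat", "long"},
    "facilities": {"facility_id", "lat", "long", "beds"},
}


def suggest_datasets(needed: list[str], registry: dict[str, set[str]] = DATASET_COLUMNS) -> list[str]:
    """Greedily choose datasets until every requested column is covered.

    Raises ValueError if some column is not provided by any dataset.
    """
    missing = set(needed)
    chosen: list[str] = []
    while missing:
        # Pick the dataset covering the most still-missing columns
        # (ties go to whichever comes first in the registry).
        best = max(registry, key=lambda name: len(registry[name] & missing))
        covered = registry[best] & missing
        if not covered:
            raise ValueError(f"No dataset provides: {sorted(missing)}")
        chosen.append(best)
        missing -= covered
    return chosen
```

With this registry, `suggest_datasets(["case_no", "lat", "long"])` returns `["case_locations"]`, which the library could turn into an install / download hint.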