stijnvanhoey closed this issue 4 years ago
Age pyramid of Belgium.csv is hardcoded in the file model.py (FullPop, StudentPop, ElderPop, with 20 and 70 years as limits). @twallema might still need these data for the age-layered model?
Sectoral_data.xlsx we received via mail from Gert: he "downloaded sectoral data for i) value added and ii) employment that could be used for your calibrations. The sources are the "national accounts" and "employment" statistics of the National Bank of Belgium (NBB). These are annual values for 2018 (2019 is not yet available). In the spreadsheet, I have also included the value added for 2015 (in case you want to compare with input-output below)".
input-output data idem, but this can also be downloaded directly from the source: https://www.plan.be/databases/io2015/vr64_en_20181217.xlsx (only constructed every 5 years, so these are for 2015).
staff distribution is data collected by the GEES (also received via mail) and represents the situation mid-April. The data show, for each sector, the staff distribution: working from home (telework), at the workplace, unemployed, etc. The sectors cover about 70% of total private employment.
These data descriptions should probably come in a readme with a section on data description, I suppose?
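Since the input-output table has a direct download link, fetching it could be scripted instead of committing the file by hand. A minimal sketch using only the standard library; the helper names and the `data/raw` destination are assumptions for illustration, not existing code in the repository:

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL of the 2015 input-output table, as given in this thread
IO_URL = "https://www.plan.be/databases/io2015/vr64_en_20181217.xlsx"


def target_path(url, dest_dir):
    """Derive the local file path from the last component of the URL."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download_raw(url, dest_dir="data/raw"):
    """Download `url` into `dest_dir` (assumed layout) unless it already exists."""
    dest = target_path(url, dest_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # Only hit the network when the file is not yet present locally
        urlretrieve(url, dest)
    return dest
```

Calling `download_raw(IO_URL)` would then place the spreadsheet next to the other raw files, and the README only needs to document the source URL.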
Or a dedicated README in the /data directory
As @jorisvandenbossche mentions, https://github.com/stijnvanhoey/COVID19-Model/tree/cookiecutter/data is currently prepared in the PR. I should have mentioned that.
Putting these descriptions inside the general readme can quickly overload it with info, so rather put them close to the data.
@stijnvanhoey
1) The DataExtraction notebook is not used for analysis; it is a demo of Sciensano data extraction, so it's a form of 'documentation' (?).
2) Grouping of economic files is correct.
3) Contact data originates from the file 'contacts.Rdata', which was made public in the following publication: https://www.thelancet.com/journals/lanpub/article/PIIS2468-2667(20)30073-6/fulltext . Our age-layered deterministic model is a replication of that Lancet paper.
4) I think the Erlang parameters were not given in Li et al., so I decided to convert the figure of the distribution into a CSV using an online tool. Next, the following code is used to sample from the distribution.
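    import numpy
    import pandas as pd
    from random import choices

    def sampleFromDistribution(self, filename, k):
        # Column 0 holds the values, column 1 the digitized weights
        df = pd.read_csv(filename)
        x = df.iloc[:, 0]
        y = df.iloc[:, 1]
        return numpy.asarray(choices(x, y, k=k))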
First, this is not very elegant. Second, the use of this distribution will most likely be changed or omitted in future work. Ideally, there would be a non-hardcoded option to sample selected parameters from a gamma/erlang distribution.
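A non-hardcoded option could look roughly like the sketch below, which samples directly from an Erlang distribution (a gamma distribution with an integer shape) using NumPy. The function name is hypothetical, and the shape and scale values are placeholders for illustration only; they are not taken from Li et al. and would have to be fitted or looked up.

```python
import numpy as np


def sample_erlang(shape, scale, k, seed=None):
    """Draw k samples from an Erlang distribution (gamma with integer shape)."""
    if int(shape) != shape or shape < 1:
        raise ValueError("Erlang shape must be a positive integer")
    rng = np.random.default_rng(seed)
    return rng.gamma(shape=shape, scale=scale, size=k)


# Placeholder parameters (assumption, NOT the values from Li et al.)
samples = sample_erlang(shape=2, scale=2.6, k=1000, seed=0)
```

This would remove the need for the digitized CSV, at the cost of having to choose the two parameters explicitly.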
5) The age pyramid of Belgium is used by the economic model by Cyril but may be omitted. The Imperial College age-distributed parameters are not yet used but should be retained.
@twallema I think most of this issue has been solved? Or any elements that still need some work? If not, can be closed
@JennaVergeynst and @twallema
While preparing the documentation on the new layout of the repository, I'm trying to make sure the data folder gets more structured, see https://github.com/stijnvanhoey/COVID19-Model/tree/cookiecutter#using-data
For the moment I just moved the data into the raw folder, except for incubation.csv. However, I'm not sure if this is correct and I have some additional questions:
Is the economical directory a good division?
Interaction_matrices data is coming from https://lwillem.shinyapps.io/socrates_rshiny/ according to the notebook. Have they been downloaded manually? Are these the raw formats or has any transformation been done already? Could we maybe download them using code and write a small snippet for it?
incubation.csv data set apparently comes from "incubation period is assumed to be Erlang distributed as reported by Li et al. (2020a)". Is there a small code snippet of the creation somewhere? Is it actually required to have it as a data file, or can we also do the extraction by creating a function that samples the distribution, e.g. using https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.erlang.html?
Are Age pyramid of Belgium.csv, contacts.Rdata and imperialCollegeAgeDist.csv actually used, or can these be removed?