Open eduardocorrearaujo opened 1 year ago
Also, I have some concerns about how this data should be sent, since JSON is a really flexible format.
So far, what I did was send the prediction element below:
In the case above I had a dataset with the predictions for all the capitals of the northeastern states and sent it in a single JSON (using the function in the docs). Should I have sent each city in a separate request?
If we should send a separate request for every region that we predict (adm 0, adm 1 or adm 2), maybe it would be interesting to add, **when posting the prediction in the database**, the adm value associated with the prediction. For example, if the user's model forecasts for all of BR (aggregated), they should fill: adm_0 = "BR", adm_1 = NaN, adm_2 = NaN. If the prediction is for a specific state in Brazil (Paraná, for example): adm_0 = "BR", adm_1 = "PR", adm_2 = NaN. If the prediction is for a specific city in Brazil (Fortaleza, for example): adm_0 = "BR", adm_1 = "CE", adm_2 = 2304400. In this case we could remove it from the JSON of the predictions.
One advantage of this is that, when viewing all the predictions as below, we could have a filter to see all the forecasts related to a specific city or state. I think it would make the comparison of models easier.
* The parameters associated with sending the predictions are shown here: https://api.mosqlimate.org/docs/registry/GET/predictions/#parameters_table
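The filter suggested above could be sketched roughly as follows. This is only an illustration: the record layout (`adm_0`, `adm_1`, `adm_2` keys, with `None` for an unfilled level) and the helper name `filter_predictions` are assumptions, not part of the current API.

```python
# Hypothetical prediction metadata using the adm_0 / adm_1 / adm_2 fields
# proposed above (None marks a level that is not filled).
predictions = [
    {"id": 1, "adm_0": "BR", "adm_1": None, "adm_2": None},
    {"id": 2, "adm_0": "BR", "adm_1": "PR", "adm_2": None},
    {"id": 3, "adm_0": "BR", "adm_1": "CE", "adm_2": 2304400},
    {"id": 4, "adm_0": "BR", "adm_1": "CE", "adm_2": None},
]


def filter_predictions(records, adm_1=None, adm_2=None):
    """Return the records that match every adm value that was given."""
    selected = []
    for rec in records:
        if adm_1 is not None and rec["adm_1"] != adm_1:
            continue
        if adm_2 is not None and rec["adm_2"] != adm_2:
            continue
        selected.append(rec)
    return selected


# All forecasts located in Ceará (state-level and city-level alike).
ceara = filter_predictions(predictions, adm_1="CE")
# Only the forecasts for Fortaleza.
fortaleza = filter_predictions(predictions, adm_2=2304400)
```

Note that filtering by state in this sketch also returns the city-level forecasts inside that state; whether that is desirable would be a design decision for the platform.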
I agree with this idea, @eduardocorrearaujo, but I am not sure we can currently search by ADM level. Can we, @luabida?
Such a template needs to be clearly described in the documentation together with code snippets for Python and R so that people will follow the recommendations.
Later we can create Python and R client libraries that can analyze the JSON structure of the predictions and remind the user to adhere to the recommended template.
This option to search by adm is not implemented. It would be necessary to add the parameter adm to the prediction object. In this case, I can see two possibilities:
1. We include just one adm parameter that can be filled with: country (as BR), state (as PR), or IBGE code (as 4108304). In this case, we recommend that the user indicate in this parameter only the lowest geographical unit of the prediction.
2. We include one parameter for each adm level, and the user only fills the one for the geographical unit related to the prediction. For example, if they generate a forecast for only Foz do Iguaçu, they just fill adm_2 = 4108304 and keep the other adm values NaN.
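To make the relation between the two possibilities concrete, here is a small sketch that collapses the three-field form into the single "lowest geographical unit" value. The function names `is_missing` and `lowest_unit` are hypothetical, and treating both `None` and NaN as "not filled" is an assumption.

```python
import math


def is_missing(value):
    """True for None or NaN, the two ways an unfilled adm value may appear."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def lowest_unit(adm_0=None, adm_1=None, adm_2=None):
    """Collapse the one-parameter-per-level form into a single adm value.

    Returns (level, value) for the lowest geographical unit that is filled.
    """
    for level, value in ((2, adm_2), (1, adm_1), (0, adm_0)):
        if not is_missing(value):
            return level, value
    raise ValueError("at least one adm value must be filled")


foz = lowest_unit(adm_2=4108304)                      # city forecast
parana = lowest_unit("BR", "PR", float("nan"))        # state forecast
brazil = lowest_unit("BR")                            # country forecast
```

One thing the sketch makes visible: with a single adm parameter, the level has to be stored or inferred separately, since a country code and a state code are both two-letter strings.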
I think we should just add a field in the prediction table called ADM_level to indicate which geographical division the prediction refers to.
As to how the polygons should be identified within the JSON, we can address this in the documentation with examples that work with our visualization library.
@luabida can you move forward with this solution?
After issue #116, I propose that every prediction of a forecast model must have at least the following columns:
* `date`: the date of the forecasted values;
* `preds`: the predictions;
* `lower`: the lower bound of the CI (NaN if the model didn't provide a CI);
* `upper`: the upper bound of the CI (NaN if the model didn't provide a CI);
* `geocode`: if ADM_LEVEL (a field filled in the model registry) is equal to 2, this column contains the 7-digit IBGE code of the forecasted city; if ADM_LEVEL is equal to 1, it contains the two-letter UF code or the two-digit code of the forecasted state; if ADM_LEVEL is equal to 0, it contains the ISO code of the forecasted country (BR for Brazil).
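A client-side check of this template might look like the sketch below. The function name `check_prediction_frame`, the regular expressions, and the sample values are all assumptions made for illustration; nothing here is an existing platform API.

```python
import re

import pandas as pd

REQUIRED_COLUMNS = ["date", "preds", "lower", "upper", "geocode"]

# Expected geocode shape per ADM_LEVEL, following the template above.
GEOCODE_PATTERNS = {
    0: re.compile(r"[A-Z]{2}"),        # ISO country code, e.g. BR
    1: re.compile(r"[A-Z]{2}|\d{2}"),  # two-letter UF or two-digit state code
    2: re.compile(r"\d{7}"),           # 7-digit IBGE municipality code
}


def check_prediction_frame(df, adm_level):
    """Return a list of problems; an empty list means the frame fits the template."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    pattern = GEOCODE_PATTERNS[adm_level]
    bad = [g for g in df["geocode"].astype(str).unique() if not pattern.fullmatch(g)]
    if bad:
        return [f"geocode values not valid for ADM_LEVEL {adm_level}: {bad}"]
    return []


# Made-up forecast rows for Fortaleza (illustrative values only).
example = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-07", "2024-01-14"]),
    "preds": [120.0, 135.0],
    "lower": [100.0, float("nan")],
    "upper": [150.0, float("nan")],
    "geocode": [2304400, 2304400],
})
problems = check_prediction_frame(example, adm_level=2)
```

Returning a list of messages rather than raising lets a client library report every problem at once before anything is posted.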
Also, if `df` is the dataframe with the columns above, it can be transformed into the JSON format using the code below:

    df_in_json_format = df.to_json(orient="records", date_format="iso")

Furthermore, this JSON can be transformed back into a dataframe using the snippet below:

    import json

    import pandas as pd

    # Parse the JSON string into a list of records, then flatten it.
    json_struct = json.loads(df_in_json_format)
    # pd.json_normalize replaces the deprecated pd.io.json.json_normalize.
    df_flat = pd.json_normalize(json_struct)
    df_flat["date"] = pd.to_datetime(df_flat["date"])
    df_flat.head()
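As a quick check of the two snippets above, a self-contained round trip could look like this. The rows are made-up values, and the example uses `pd.json_normalize`, the top-level name of the same function in recent pandas versions.

```python
import json

import pandas as pd

# Made-up forecast rows in the proposed template (illustrative values only).
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-07", "2024-01-14"]),
    "preds": [120.0, 135.0],
    "lower": [100.0, 110.0],
    "upper": [150.0, 160.0],
    "geocode": [2304400, 2304400],
})

# DataFrame -> JSON string (one object per row, ISO 8601 dates).
payload = df.to_json(orient="records", date_format="iso")

# JSON string -> DataFrame, restoring the date column to datetimes.
restored = pd.json_normalize(json.loads(payload))
restored["date"] = pd.to_datetime(restored["date"])
```

With `orient="records"` each row becomes one JSON object, which keeps the payload easy to inspect and to validate row by row.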
With these changes, we can close this issue. What do you think, @fccoelho?
I think that is a good template. I would only add a requirement for `geocode` to always be numeric, except for ADM_0. In GADM they have an `ISO_1` variable that, for Brazil, looks like this: `BR-AC` for Acre, `BR-AM` for Amazonas, etc. For municipalities, GADM has the 7-digit geocode in a variable called `CC_2`.
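One possible way to meet the numeric requirement for states would be the two-digit IBGE state codes, which are also the first two digits of the 7-digit municipality code. The sketch below assumes that choice; the lookup table is a deliberately partial subset, and the function names are hypothetical.

```python
# IBGE two-digit state codes for a few UFs (partial table, for illustration).
UF_TO_IBGE = {"AC": 12, "AM": 13, "CE": 23, "PR": 41}


def iso1_to_numeric(iso_1):
    """Convert a GADM-style ISO_1 value such as 'BR-AC' to the IBGE state code."""
    country, _, uf = iso_1.partition("-")
    if country != "BR" or uf not in UF_TO_IBGE:
        raise ValueError(f"unsupported ISO_1 value: {iso_1!r}")
    return UF_TO_IBGE[uf]


def state_of_municipality(geocode):
    """Return the state code given by the first two digits of a 7-digit IBGE code."""
    return int(str(geocode)[:2])


acre = iso1_to_numeric("BR-AC")
fortaleza_state = state_of_municipality(2304400)
```

Using the same numbering for states and municipalities would let a state-level filter match city-level forecasts by prefix.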
Is there a numeric equivalent to `BR-AC` in GADM?
There is a field called `CC_1`, but it is filled with NA.
@fccoelho I talked with Leo, and he said that his model generates predictions for macroregions. Should we add this option to the adm level as a new option?
Also, Leo said his model can generate predictions by macroregion, UF, and BR and by week or year. In this case should we move adm_level and periodicity to the prediction registry instead of the model registry, or is it a specific case?
No, in this case, it is best that the Author registers separate instances of the "same" model for each target configuration.
Great! And about the macro-region option in the ADM_level?
There is no equivalent to macro-region in GADM.org, so we need to think a little more about how to support it. Maybe if the author wants to support geographical scales other than ADM 0, 1, 2 and 3, they should keep it outside of the platform.
No, GADM has no numeric equivalent to `BR-AC`.
Although we know that we should deal with different classes of models, since our first product is the comparison of forecast models, I think that we should create a template for the forecasts sent. My suggestion is that every dataset should have at least the columns:
* `dates`: the date of the forecasted values;
* `preds`: the predictions;
* `lower`: the lower bound of the CI (NaN if the model didn't provide a CI);
* `upper`: the upper bound of the CI (NaN if the model didn't provide a CI);
* `adm_2`: in the Brazil case, the 7-digit IBGE code of the forecasted city;
* `adm_1`: in the Brazil case, the two-letter UF code of the forecasted state;
* `adm_0`: the ISO code of the forecasted country (BR for Brazil).

About the adm columns, we would consider that the prediction refers to the highest-numbered adm column (between 0 and 2) that is filled. For example, if the user's model forecasts for all of BR (aggregated), they should fill: adm_0 = "BR", adm_1 = NaN, adm_2 = NaN. If the prediction is for a specific state in Brazil (Paraná, for example): adm_0 = "BR", adm_1 = "PR", adm_2 = NaN. If the prediction is for a specific city in Brazil (Fortaleza, for example): adm_0 = "BR", adm_1 = "CE", adm_2 = 2304400.
What do you think @fccoelho?
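The rule for the adm columns described above can be sketched with pandas as follows. The helper name `infer_adm_level` and the use of -1 for rows with no adm value filled are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# The three example cases from the text: all of BR, Paraná, and Fortaleza.
df = pd.DataFrame({
    "adm_0": ["BR", "BR", "BR"],
    "adm_1": [np.nan, "PR", "CE"],
    "adm_2": [np.nan, np.nan, 2304400],
})


def infer_adm_level(frame):
    """Return, per row, the highest-numbered adm column (0 to 2) that is filled.

    Rows with no adm value filled get -1.
    """
    level = pd.Series(-1, index=frame.index)
    for i, col in enumerate(["adm_0", "adm_1", "adm_2"]):
        # Overwrite the level wherever this column has a value.
        level = level.mask(frame[col].notna(), i)
    return level


levels = infer_adm_level(df)
```

Because the level is derived from the data, the three columns and a separate level field cannot disagree; the cost is that every row carries two mostly empty columns.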