alphacastio / connectors-main

Guidance and Project management for Open-Source Alphacast Connectors
MIT License
8 stars 5 forks

[DATA-RQ] IPCVA #20

Open miliricchini opened 2 years ago

miliricchini commented 2 years ago

What is the estimated difficulty of this data request? Intermediate

What is the Source / Url of this Data http://www.ipcva.com.ar/vertext.php?id=964

image

The datasets can be downloaded by going to each link, selecting all available data, and clicking on

image

then scrolling all the way down and clicking on

image

What is the format of the data source Excel

What is the name of the new dataset or datasets? 8 datasets should be created from this URL:

image

  1. Activity - Argentina - IPCVA - Producción - Producción en Tn. Res c/hueso
  2. Activity - Argentina - IPCVA - Producción - Serie de Indicadores
  3. Activity - Argentina - IPCVA - Producción - Faena Clasificada
  4. Activity - Argentina - IPCVA - Producción - Pesos Promedio
  5. Activity - Argentina - IPCVA - Mercado Doméstico - Consumo Promedio
  6. Activity - Argentina - IPCVA - Mercado Doméstico - Precios al Consumidor
  7. Activity - Argentina - IPCVA - Precios Internacionales - Precios en Pie
  8. Activity - Argentina - IPCVA - Precios Internacionales - Precios al Gancho

In which repository will it be? Argentina Agroindustry

Which are the Entities of the new dataset If the total number of variables is < 20 then entities are: [Date, Country]

If the total number of variables is >= 20 then the entities are: [Date, Country, variableName]
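The entity rule above can be sketched as a small helper. This is only an illustration: the function name `choose_entities` and the placeholder variable names are assumptions, not part of any Alphacast API.

```python
def choose_entities(variable_names):
    """Return the entity columns for a dataset, given its variable names."""
    base = ["Date", "Country"]
    # The threshold of 20 comes from the request; everything else is illustrative.
    if len(variable_names) >= 20:
        return base + ["variableName"]
    return base


few = choose_entities([f"var{i}" for i in range(19)])
many = choose_entities([f"var{i}" for i in range(20)])
```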

What data should be extracted? All the available data from each of the links should be downloaded.

The date format MUST be YYYY-MM-DD. If the data has monthly periodicity, the day should be the first of the month; if the data has yearly periodicity, the date should be the first day of the year.
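A minimal sketch of the date rule, assuming the year and (optional) month have already been parsed into integers. The helper name `normalize_date` is hypothetical.

```python
from datetime import date


def normalize_date(year, month=None):
    """Monthly data -> first day of the month; yearly data (month=None) -> January 1st."""
    return date(year, month or 1, 1).isoformat()  # isoformat() yields YYYY-MM-DD


monthly = normalize_date(2021, 7)
yearly = normalize_date(2021)
```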

WARNING: Sometimes the downloaded Excel file needs to have its dates rearranged in chronological order.
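One possible way to restore chronological order once dates are in YYYY-MM-DD form: ISO dates sort correctly as plain strings. The rows and values below are made up for illustration.

```python
# Illustrative rows only; the real values come from the downloaded Excel file.
rows = [
    {"Date": "2021-03-01", "value": 3.0},
    {"Date": "2020-12-01", "value": 1.0},
    {"Date": "2021-01-01", "value": 2.0},
]

# YYYY-MM-DD strings sort lexicographically in chronological order.
ordered = sorted(rows, key=lambda r: r["Date"])
```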

The variable names correspond to the ones in the second row of the Excel file. The first row must be ignored. Variation and participation columns must be ignored too.

Example:

image

image
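The header rule could be sketched as follows, assuming the sheet has been read into a list of rows. The markers used to detect variation and participation columns are a guess at how those headers are labeled in the source files and would need checking against the real sheets. The sample rows are invented.

```python
def clean_sheet(rows, ignored_markers=("variaci", "participaci", "variation", "participation")):
    """Skip the first row, take names from the second, drop ignored columns."""
    header = rows[1]  # the second row holds the variable names
    keep = [
        i for i, name in enumerate(header)
        # ASSUMPTION: ignored columns can be recognised by these substrings.
        if not any(marker in str(name).lower() for marker in ignored_markers)
    ]
    cleaned_header = [header[i] for i in keep]
    body = [[row[i] for i in keep] for row in rows[2:]]
    return cleaned_header, body


# Invented sample data, shaped like a two-row-header sheet.
sample = [
    ["Título de la hoja", None, None, None],
    ["Fecha", "Total", "Variación %", "Participación %"],
    ["2021-01-01", 100.0, 1.5, 20.0],
]
header, body = clean_sheet(sample)
```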

If the variable has different levels (other than the country, which has to be in a separate column and NOT in a row), the variable name should be nested using " - ".

Example: image

Variable names should be: "Novillos - Expor +480", "Novillos - Consumo 420-440", and so on.
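A sketch of the nesting rule, assuming the two header levels arrive as parallel lists and that a merged top-level cell shows up as an empty value on every column after the first one in its group. The helper name `nested_names` is hypothetical.

```python
def nested_names(top, sub, sep=" - "):
    """Join a (possibly sparse) top header level with the sub-level names."""
    names, current = [], None
    for t, s in zip(top, sub):
        if t:  # a new group starts here; otherwise reuse the last group seen
            current = t
        names.append(f"{current}{sep}{s}" if current else s)
    return names


names = nested_names(["Novillos", None], ["Expor +480", "Consumo 420-440"])
```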

When the number of variables is 20 or more, the variable name is a separate entity, but it must be built in the same way as indicated above: nested using " - ".
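For the case where the variable name becomes its own entity, the wide table can be reshaped to one row per (Date, Country, variableName). This is a rough sketch: the `value` column name and the sample figures are assumptions made for the example.

```python
def to_long(header, rows):
    """header is ["Date", "Country", var1, var2, ...]; rows follow the same layout."""
    long_rows = []
    for row in rows:
        row_date, country = row[0], row[1]
        for name, value in zip(header[2:], row[2:]):
            long_rows.append(
                {"Date": row_date, "Country": country, "variableName": name, "value": value}
            )
    return long_rows


# Invented sample values.
wide_header = ["Date", "Country", "Novillos - Expor +480", "Novillos - Consumo 420-440"]
wide_rows = [["2021-01-01", "Argentina", 10.0, 12.0]]
long_rows = to_long(wide_header, wide_rows)
```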

ATTENTION: In dataset no. 5 (Activity - Argentina - IPCVA - Mercado Doméstico - Consumo Promedio) years are in rows and months in columns.

The table should be pivoted (reshaped from wide to long) so that each year and month combination becomes a row, with a single date column in YYYY-MM-DD format and the variable name ("Consumo de Carne Vacuna - Kilogramos/Habitante") and its values in the next column. Also, a Country column must be added to the left indicating the country (Argentina in this case).

WARNING: The "Promedio" column must be ignored.
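The reshaping described for dataset no. 5 might look roughly like this. The sketch assumes the first column holds the year, the month columns appear in January to December order, and the average column is labeled exactly "Promedio". The month labels and values in the sample are invented.

```python
from datetime import date

VARIABLE = "Consumo de Carne Vacuna - Kilogramos/Habitante"


def unpivot_consumo(header, rows, country="Argentina"):
    """Turn a year-by-month table into one row per month."""
    # ASSUMPTION: column 0 is the year; remaining columns are months in calendar order.
    month_columns = [i for i, name in enumerate(header) if i > 0 and name != "Promedio"]
    out = []
    for row in rows:
        year = int(row[0])
        for month, col in enumerate(month_columns, start=1):
            out.append(
                {"Date": date(year, month, 1).isoformat(), "Country": country, VARIABLE: row[col]}
            )
    out.sort(key=lambda r: r["Date"])  # keep the result in chronological order
    return out


# Invented sample: one year with values 1.0 .. 12.0 and an average column.
months = ["Ene", "Feb", "Mar", "Abr", "May", "Jun", "Jul", "Ago", "Sep", "Oct", "Nov", "Dic"]
sample_header = ["Año"] + months + ["Promedio"]
sample_rows = [[2021] + [float(m) for m in range(1, 13)] + [6.5]]
records = unpivot_consumo(sample_header, sample_rows)
```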

eze2286 commented 2 years ago

Good morning, I wanted to see if this ISSUE could be assigned to me. One question: how should S/D values be treated? Should they be replaced with 0, or left as they are? If they are left as they are, they would be uploaded to the platform as strings, and then they can be converted to numbers once the Excel file is downloaded. I show an example in the picture:

cells with S-D

I look forward to your reply. Thank you.