EwoutH / India-drought-ABM

GNU General Public License v3.0

Data initialization and validation tracking issue #2

Open EwoutH opened 1 year ago

EwoutH commented 1 year ago

This is a tracking issue for all the data we might need, for either initialization or validation.

@nit1995 Thanks for taking this on! Can you take the lead on the Drought and Irrigation parts? Please discuss these things also directly with Kaveri (since she is the subject matter expert).

See also the pseudocode: https://docs.google.com/document/d/1Fay3uEJRzJnAenQnd0Pa9e5AOzRPa5O7vOzR0pKnI5M/edit

Farming

Financial

Network

conceptual-mindmap-India-drought-2023-05-22.zip

Kaveri3012 commented 1 year ago

Thanks a lot, Ewout. Working with Niteesh on this.

nit1995 commented 1 year ago

Can multiple types of irrigation be present on the same farm land? If so, which can and which can not?

Yes. Different crops have different irrigation needs, and if the farmer is diversifying his crop, it is possible to have multiple types of irrigation on the same farm land.

Does having irrigation take any space that can't be used to produce crops?

Unless the farmer builds farm ponds, the space taken by irrigation is negligible and can be ignored.

Does irrigation have a maximum capacity, or can it handle any drought?

Irrigation systems depend on a reliable water source, which can include rivers, reservoirs, wells, or aquifers. However, during an intense drought, these sources might not yield sufficient water to satisfy the requirements of irrigation. While borewells generally continue to provide water during drought conditions, their effectiveness can be significantly reduced during extended periods of drought.

https://www.deccanherald.com/state/cdurga-villages-answer-drought-713763.html
https://www.livemint.com/Politics/CF05j4ycqjUmMI4qDQ8IzI/Running-out-of-water-droughthit-Karnataka-to-rent-private.html

EwoutH commented 1 year ago

@nit1995 Thanks a lot for doing the research on this!

Yes. Different crops have different irrigation needs, and if the farmer is diversifying his crop, it is possible to have multiple types of irrigation on the same farm land.

Can you make a yes/no table showing which of these crops can use which of these irrigation types? ['Maize', 'Pigeonpea', 'Sorghum', 'Chickpea', 'Groundnut', 'Finger millet']

Unless the farmer builds farm ponds, the space taken by irrigation is negligible and can be ignored.

Perfect, let’s assume irrigation doesn’t take significant space.

Irrigation systems depend on a reliable water source, which can include rivers, reservoirs, wells, or aquifers. However, during an intense drought, these sources might not yield sufficient water to satisfy the requirements of irrigation. While borewells generally continue to provide water during drought conditions, their effectiveness can be significantly reduced during extended periods of drought.

@Kaveri3012 and @nit1995 can you discuss how this translates to yield? I think we could quantify it as a precipitation deficit, of which each irrigation system can handle some amount. Then we need functions:

nit1995 commented 1 year ago

Can you make a yes/no table showing which of these crops can use which of these irrigation types? ['Maize', 'Pigeonpea', 'Sorghum', 'Chickpea', 'Groundnut', 'Finger millet']

I've added a lookup table for these crops. However, @Kaveri3012 and I are looking at better defining the geographical scope based on the sample size of the respondents belonging to these districts in the CMIE data. Based on the districts chosen, the major crops might change. I will update it soon.

can you discuss how this translates to yield?

Using ICRISAT data to run a regression is tricky because it doesn't control for the type of irrigation or any other factors that might influence yield.

I have been looking at the literature to see if I can find some other estimate. This paper shows a strong correlation between rainfall from June to September and yield for different crops. Another paper, however, shows that exposure to extreme temperatures impacts yield more than rainfall does, and that the impact of rainfall is more significant in rainfed than in irrigated conditions (for rice). If I don't find any secondary literature on these crops, I might have to use the ICRISAT data itself.

Which irrigation type can handle what amount

This is tough to quantify. Borewells generally provide respite during droughts, but how long they run depends on groundwater levels and also what type of irrigation the farmers use. Irrigation types using water from canals again depend on if the rivers are flowing and enough water has been released into the canal.
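
Even without calibrated numbers, the precipitation-deficit idea from earlier in the thread can be sketched with placeholder values. Everything below is an assumption for illustration: the capacities in `IRRIGATION_CAPACITY_MM`, the function name `residual_deficit`, and the choice to simply add up the capacities of multiple irrigation types on one farm. It also treats capacity as constant, which ignores the point above that borewells and canals deliver less during extended droughts.

```python
# Placeholder capacities: mm of seasonal precipitation deficit that each
# irrigation type can offset. These numbers are NOT from data; they are
# assumptions that would need to be replaced by calibrated values.
IRRIGATION_CAPACITY_MM = {
    "canal": 100.0,
    "borewell": 200.0,
    "farm pond": 50.0,
}


def residual_deficit(deficit_mm, irrigation_types):
    """Return the deficit (mm) left after irrigation, never below zero."""
    # Assumption: capacities of different irrigation types are additive.
    offset = sum(IRRIGATION_CAPACITY_MM[t] for t in irrigation_types)
    return max(0.0, deficit_mm - offset)


print(residual_deficit(250.0, []))                     # rainfed farm
print(residual_deficit(250.0, ["canal"]))              # one irrigation type
print(residual_deficit(250.0, ["canal", "borewell"]))  # two types combined
```

A yield function could then take the residual deficit as its input instead of the raw deficit, so that irrigated and rainfed farms share the same yield response.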

Kaveri3012 commented 1 year ago

@nit1995 Could you upload the data for the lookup tables (and the data on farm land etc.) by the end of today? We already discussed this two days ago. The sooner we have more data, the better, so @EwoutH can build a better model to start with.

@EwoutH the link between rainfall deficit and drought seems more complicated to quantify. But we'll try and have something by the end of this week.

Kaveri3012 commented 1 year ago

@nit1995 please also list out which of the five districts we will be specifically looking at in Karnataka, and the reasons for choosing those districts, and any related limitations of making that choice.

We will then have to find both demography-related (income, consumption trends) and crop-related data, across the three farmer groups for those five districts, and send them to Ewout

EwoutH commented 1 year ago

Great work, it’s appreciated!

EwoutH commented 1 year ago

@nit1995 Could you try to deliver all data as tidy data? That looks like this:

The 3 rules of Tidy Data


  1. Each variable is a column
  2. Each observation is a row
  3. Each type of observational unit is a table

For example, for the Farmland data, that would go from this

| Year | MARGINAL NUMBER (1000 Number) | MARGINAL AREA (1000 ha) | SMALL NUMBER (1000 Number) | SMALL AREA (1000 ha) | SEMI MEDIUM NUMBER (1000 Number) | SEMI MEDIUM AREA (1000 ha) | MEDIUM NUMBER (1000 Number) | MEDIUM AREA (1000 ha) | LARGE NUMBER (1000 Number) | LARGE AREA (1000 ha) | TOTAL NUMBER (1000 Number) | TOTAL AREA (1000 ha) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005 | 410.36 | 219.6 | 368.65 | 528.98 | 250.65 | 679.01 | 113.41 | 654.25 | 16.25 | 234.2 | 1159.3 | 2316.0 |
| 2010 | 462.99 | 251.24 | 398.2 | 565.31 | 250.05 | 669.32 | 104.88 | 595.01 | 14.11 | 198.31 | 1230.22 | 2279.19 |

to this:

| | Number | Area | Area per farmer |
|---|---|---|---|
| Marginal | 462990.0 | 251240.0 | 0.542647 |
| Small | 398200.0 | 565310.0 | 1.419663 |
| Semi medium | 250050.0 | 669320.0 | 2.676745 |
| Medium | 104880.0 | 595010.0 | 5.673246 |
| Large | 14110.0 | 198310.0 | 14.054571 |
| Total | 1230220.0 | 2279190.0 | 1.852669 |

I now did this in this notebook.
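
For reference, here is a minimal pandas sketch of that reshaping. It is not the notebook's actual code: the helper names `build_wide` and `tidy_farmland` and the `RAW` dictionary are made up for this example, and the numbers are copied from the wide table above.

```python
import pandas as pd

CLASSES = ["MARGINAL", "SMALL", "SEMI MEDIUM", "MEDIUM", "LARGE", "TOTAL"]

# (number, area) pairs in thousands per class, copied from the wide table.
RAW = {
    2005: [(410.36, 219.60), (368.65, 528.98), (250.65, 679.01),
           (113.41, 654.25), (16.25, 234.20), (1159.30, 2316.00)],
    2010: [(462.99, 251.24), (398.20, 565.31), (250.05, 669.32),
           (104.88, 595.01), (14.11, 198.31), (1230.22, 2279.19)],
}


def build_wide(raw):
    """Rebuild the wide table: one row per year, two columns per class."""
    rows = []
    for year, pairs in raw.items():
        row = {"Year": year}
        for cls, (number, area) in zip(CLASSES, pairs):
            row[f"{cls} NUMBER (1000 Number)"] = number
            row[f"{cls} AREA (1000 ha)"] = area
        rows.append(row)
    return pd.DataFrame(rows)


def tidy_farmland(wide, year):
    """Return one row per farmer class for a single year, in plain units."""
    row = wide.set_index("Year").loc[year]
    tidy = pd.DataFrame(
        {
            # Convert from thousands to individual holdings and hectares.
            "Number": [row[f"{c} NUMBER (1000 Number)"] * 1000 for c in CLASSES],
            "Area": [row[f"{c} AREA (1000 ha)"] * 1000 for c in CLASSES],
        },
        index=[c.capitalize() for c in CLASSES],
    )
    tidy["Area per farmer"] = tidy["Area"] / tidy["Number"]
    return tidy


tidy_2010 = tidy_farmland(build_wide(RAW), 2010)
print(tidy_2010.round(6))
```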

If you have more than two axes (for example if you would also like to keep the different years, with 1) the year, 2) the farmer size and 3) the attribute as axes), please use multi-indexing. In that case, also feel free to save as a Pickle (DataFrame.to_pickle) instead of a CSV.
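
A small sketch of what that could look like, assuming the index level names "Year" and "Farmer class" (both made up here) and using only a few rows from the farmland data above. The temporary directory is only there to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A few rows from the farmland data above (holdings and hectares).
records = [
    (2005, "Marginal", 410360.0, 219600.0),
    (2005, "Small", 368650.0, 528980.0),
    (2010, "Marginal", 462990.0, 251240.0),
    (2010, "Small", 398200.0, 565310.0),
]

farmland = pd.DataFrame(
    records, columns=["Year", "Farmer class", "Number", "Area"]
).set_index(["Year", "Farmer class"])  # two index levels, attributes as columns
farmland["Area per farmer"] = farmland["Area"] / farmland["Number"]

# A pickle keeps the MultiIndex and dtypes intact, unlike a plain CSV.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "farmland.pkl"
    farmland.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.loc[2010])  # all farmer classes for one year
```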

I hope this is possible, if you have any questions please let me know!

EwoutH commented 1 year ago

@Kaveri3012 I now just assume the Area per farmer value +- 25% for each farmer type. If you would like another approach please let me know.
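
As a rough sketch of that assumption: the snippet below draws a farm size uniformly within ±25% of the class average. The uniform draw and the name `sample_farm_size` are assumptions for illustration; the averages are the 2010 "Area per farmer" values from the table above.

```python
import random

# Average area per farmer (ha) per class, from the 2010 table above.
AREA_PER_FARMER_HA = {
    "Marginal": 0.542647,
    "Small": 1.419663,
    "Semi medium": 2.676745,
    "Medium": 5.673246,
    "Large": 14.054571,
}


def sample_farm_size(farmer_class, rng, spread=0.25):
    """Draw a farm size (ha) uniformly within +/- `spread` of the class mean."""
    mean = AREA_PER_FARMER_HA[farmer_class]
    return rng.uniform(mean * (1 - spread), mean * (1 + spread))


rng = random.Random(42)  # seeded so that runs are reproducible
print(sample_farm_size("Small", rng))
```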

Edit: Are there any other properties we would like to link to farmer type? Like initial wealth or living costs?

nit1995 commented 1 year ago

@Kaveri3012 I now just assume the Area per farmer value +- 25% for each farmer type. If you would like another approach please let me know.

@EwoutH the classification of farmers based on land holdings is as follows:

Marginal: < 1 ha
Small: 1-2 ha
Semi-medium: 2-4 ha
Medium: 4-10 ha
Large: > 10 ha

Source: Agricultural Census 2015-16

EwoutH commented 1 year ago

Thanks! Can you find/calculate a distribution function or histogram (bins) by any chance?

nit1995 commented 1 year ago

Thanks! Can you find/calculate a distribution function or histogram (bins) by any chance?

I am unable to find a distribution function or a histogram with bins. The data available only gives the total number of farmers and the aggregate area under each category of farmers. I plotted a histogram with this data, but the classification of land holdings does not provide equal intervals.

EwoutH commented 1 year ago

I did some experimentation and found out that the farm sizes quite closely resemble a lognormal distribution!

After a bit of experimentation I landed on a lognormal distribution with shape=0.92 and scale=1.25. It results in this distribution:

[Image: plot of the resulting lognormal farm-size distribution]

And results in the following metrics for each classification bin:

| | Number | Area | Area per farmer |
|---|---|---|---|
| Marginal | 497476 | 287813.175517 | 0.578547 |
| Small | 357956 | 513225.280386 | 1.433766 |
| Semi medium | 248166 | 689381.443650 | 2.777904 |
| Medium | 111991 | 645684.047012 | 5.765499 |
| Large | 14631 | 211130.155641 | 14.430330 |

Which quite well resembles the table above!

Once we have the farm size data per district, we can fit a lognormal distribution for each district by estimating the shape and scale parameters.

See the 4_India-ABM-farmland-size-distribution-function.ipynb
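
For anyone who wants to reproduce the idea without opening the notebook, here is a minimal sketch (not the notebook's code). It samples farm sizes with shape=0.92 and scale=1.25 and summarises them per class, using the Agricultural Census bin edges quoted earlier in this thread. NumPy's `lognormal(mean=ln(scale), sigma=shape)` is the same parameterisation as `scipy.stats.lognorm(s=shape, loc=0, scale=scale)`. The seed and the variable names are arbitrary choices for this example.

```python
import numpy as np

SHAPE, SCALE = 0.92, 1.25           # parameters from the comment above
N_FARMS = 1_230_220                 # total number of holdings in the 2010 table
INNER_EDGES_HA = [1, 2, 4, 10]      # class boundaries (ha), Agricultural Census
LABELS = ["Marginal", "Small", "Semi medium", "Medium", "Large"]

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility
sizes = rng.lognormal(mean=np.log(SCALE), sigma=SHAPE, size=N_FARMS)

# np.digitize gives 0 for sizes below 1 ha, 1 for 1-2 ha, ..., 4 for >= 10 ha.
bin_index = np.digitize(sizes, INNER_EDGES_HA)

summary = {}
for i, label in enumerate(LABELS):
    in_bin = sizes[bin_index == i]
    summary[label] = {
        "Number": int(in_bin.size),
        "Area": float(in_bin.sum()),
        "Area per farmer": float(in_bin.mean()),
    }

for label, row in summary.items():
    print(f"{label:12s} {row['Number']:8d} {row['Area']:14.1f} {row['Area per farmer']:8.4f}")
```

Because this is a random sample, the exact counts differ slightly from run to run unless the seed is fixed, but they should land close to the table above.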

EwoutH commented 1 year ago

@nit1995 please let me know if anything is unclear, you want to discuss something or you’re stuck on something!

nit1995 commented 1 year ago

@EwoutH I had the district-wise farm size data in a format similar to this:

| | Number | Area | Area per farmer |
|---|---|---|---|
| Marginal | 462990.0 | 251240.0 | 0.542647 |
| Small | 398200.0 | 565310.0 | 1.419663 |
| Semi medium | 250050.0 | 669320.0 | 2.676745 |
| Medium | 104880.0 | 595010.0 | 5.673246 |
| Large | 14110.0 | 198310.0 | 14.054571 |
| Total | 1230220.0 | 2279190.0 | 1.852669 |

I was not sure how to estimate the distribution though

Kaveri3012 commented 1 year ago

Hi @EwoutH,

A couple of clarifications from Niteesh and me:

  1. We don't have data on sizes of individual farms for K'taka, but only have the aggregates based on the ICRISAT website... Niteesh and I both, therefore, aren't quite sure how we can go from the data table above to the parameters of the lognormal distribution that you have defined as below:

    # Define the shape of the distribution
    # The parameters can be adjusted based on the characteristics of your specific data
    shape, loc, scale = 0.92, 0, 1.25

  2. As a workaround, we are trying to find empirical papers which have density plots for farm area sizes in India or Karnataka (or elsewhere); if we can find good papers, we can potentially use that data/insight to estimate parameters for the lognormal distribution. Does that sound okay?

EwoutH commented 1 year ago

Thanks, I can take care of the parameter estimation. It's nothing more than playing around a bit and checking whether the error values go down.
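
To make that a bit more concrete, here is one possible way to automate the "playing around", as a sketch only. It does a coarse grid search over shape and scale and scores each pair by the squared difference between the lognormal's share of farms per class and the observed 2010 shares. Matching only the number of farms (not the area), the grid ranges, and all function names are assumptions for this example.

```python
import math

BIN_EDGES_HA = [1.0, 2.0, 4.0, 10.0]                      # class boundaries (ha)
OBSERVED_NUMBER = [462990, 398200, 250050, 104880, 14110]  # 2010 table, per class
TOTAL = sum(OBSERVED_NUMBER)
OBSERVED_SHARE = [n / TOTAL for n in OBSERVED_NUMBER]


def lognormal_cdf(x, shape, scale):
    """CDF of a lognormal with loc=0 (same parameterisation as scipy's lognorm)."""
    z = (math.log(x) - math.log(scale)) / shape
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bin_shares(shape, scale):
    """Share of farms that the distribution puts in each size class."""
    cdf = [0.0] + [lognormal_cdf(e, shape, scale) for e in BIN_EDGES_HA] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


def fit_error(shape, scale):
    """Sum of squared differences between modelled and observed shares."""
    return sum((m - o) ** 2 for m, o in zip(bin_shares(shape, scale), OBSERVED_SHARE))


# Coarse grid: shape 0.50..1.50 and scale 0.50..2.50, in steps of 0.01.
candidates = [(i / 100, j / 100) for i in range(50, 151) for j in range(50, 251)]
best_shape, best_scale = min(candidates, key=lambda p: fit_error(*p))

print("best shape/scale:", best_shape, best_scale)
print("error at best fit:", fit_error(best_shape, best_scale))
print("error at 0.92/1.25:", fit_error(0.92, 1.25))
```

Since the area per class is not part of the error term here, the best pair found this way may differ from the hand-tuned values; adding the area share as a second error term would be a natural extension.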

EwoutH commented 1 year ago

@nit1995 Any chance you can find price data on 'Castor', 'Linseed', 'Pearl millet' and 'Wheat'? Currently we only have pricing data on Chickpea, Finger millet, Groundnut, Maize, Paddy, Pigeonpea and Sorghum.

Otherwise we will have six crops in rotation: Chickpea, Finger millet, Groundnut, Maize, Pigeonpea and Sorghum (no Paddy, because no area data).

If so, please add them to the CSVs without changing the data structure.

If not, also no problem, because those 6 listed above are also the most grown by area in Karnataka.

Kaveri3012 commented 1 year ago

@nit1995 @EwoutH

  1. @nit1995 paddy is a very important crop -- let's try to find area data on it!
  2. can you clarify what percentage of all production these six crops cover @nit1995 ? If it is a substantial share (>50%), I think we can limit ourselves to the six crops... I doubt that we will uncover something grand and new in drought inequality dynamics with additional crops. I am concerned about leaving out paddy though.

best, Kaveri

EwoutH commented 1 year ago

2. can you clarify what percentage of all production these six crops cover @nit1995 ? If it is a substantial share (>50%), I think we can limit ourselves to the six crops...

Without considering paddy, these six cover about 95% of area. With paddy, no idea.

Edit: I just noticed, we do have area data for rice, but not paddy, and do have price data for paddy, not rice. Probably we can just say rice = paddy, and all our problems are solved.

EwoutH commented 1 year ago

Paddy = rice solved a lot of problems, we now have 7 crops!

@Kaveri3012 I was thinking about how farmers estimate the expected return of switching crops. My initial idea is to take the market prices of the past 5 years for their current crop and for the crop they want to switch to, and compare those.
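
A minimal sketch of that comparison, assuming a plain average over the five years before the current one. The function names and all prices below are hypothetical; none of them come from the project's price data.

```python
from statistics import mean


def mean_recent_price(prices_by_year, current_year, window=5):
    """Average price over the `window` years before `current_year`."""
    return mean(prices_by_year[year] for year in range(current_year - window, current_year))


def prefers_switch(current_prices, candidate_prices, current_year, window=5):
    """True if the candidate crop had the higher average price recently."""
    return (
        mean_recent_price(candidate_prices, current_year, window)
        > mean_recent_price(current_prices, current_year, window)
    )


# Hypothetical prices per unit, for illustration only.
maize = {2018: 10, 2019: 11, 2020: 12, 2021: 13, 2022: 14}
chickpea = {2018: 20, 2019: 20, 2020: 20, 2021: 20, 2022: 20}

print(prefers_switch(maize, chickpea, current_year=2023))
```

Comparing prices alone ignores differences in yield per hectare and in input costs between crops, so this would likely need a revenue or profit term before it is used for decisions in the model.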

Kaveri3012 commented 1 year ago

Sounds like a good start to me. I'm wondering whether a farmer agent should also make price projections into the future?

On Wed, Jun 14, 2023, 17:01 Ewout ter Hoeven @.***> wrote:

Paddy = rice solved a lot of problems, we now have 7 crops!


Kaveri3012 commented 1 year ago

Ewout,

I do think it may be better to focus on the lending models and setting up the networks this week, until the details of the investment model (and risk aversion) etc. become clear.

On Wed, Jun 14, 2023, 17:25 Kaveri Iychettira @.***> wrote:

Sounds like a good start to me. I'm wondering whether a farmer agent also make price projections into the future?


EwoutH commented 1 year ago

Changes in code, since Tuesday:

New assumptions:

JLG:

Neighbours:

Lending:

Yield function:

Crop diversification:

Expenditure:

Normalize for inflation:

Farmers

EwoutH commented 1 year ago

@nit1995 For initialisation (and validation), I need to know how many different crops a farmer has on average (e.g. 30% have 1, 50% have 2, 20% have 3). This can be a lookup or a probability function. It might depend on the farm size/class. Do you think you can find some data on that?
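
To show the kind of input that would work, here is a sketch of the lookup variant. The probabilities are only the illustrative numbers from the question above, not data, and `sample_crop_count` is a made-up name. A per-class version could use one such dictionary per farmer class.

```python
import random

# Illustrative probabilities from the question above; NOT empirical data.
CROP_COUNT_PROBS = {1: 0.3, 2: 0.5, 3: 0.2}


def sample_crop_count(rng):
    """Draw the number of different crops for one farmer."""
    counts = list(CROP_COUNT_PROBS.keys())
    weights = list(CROP_COUNT_PROBS.values())
    return rng.choices(counts, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so that runs are reproducible
draws = [sample_crop_count(rng) for _ in range(10_000)]
print({count: draws.count(count) / len(draws) for count in CROP_COUNT_PROBS})
```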

nit1995 commented 1 year ago

@EwoutH I can't seem to find any data on how many farmers do multiple cropping or on how many crops they sow. I just found this statistic, but it is not India-specific.

Only 5% of global rainfed cropland is under multiple cropping, whereas 40% of global irrigated cropland is under multiple cropping

EwoutH commented 1 year ago

Thanks for looking anyway.

I also need a formula for the maximum amount that (a member of) a joint liability group can finance.

It would also be nice to have an indication of the typical duration of loans.

@Kaveri3012 in the pseudocode I encountered notes saying that a JLG can borrow from both a bank and microfinance institutions. Is it both, or just one of the two?

Edit: I also need to know how to translate income into the amount that can be borrowed from a nationalised bank. Last year's income, or do you need to show a trend or something? A 5-year average or minimum? A simple regression?

Kaveri3012 commented 1 year ago

Hi Ewout,

  1. let's assume that a JLG can only borrow from a microfinance institution and not from a bank. (this is an accurate assumption, sorry about the confusion in the notes)
  2. for the income, let us assume last three years' average income

Best, Kaveri
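
The income rule above can be sketched in a couple of lines. The function name `loan_basis_income` and the example incomes are hypothetical, and how this average translates into a maximum loan amount is still left open in this thread.

```python
def loan_basis_income(income_history, window=3):
    """Average income over the last `window` years (oldest value first)."""
    recent = income_history[-window:]
    return sum(recent) / len(recent)


# Hypothetical yearly incomes, oldest first.
print(loan_basis_income([100_000, 200_000, 300_000, 400_000]))
```

If a farmer has fewer than three years of history, this sketch simply averages whatever years are available.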