The-Data-Alchemists-Manipal / MindWave

MindWave is an open-source project designed for beginners to learn about data science, machine learning, deep learning, and reinforcement learning algorithms using Python. The project offers a platform for implementing relevant algorithms, with open-source tools and libraries.
MIT License
97 stars 144 forks

[GSSOC '23]: Species Distribution Modeling Using ScikitLearn #496

Closed sujanrupu closed 1 year ago

sujanrupu commented 1 year ago

What is Species Distribution Modeling?

Species Distribution Modeling (SDM) is a GIS-based method that combines species observations with environmental predictors to map habitat suitability across a specific region. The result is a habitat suitability map, graded from low to high, that covers the whole landscape in that environment.

To build a species distribution model, we must consider three aspects: the data types, the environmental data, and the algorithms.

Approaches:

Correlative Species Distribution Modeling

Correlative SDMs, also known as climate envelope models, bioclimatic models, or resource selection function models, model the observed distribution of a species as a function of environmental conditions.

SDMs originated as correlative models. These models relate the observed distribution of a species in a specific region to geographically referenced climatic predictor variables, using multiple regression approaches. Given a set of species observations and climate maps for a geographical region, the algorithm finds the most likely environmental ranges within which the species lives. This approach assumes that the species is at equilibrium with its environment and that all relevant environmental variables have been adequately sampled. Its main disadvantage is that it only allows interpolation, and only for a limited number of species.
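To make the idea of "environmental ranges" concrete, here is a minimal sketch of a percentile envelope. It is a deliberately simplified stand-in for the regression approaches mentioned above, not the method used later in this proposal. The presence records are synthetic, and the function names, the two variables, and the 5th/95th percentile cut-offs are all assumptions chosen for illustration.

```python
import numpy as np


def fit_envelope(presence_env, lower_pct=5.0, upper_pct=95.0):
    """Return per-variable (low, high) bounds from conditions at presence sites."""
    low = np.percentile(presence_env, lower_pct, axis=0)
    high = np.percentile(presence_env, upper_pct, axis=0)
    return low, high


def in_envelope(env, low, high):
    """True where every environmental variable lies inside the fitted range."""
    return np.all((env >= low) & (env <= high), axis=1)


rng = np.random.default_rng(0)

# Synthetic presence records (assumption): columns are mean temperature
# in deg C and annual rainfall in mm at sites where the species was seen.
presence_env = np.column_stack([
    rng.normal(24.0, 2.0, size=200),
    rng.normal(2000.0, 300.0, size=200),
])
low, high = fit_envelope(presence_env)

candidate_sites = np.array([
    [24.0, 2000.0],  # close to the centre of the observed conditions
    [5.0, 2000.0],   # far colder than any presence site
    [24.0, 100.0],   # far drier than any presence site
])
suitable = in_envelope(candidate_sites, low, high)
print(suitable)
```

Because the bounds come only from observed conditions, the sketch can say nothing reliable about environments outside the sampled range, which is the interpolation limitation noted above.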

Mechanistic Species Distribution Modeling

Mechanistic SDMs, also known as process-based models, use independently derived (mainly physiological) information about a species to determine the environmental conditions under which it can live. They are among the latest and most advanced SDM methodologies.
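For contrast with the correlative approach, a mechanistic model starts from limits measured on the organism itself and never looks at occurrence records. The sketch below is a toy version of that idea: the tolerance values, the variable names, and the tiny grids are hypothetical and are not taken from any real species.

```python
import numpy as np

# Hypothetical physiological limits, as if measured independently of any
# occurrence data (for example in laboratory tolerance experiments).
LIMITS = {"min_temp_c": 18.0, "max_temp_c": 32.0, "min_rain_mm": 1200.0}


def physiologically_viable(temp_c, rain_mm, limits=LIMITS):
    """Return a boolean grid: True where all tolerance limits are satisfied."""
    temp_c = np.asarray(temp_c, dtype=float)
    rain_mm = np.asarray(rain_mm, dtype=float)
    return (
        (temp_c >= limits["min_temp_c"])
        & (temp_c <= limits["max_temp_c"])
        & (rain_mm >= limits["min_rain_mm"])
    )


# Toy 2 x 2 environmental grids (assumption), one value per map cell.
temp_grid = np.array([[25.0, 30.0], [10.0, 35.0]])
rain_grid = np.array([[2000.0, 800.0], [2000.0, 2000.0]])

viable = physiologically_viable(temp_grid, rain_grid)
print(viable)
```

Only the top-left cell passes here: the others are too dry, too cold, or too hot for the assumed limits.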

Implementation approach:

In this case we will map the geographic distribution of two South American mammals based on past observations and 14 environmental variables (some directly affect the life cycle of the mammals and some do not). Since we have only positive examples (the training data contain no unsuccessful observations), we cast this problem as a density estimation problem and use the OneClassSVM method provided by the sklearn.svm package as our modeling tool.
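A rough sketch of the presence-only setup follows. It fits OneClassSVM from sklearn.svm on synthetic data standing in for the 14 environmental variables, so the training matrix, the query sites, and the parameter values (nu=0.1, gamma="scale") are assumptions for illustration rather than the settings the final notebook will use.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
n_variables = 14

# Synthetic stand-in (assumption): environmental values at 300 sites
# where the species was observed. There are no absence records.
train_env = rng.normal(loc=0.0, scale=1.0, size=(300, n_variables))

# Standardize each variable so that none dominates the RBF kernel.
mean = train_env.mean(axis=0)
std = train_env.std(axis=0)
train_std = (train_env - mean) / std

# nu bounds the fraction of training points treated as outliers.
model = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
model.fit(train_std)

# Score two hypothetical sites: one with typical conditions and one
# whose conditions are far from anything seen in training.
typical_site = np.zeros((1, n_variables))
extreme_site = np.full((1, n_variables), 8.0)
query_std = (np.vstack([typical_site, extreme_site]) - mean) / std

scores = model.decision_function(query_std)  # higher means more typical
labels = model.predict(query_std)            # +1 inlier, -1 outlier
print(scores, labels)
```

On a real map, the same decision_function call would be applied to the environmental values of every land cell to produce the suitability surface.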

The dataset is provided by Phillips et al. (2006). Where it is available, the example uses Basemap to plot the coastlines and national boundaries of South America.
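scikit-learn ships a loader for this dataset, fetch_species_distributions in sklearn.datasets, which downloads the data on first use. The sketch below wraps that call in a helper and shows how the map coordinates might be rebuilt from the returned metadata. The helper names are hypothetical, the offline stand-in object carries illustrative values only, and the sketch assumes that cell coordinates start one grid step in from the lower-left corner.

```python
from types import SimpleNamespace

import numpy as np


def construct_grids(data):
    """Build longitude and latitude coordinates of the grid cells."""
    xmin = data.x_left_lower_corner + data.grid_size
    ymin = data.y_left_lower_corner + data.grid_size
    xgrid = xmin + data.grid_size * np.arange(data.Nx)
    ygrid = ymin + data.grid_size * np.arange(data.Ny)
    return xgrid, ygrid


def load_species_data():
    """Fetch the dataset; needs network access the first time it runs."""
    from sklearn.datasets import fetch_species_distributions

    return fetch_species_distributions()


# Offline stand-in (assumption) with only the metadata fields used above,
# so the grid logic can be tried without downloading anything.
demo = SimpleNamespace(
    x_left_lower_corner=-94.8,
    y_left_lower_corner=-56.05,
    grid_size=0.05,
    Nx=1212,
    Ny=1592,
)
xgrid, ygrid = construct_grids(demo)
print(xgrid.shape, ygrid.shape)
```

With network access, `xgrid, ygrid = construct_grids(load_species_data())` would give the coordinates for the real coverages.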

Species used in this dataset:

Bradypus variegatus: the brown-throated sloth.

Microryzomys minutus: also known as the forest small rice rat, a rodent that lives in Peru, Colombia, Ecuador, and Venezuela.

khusheekapoor commented 1 year ago

@sujanrupu - you can go ahead! We are assigning you 21 days for this project, after which it will be assigned to someone else if not completed. All the best! Name the file as: algorithm_dataset.ipynb and link it in the readme of the labeled directory as algorithm - dataset.