brahmwg / Bottlenecks_MDS_Capstone

Master of Data Science Capstone Project for Bottlenecks to Survival

report corrections #124

Closed riyaeliza123 closed 2 months ago

riyaeliza123 commented 2 months ago

• Executive summary:

Our collaboration with the Pacific Salmon Foundation on the Bottleneck project will deliver visual analysis tools and statistical models to enhance biologists' understanding of salmon survival trends. Using techniques from the Master of Data Science program, we aim to estimate salmon survival probability while accounting for factors such as predation and body size. This project matters because adult salmon populations are in critical decline. Our work will give biologists the tools to make informed decisions and implement targeted conservation efforts, supporting the long-term sustainability of salmon populations and ecosystems.

• (“Understanding Evolution”, 2024)

• REMOVE 2.1

• 3.0 Data Products

For this project, our goal is to develop four key models. First, a survival analysis model based on Bayesian modeling will be created to understand the survival of salmon at various life stages. Second, an outmigration model will be developed to predict and understand outmigration patterns, primarily of Chinook and Coho salmon. Third, a species imputation model will be implemented using machine learning to fill in missing or mislabeled species data. Lastly, a species prediction model will be designed to predict the species of a fish when there is ambiguity during fieldwork.

• 3.3.1.b “annotated_species” and “confirmed_species”

• 3.3.1.c

In preprocessing the data, several steps were taken to prepare it for modeling. First, any null values present in the dataset were removed. Next, two new features were extracted from the date column: the day of the year, represented as a whole number between 1 and 365, and the year, which ranged from 2021 to 2023. The tag_id column, being unique to each entry, was removed from the dataset. Additionally, standard scaling was applied to the fork length and day of the year features to normalize their values, and the remaining categorical features were one-hot encoded to convert them into a format suitable for ingestion into a machine learning model.

• 3.3.1.d Define epoch in “20 epochs”

An epoch is one complete pass through the entire training dataset during machine learning model training.
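The preprocessing steps described for 3.3.1.c could be sketched roughly as follows. This is only an illustration, not the project's actual code: the column names `date` and `fork_length`, the `categorical_cols` parameter, and the choice to drop the raw date column after feature extraction are assumptions. Only `tag_id` is named in the text above.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, categorical_cols: list[str]) -> pd.DataFrame:
    """Hypothetical sketch of the 3.3.1.c preprocessing steps."""
    # 1. Remove rows containing any null value.
    out = df.dropna().copy()

    # 2. Extract day of year and year from the date column
    #    ("date" is an assumed column name).
    dates = pd.to_datetime(out["date"])
    out["day_of_year"] = dates.dt.dayofyear
    out["year"] = dates.dt.year

    # 3. Drop tag_id, which is unique per entry. Dropping the raw date
    #    column as well is an assumption; the text does not say.
    out = out.drop(columns=["tag_id", "date"])

    # 4. Standard scaling: zero mean, unit (population) standard deviation.
    #    "fork_length" is an assumed column name.
    for col in ["fork_length", "day_of_year"]:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    # 5. One-hot encode the categorical features named by the caller.
    return pd.get_dummies(out, columns=categorical_cols)
```

The categorical columns are passed in explicitly so that a target column such as the species label is not one-hot encoded by accident.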

• 4. Conclusion and Future Recommendations: (remove future rec section)

• 3.3.1 Species Imputation model (change title)

• 3.3.2 Species Prediction model (change title)

• 3.3.1 – typo – “done”

• 3.3.2:
  o Correct all the information using this: https://github.com/brahmwg/Bottlenecks_MDS_Capstone/blob/main/mds_deliverables/species_prediction_model/documentation.md
  o Add all required images

• ADD IMAGE for SA preprocessing