fearghalodonncha opened 2 years ago
@UConnAI
I think a dedicated ML repository should be created at this point so the team can fork it and start working, @Gaurav-Ramakrishna. I have created two teams for this: one focused solely on Data Engineering, the other focused on Model Development.
This is a good suggestion, @Gaurav-Ramakrishna. We have some related research ML code in the IBM org for reinforcement learning and soil moisture.
For deployment models, we should develop in this org so we can guide cloud and edge deployment.
I like the idea and am happy to create a repo within the Liquid Prep Git org. Any name suggestions for the repo, @fearghalodonncha?
Thanks, @Gaurav-Ramakrishna. I'm fine with any name, really. What about liquid-ml, to keep the LiquidPrep-style naming?
Created the LiquidPrep-ML repo.
@fearghalodonncha @Nachiket18 had an idea to use TimescaleDB to store the time series data. He has previously written an article about it here: https://medium.com/dataengineering-and-algorithms/timescaledb-an-introduction-to-time-series-databases-3438d275e88e
Great blog post @Nachiket18 @charitarthchugh. Well done!
@playground @Gaurav-Ramakrishna, what are we using to store LiquidPrep data? Any thoughts on TimescaleDB?
@Nachiket18 @charitarthchugh, apologies for the delay in coming back on this. I like the proposed approach. Please go ahead. I'll be very interested to hear how it goes.
@charitarthchugh, @Nachiket18, I wanted to loop back on this. Do you have any thoughts on potential ML approaches you would like to explore?
@fearghalodonncha, I was thinking about the data model and would like some clarification. Right now, we have written code to fetch data from the Texas Mesonet.
1. How many such sources are needed?
2. What would be the common schema for all collected data sources? (I can do the data modelling.)
Once we discuss this, we can design a data pipeline to write the data into a time series database. I will also take a look at the ML models and the paper so that I can understand that part as well.
Hi @Nachiket18, the Texas Mesonet is the main data source we will focus on for soil moisture ML development. We aim to develop ML predictors for soil moisture.
Subsequently, we will explore specific crop irrigation requirements based on this database. That will need a different data model, I expect. For soil moisture, we'll focus on a fairly classic model of [timestamp, latitude, longitude, depth, soil_moisture_value].
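A minimal sketch of that five-field record as a Python dataclass. The units are assumptions not stated in the thread (depth in centimetres below the surface, soil moisture as volumetric water content), and the class name `SoilMoistureRecord`, its validation rules, and the `as_row` helper are hypothetical. `as_row` just returns the fields in schema column order, which could feed a bulk insert into a time series table (for example, one in TimescaleDB).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SoilMoistureRecord:
    """One observation in the [timestamp, latitude, longitude, depth, soil_moisture_value] model.

    Hypothetical units: depth in centimetres below the surface,
    soil_moisture_value as volumetric water content (m^3/m^3).
    """
    timestamp: datetime
    latitude: float
    longitude: float
    depth: float
    soil_moisture_value: float

    def __post_init__(self) -> None:
        # Require timezone-aware timestamps so records from different sources line up.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")
        if self.depth < 0:
            raise ValueError(f"depth must be non-negative: {self.depth}")

    def as_row(self) -> tuple:
        # Fields in schema column order, e.g. for a bulk insert into a time series table.
        return (self.timestamp, self.latitude, self.longitude, self.depth, self.soil_moisture_value)


# Illustrative values only, not a real station reading.
example = SoilMoistureRecord(datetime(2021, 6, 1, 12, tzinfo=timezone.utc), 30.6, -96.3, 5.0, 0.21)
```

Keeping validation in the record type means every source adapter that produces these rows gets the same checks for free.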
Thoughts about this library for soil moisture prediction @charitarthchugh @Nachiket18 ?
Sorry for the delayed response; I have been very busy. The library looks good to me. @fearghalodonncha
Hello everyone, I have recently joined the project. I would be more than happy to help and contribute to the data side of the work. Thanks, everyone!
Liquid Prep would be significantly enhanced by a library of machine learning models that convert observations into forecasts.
We can train these models on long-term data from the Texas Mesonet network described in #181. Initially, we can drive the models with weather data from the sensors themselves.
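Turning observations into forecasts usually starts by reframing a time series as supervised (features, target) pairs. A minimal sketch, assuming a single, regularly spaced soil moisture series; `make_supervised`, `n_lags`, and `horizon` are hypothetical names, and in a fuller version the sensor's weather variables would be appended as extra feature columns.

```python
from typing import List, Sequence, Tuple


def make_supervised(
    series: Sequence[float], n_lags: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Turn a time-ordered series into (features, target) pairs.

    Each example uses the n_lags most recent observations as features and the
    value `horizon` steps after the last of them as the target.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be >= 1")
    X: List[List[float]] = []
    y: List[float] = []
    # The target index end + horizon - 1 must stay inside the series.
    for end in range(n_lags, len(series) - horizon + 1):
        X.append(list(series[end - n_lags:end]))
        y.append(series[end + horizon - 1])
    return X, y
```

For example, `make_supervised([1, 2, 3, 4, 5], n_lags=2)` yields `X = [[1, 2], [2, 3], [3, 4]]` and `y = [3, 4, 5]`. Because windows only look backwards, no future information leaks into the features.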
Ideally, we would like the ability to integrate data from multiple sources (e.g., The Weather Company, NOAA, Google Earth Engine). The exact variables we use as features will depend on the data available from each weather repository, but broadly they will include precipitation-related data (precipitation, snowfall, snowmelt, etc.) and temperature-related data (air temperature, solar radiation, humidity, etc.).
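Since each source names its variables differently, one option is to map every source onto a shared feature vocabulary, grouped as above. A minimal sketch, assuming records arrive as flat name-to-value mappings; `source_a`, `source_b`, and all of their column names are hypothetical placeholders, not the actual field names used by The Weather Company, NOAA, or Google Earth Engine datasets.

```python
from typing import Dict, List, Mapping

# Shared feature vocabulary, grouped into precipitation- and temperature-related features.
FEATURE_GROUPS: Dict[str, frozenset] = {
    "precipitation": frozenset({"precipitation", "snowfall", "snowmelt"}),
    "temperature": frozenset({"air_temperature", "solar_radiation", "humidity"}),
}

# Hypothetical per-source column names; real names depend on each provider's data product.
SOURCE_COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"precip_mm": "precipitation", "t_air_c": "air_temperature", "rh": "humidity"},
    "source_b": {"rain": "precipitation", "snow": "snowfall", "srad": "solar_radiation"},
}


def harmonize(source: str, raw: Mapping[str, float]) -> Dict[str, float]:
    """Rename a raw record's variables to shared feature names, dropping unmapped ones."""
    mapping = SOURCE_COLUMN_MAPS[source]
    return {mapping[k]: v for k, v in raw.items() if k in mapping}


def groups_present(features: Mapping[str, float]) -> Dict[str, List[str]]:
    """List which shared features in each group a harmonized record actually provides."""
    return {group: sorted(set(features) & names) for group, names in FEATURE_GROUPS.items()}
```

Unit conversion is left out here; a real adapter would also need to convert each source's units before renaming.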
The desired objective is the ability to:
1. specify a sensor (sensor_id);
2. extract its latitude and longitude (described in issue #181);
3. download weather data from the pertinent source (see above);
4. train a machine learning model;
5. evaluate robustness (while accuracy is important, the consistency or robustness of predictions is more critical); and
6. deploy to IBM Cloud (eventually this can be deployed to the edge).
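One common way to quantify "consistency" separately from accuracy is rolling-origin evaluation: forecast successive time windows from all earlier data, score each window, then look at both the mean error and its spread across windows. A minimal sketch, assuming a univariate series and a hypothetical `Forecaster` interface (history and horizon in, forecasts out); the persistence model is only a stand-in baseline, not the model the project would deploy.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: given past observations and a horizon, return that many forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def persistence(history: Sequence[float], horizon: int) -> List[float]:
    # Baseline: repeat the last observed value.
    return [history[-1]] * horizon


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rolling_origin_scores(
    series: Sequence[float], model: Forecaster, initial: int, horizon: int
) -> List[float]:
    """RMSE on successive non-overlapping test windows, each forecast from all earlier data."""
    scores: List[float] = []
    start = initial
    while start + horizon <= len(series):
        forecast = model(series[:start], horizon)
        scores.append(rmse(series[start:start + horizon], forecast))
        start += horizon
    return scores


def robustness_summary(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean error across windows (accuracy) and its standard deviation (consistency)."""
    return statistics.mean(scores), statistics.pstdev(scores)
```

When comparing candidate models, one with a slightly higher mean error but a much smaller spread may be preferable, in line with the emphasis on robustness over raw accuracy.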