Davidezrajay opened this issue 1 year ago.
A couple of questions from the ML team.
One approach @shubhomb pointed out: if humans match captures at large scale (creating the training data), we may be able to use that dataset to train an automated matching algorithm. Without that data, the algorithm can only generate a defined list of recommended matches.
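To make the training-data idea concrete, here is a minimal sketch of turning human-matched captures into labelled pairs for a classifier. It assumes each matched capture carries an operator-assigned `tree_id`. The `Capture` fields, `build_pairs`, and the 20 m radius used to collect "hard negatives" (nearby captures of different trees) are hypothetical choices, not an existing Greenstand API.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt
from typing import NamedTuple


class Capture(NamedTuple):
    capture_id: str
    tree_id: str  # hypothetical: tree assigned by a human operator during manual matching
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))


def build_pairs(captures, max_dist_m=20.0):
    """Positives: captures of the same tree. Negatives: nearby captures of different trees."""
    positives, negatives = [], []
    for a, b in combinations(captures, 2):
        if a.tree_id == b.tree_id:
            positives.append((a.capture_id, b.capture_id))
        elif haversine_m(a.lat, a.lon, b.lat, b.lon) <= max_dist_m:
            # Close together but different trees: the cases GPS alone cannot separate.
            negatives.append((a.capture_id, b.capture_id))
    return positives, negatives
```

Distant pairs of different trees are skipped because they are trivially separable by GPS and would add little signal to a classifier.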
Regarding https://github.com/Greenstand/Greenstand-Overview/issues/54: does it show the probability that images are duplicates based on timestamp and location sequence? If so, it holds more promise, since those images would be grouped by tree >> user.
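A minimal sketch of the kind of timestamp-and-location heuristic asked about above, assuming each capture carries a planter ID, a timestamp, and coordinates (field names are hypothetical). It sorts each planter's captures by time and scores consecutive pairs. The decay scales (5 m, 60 s) are illustrative guesses, and the result is a ranking score in [0, 1], not a calibrated probability.

```python
from dataclasses import dataclass
from math import asin, cos, exp, radians, sin, sqrt


@dataclass
class Capture:
    capture_id: str
    planter_id: str
    timestamp: float  # seconds since epoch
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))


def duplicate_scores(captures, dist_scale_m=5.0, time_scale_s=60.0):
    """Score consecutive captures by the same planter; 1.0 means same place, same moment."""
    by_planter = {}
    for c in captures:
        by_planter.setdefault(c.planter_id, []).append(c)
    scores = []
    for seq in by_planter.values():
        seq.sort(key=lambda c: c.timestamp)
        for prev, cur in zip(seq, seq[1:]):
            d = haversine_m(prev.lat, prev.lon, cur.lat, cur.lon)
            dt = cur.timestamp - prev.timestamp
            # Both terms decay towards 0 as captures get further apart in space or time.
            score = exp(-d / dist_scale_m) * exp(-dt / time_scale_s)
            scores.append((prev.capture_id, cur.capture_id, score))
    return sorted(scores, key=lambda s: s[2], reverse=True)
```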
Update from Ezra: the duplicate issue has never been solved; only the admin panel has been released (in his view, duplicate detection and capture matching are the same problem). The training-set idea is viable, and we have some data for it from Freetown and from Haiti (although our experience last time was that a trained professional is required).
Further discussion with @shubhomb indicated that more data is needed to validate the system that will be created. Shubhom examined the linked CSV and is exploring building a simulator to approximate planter movement. However, his assessment is that the data may not be adequate.
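One way such a simulator could look, as a sketch only. It assumes a planter walks a straight row, plants at a fixed spacing, occasionally re-captures the same tree, and that GPS error is independent Gaussian noise on each axis in a local metric frame. All names and default values (`simulate_session`, `gps_sigma_m`, `revisit_prob`) are hypothetical. Because the true tree index is recorded, the output can serve as ground truth when scoring a matching algorithm.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SimCapture:
    tree_index: int                   # ground truth: which tree was captured
    true_xy: Tuple[float, float]      # true position in metres (local frame)
    observed_xy: Tuple[float, float]  # position reported by the simulated GPS
    timestamp_s: float


def simulate_session(n_trees=20, spacing_m=1.0, gps_sigma_m=5.0,
                     seconds_per_tree=45.0, revisit_prob=0.1, seed=None):
    """Simulate one tracking session along a straight planting row."""
    rng = random.Random(seed)
    captures = []
    t = 0.0
    for i in range(n_trees):
        true_xy = (i * spacing_m, 0.0)  # assumption: planter walks a straight row
        # Assumption: with probability revisit_prob the tree is captured twice (double-tracking).
        visits = 2 if rng.random() < revisit_prob else 1
        for _ in range(visits):
            observed = (true_xy[0] + rng.gauss(0.0, gps_sigma_m),
                        true_xy[1] + rng.gauss(0.0, gps_sigma_m))
            captures.append(SimCapture(i, true_xy, observed, t))
            t += seconds_per_tree
    return captures
```

Real sessions would need the movement model and noise level fitted to the linked CSV data; the straight row is only the simplest starting point.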
Capture Matching Function - Possible AI-supported outputs
Challenge Brief:
Captures (images of trees plus associated data) are collected with smartphone apps and used to verify environmental work. Users often return to the same plants/trees over time, resulting in "layers of images" of the same tree at the same location. These images must be linked together.
Current 'capture matching' is done with a front-end React web app backed by a RESTful API. This 'capture matching administration panel' is where users match captures manually: the interface is supplied with images from the API and displays them based on GPS coordinates/distances and other filters the user sets, including time range and organization. If several candidate matches are found, the system displays the GPS-related images in order of distance, and an operator then matches the captures. The process is slow, has room for improvement in accuracy, and needs automation to work at scale.
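For reference, the candidate ordering described above (filter by time range and organization, then sort by distance) can be sketched as follows. This is an illustration under assumed field names, not the admin panel's actual API. Distance uses the haversine great-circle formula, and the 50 m cut-off is a hypothetical parameter.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Capture:
    capture_id: str
    organization_id: str
    captured_at: datetime
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))


def rank_candidates(target, pool, start, end, organization_id=None, max_dist_m=50.0):
    """Return (capture_id, distance_m) for filtered candidates, nearest first."""
    ranked = []
    for c in pool:
        if c.capture_id == target.capture_id:
            continue  # never offer a capture as a match for itself
        if not (start <= c.captured_at <= end):
            continue  # time-range filter
        if organization_id is not None and c.organization_id != organization_id:
            continue  # organization filter
        d = haversine_m(target.lat, target.lon, c.lat, c.lon)
        if d <= max_dist_m:
            ranked.append((d, c))
    ranked.sort(key=lambda pair: pair[0])
    return [(c.capture_id, round(d, 1)) for d, c in ranked]
```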
Besides being prone to error, the current process doesn't account for data such as identified species, leaf morphology, trees that have already been matched, track files, tracking seasons, and other attributes.
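A minimal sketch of how some of these attributes could feed into ranking, assuming each candidate already has a distance, an optional identified species, and a hypothetical `already_matched` flag. Already-matched captures are dropped, and a species disagreement adds a soft penalty (an assumed 25 m equivalent) rather than acting as a hard filter, since species labels can themselves be wrong.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    capture_id: str
    distance_m: float
    species: Optional[str]  # None if species has not been identified yet
    already_matched: bool   # hypothetical flag: already linked to a tree by an operator


def rerank(target_species: Optional[str], candidates: List[Candidate],
           species_penalty_m: float = 25.0) -> List[str]:
    """Order candidate IDs by distance plus a soft penalty for species disagreement."""
    scored = []
    for c in candidates:
        if c.already_matched:
            continue  # assumption: do not offer captures that were already matched
        score = c.distance_m
        # Only penalize when both species are known and they disagree.
        if target_species and c.species and c.species != target_species:
            score += species_penalty_m
        scored.append((score, c.capture_id))
    scored.sort()
    return [cid for _, cid in scored]
```

Other attributes from the list above (leaf morphology, track files, tracking seasons) could be added as further penalty or bonus terms in the same way.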
The goal of this ticket is to identify and test methods and solutions that augment, verify, or replace the current capture matching process.
How to contribute
Just go for it and solve it.
If you get stuck, you can ask questions on this ticket, via Greenstand's Slack, or by emailing the ticket contact below. (Join Slack and introduce yourself in the community_intro channel; from there you will be invited to the project-specific channels.)
Note: If you believe there is insufficient information or infrastructure provided to solve this critical issue, please reach out.
Deliverable:
Full Challenge Narrative:
The underlying value proposition of the Greenstand Token Model is the ability for individuals to earn and trade tokens linked to work surrounding ecological restoration, which is often based on the growth of plants or trees over time. Identifying repeat captures (repeat visits to individual trees) is critical to the success of many projects using the model, which encourages re-tracking trees to document maintenance, tree health, and growth over time as a means of employment, poverty alleviation, and ground verification that carbon and reforestation projects have been successfully implemented.
Solving this challenge will:
Each capture contains a geotagged image collected from a mobile app. It enters the Greenstand system and is tagged with various attributes (such as species) by a number of different microservices and manual operators. The first capture of a tree is unique to its location and context. However, re-tracking a tree creates data points that are similar to those of the initial capture.
Users often double-track trees, intentionally or unintentionally, within a single tracking session or at later dates, and multiple users' tracking can overlap at different times, especially in larger operations (hundreds or thousands of trees).
GPS inaccuracy is an issue. Most users' phones are low-cost models with limited ability to pinpoint locations, so many trees are often captured within the same "area of GPS error." GPS data alone is not accurate enough to match the images: trees are often planted a meter or less apart, while GPS accuracy is often 10 meters or more.
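To put numbers on this, assume trees on a square grid 1 m apart and a 10 m GPS error radius (the spacing and radius come from the figures above; the square grid is an assumption). The sketch below counts how many trees lie within the error circle around a single tree.

```python
import math


def trees_within_gps_error(error_radius_m: float, spacing_m: float) -> int:
    """Count trees on a square planting grid that fall inside a circle of
    radius error_radius_m centred on one tree (assumed square grid)."""
    n = math.floor(error_radius_m / spacing_m)
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            # Tree at grid offset (i, j) lies inside the error circle.
            if (i * spacing_m) ** 2 + (j * spacing_m) ** 2 <= error_radius_m ** 2:
                count += 1
    return count
```

For a 10 m radius and 1 m spacing this returns 317 trees, close to the area estimate π·10² ≈ 314, so a single GPS fix cannot distinguish one tree from hundreds of neighbours.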
Related operational issues
Solutions:
No single solution is expected to solve this challenge completely; rather, a solution is expected to be built from many incremental improvements and tools added to the process from different sources.
Possible solutions:
GPS coordinate accuracy enhancement using filtering algorithms.
Object recognition coupled with GPS to link trees across the maintenance period.
ML image verification.
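As one illustration of coupling object recognition with GPS, the sketch below fuses an image-similarity term with a GPS-distance term into one score. It assumes image embeddings come from some recognition model (not specified here) and models GPS error as Gaussian with an assumed 10 m sigma. The weighted sum and the 0.7 image weight are a swapped-in, hypothetical scheme that would need tuning against manually matched data such as the Freetown set.

```python
from math import exp, sqrt
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def gps_likelihood(distance_m: float, gps_sigma_m: float = 10.0) -> float:
    """Unnormalized Gaussian likelihood that two fixes come from the same spot."""
    return exp(-distance_m ** 2 / (2 * gps_sigma_m ** 2))


def match_score(emb_a, emb_b, distance_m, image_weight=0.7, gps_sigma_m=10.0):
    """Weighted sum of image similarity and GPS agreement, in [0, 1]."""
    img = max(0.0, cosine_similarity(emb_a, emb_b))  # clamp negative similarity to 0
    gps = gps_likelihood(distance_m, gps_sigma_m)
    return image_weight * img + (1 - image_weight) * gps
```

Weighting the image term more heavily reflects the point above that GPS alone cannot separate trees a meter apart; the GPS term mainly rules out candidates that are implausibly far away.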
Supporting ideas include:
Barriers to completion:
Resources:
Links
Data Resources
Suggested data sets
There are data sets from Haiti of trees with repeat captures, marked with sticks painted with colored stripes, which can provide an extra layer of support for creating curated sets.
Most of the Freetown City data has been manually matched (although there has not been much quality control on that data set).
Greenstand respects its users' privacy. For additional data, please contact the issue lead, explain why you need it, and be prepared to provide a government-issued ID and sign a legally binding data privacy policy.
Related Projects/tools:
Related Issues:
https://github.com/Greenstand/treetracker-admin-client/issues/568
https://github.com/Greenstand/treetracker-admin-client/issues/1029
https://github.com/Greenstand/treetracker-admin-client/issues/949
https://github.com/Greenstand/treetracker-admin-client/issues/781
https://github.com/Greenstand/Greenstand-Overview/issues/54
https://github.com/Greenstand/Greenstand-Overview/issues/52
https://github.com/Greenstand/treetracker-android#197
https://github.com/Greenstand/Greenstand-Overview/issues/75
Contacts on this issue
Primary: Xinyi Hu xinyi.hu@greenstand.org
Secondary: Info (at) Greenstand.org
To do: