Greenstand / Greenstand-Overview

Tree Tracking Fighting Poverty and Climate Change - This repository contains Contributing, Project Overview, Roadmap, etc
https://www.greenstand.org
GNU Affero General Public License v3.0

Capture Matching Function - Possible AI supported outputs #157

Open Davidezrajay opened 11 months ago

Davidezrajay commented 11 months ago

Challenge Brief:

Captures as images of trees and other data are collected with smartphone apps and are used to verify environmental work. Users often return to the same plants/trees over time resulting in "layers of images" of the same tree / same location. These images must be linked together.

Capture matching is currently done with a front-end React web app backed by a RESTful API. In this 'capture matching administration panel', users match captures manually: the interface is supplied with images from an API and displays them based on GPS coordinates/distances and other filters that the user sets, including time range and organization. If several candidate matches are found, the capture matching system displays the GPS-related images in order of distance. An operator then matches the captures. The process is slow, has room for improvement in accuracy, and needs a level of automation to work at scale.
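
As a rough illustration of the distance ordering described above, the following Python sketch ranks candidate captures by great-circle distance from a target capture. The field names (`id`, `lat`, `lon`), the function names and the 30 m cut-off are assumptions made for this example; they are not the admin panel's actual API or settings.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidates_by_distance(target, captures, max_distance_m=30.0):
    """Return (distance_m, capture) pairs within max_distance_m, nearest first.

    `target` and each capture are dicts with hypothetical keys "lat" and "lon".
    """
    scored = [
        (haversine_m(target["lat"], target["lon"], c["lat"], c["lon"]), c)
        for c in captures
    ]
    in_range = [pair for pair in scored if pair[0] <= max_distance_m]
    # Sort on the distance only, so the capture dicts are never compared.
    return sorted(in_range, key=lambda pair: pair[0])


if __name__ == "__main__":
    target = {"id": "new", "lat": 8.4657, "lon": -13.2317}
    captures = [
        {"id": "far", "lat": 8.4667, "lon": -13.2317},   # about 111 m north
        {"id": "near", "lat": 8.4658, "lon": -13.2317},  # about 11 m north
        {"id": "same", "lat": 8.4657, "lon": -13.2317},  # same coordinates
    ]
    for distance, capture in candidates_by_distance(target, captures):
        print(f"{capture['id']}: {distance:.1f} m")
```

Ordering by distance alone leaves the operator to choose between every candidate inside the cut-off, which is where the manual effort comes from.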

Besides being prone to error, the current operation doesn't account for data such as identified species, leaf morphology, trees that have already been capture matched, track files, tracking seasons and other attributes.

The goal of this ticket is to identify and test methods and solutions that augment, verify, or replace the current capture matching process.

How to contribute

Just go for it and solve it.

If you get stuck, you can ask questions on this ticket, via Greenstand's Slack, or by emailing the ticket contact below. (Join Slack and introduce yourself in the community_intro channel. From there you will be invited to the project-specific channels.)

Note: If you believe there is insufficient information or infrastructure provided to solve this critical issue, please reach out.

Deliverable:

  1. Any integration improvement to the Greenstand stack, submitted as a pull request to the appropriate Greenstand repository, in the form of a script or Airflow function that feeds the user interfaces.
  2. Any integration improvement to the Greenstand stack in the form of a service, based on current API functions, that "pre-tags" captures.
  3. An ADR, opened and led by you, with recommendations to change or improve a process, data collection, etc. Note: Viable data collection recommendations cannot increase the workload for users or apps.

Full Challenge Narrative:

The underlying value proposition of the Greenstand Token Model is the ability for individuals to earn and trade tokens linked to work surrounding ecological restoration, which is often based on the growth of plants or trees over time. Identifying repeat captures of (visits to) individual trees is critical to the success of many projects using the model. The model encourages the re-tracking of trees to document maintenance, tree health and growth over time, as a means of employment, poverty alleviation and ground verification around the successful implementation of carbon and reforestation projects.

Solving this challenge will:

Each capture contains a geotagged image collected from a mobile app. It enters the Greenstand system and is tagged with various attributes (such as species) by a number of different microservices and manual operators. The first capture of a tree is unique to its location and context. However, a re-tracked tree creates data points that are similar to those of the initial capture.

Users tend to double-track trees, intentionally or unintentionally, within a single tracking session or at later dates. Multiple users also overlap their tracking at different times, especially in larger operations (hundreds or thousands of trees).

GPS inaccuracy is an issue. Most users' phones are cheap models with limited ability to pinpoint locations, and many trees are often collected within the "area of GPS error." GPS data alone is not accurate enough to match the images: trees are often planted a meter or less apart, while GPS accuracy is often 10 meters or more.
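
To put rough numbers on this, the Python sketch below estimates how many trees can fall inside a single GPS error circle. It assumes trees on a regular square grid and treats the reported accuracy as a hard circular boundary; both are simplifications chosen for illustration, and the function name is hypothetical.

```python
import math


def trees_within_error_radius(spacing_m, gps_error_m):
    """Approximate number of grid-planted trees inside the GPS error circle.

    Assumes one tree per spacing_m x spacing_m square, so the count is
    roughly the circle's area divided by the area occupied by each tree.
    """
    circle_area = math.pi * gps_error_m ** 2
    return circle_area / spacing_m ** 2


if __name__ == "__main__":
    # Figures from the text: trees about 1 m apart, GPS accuracy about 10 m.
    print(round(trees_within_error_radius(1.0, 10.0)))  # 314
```

Under these assumptions, a single fix is consistent with several hundred different trees, so location can narrow the candidate set but cannot pick the match on its own.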

Related operational issues

  1. Trees die and are often replaced in the same geolocation.
  2. Users and tree growers are incentivized to take duplicate images of trees, and some have tried to scam the system by taking multiple images of the same plant from various directions.
  3. User- and phone-specific data is not considered here, as the same user or different users return to the same cluster of trees at undefined times.
  4. The GPS accuracy radius overlaps multiple possible trees.
  5. Physical tree tags and RFID tags have been tried and ruled out as not scalable for our users.

Solutions:

No single solution is expected to solve this challenge 100%; rather, a solution is expected to be built from many incremental improvements and tools added to the process from different sources.

Possible solutions:

GPS coordinate accuracy enhancement using filtering algorithms. Object recognition coupled with GPS to link trees across the maintenance period. ML image verification.
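
One simple form of GPS accuracy enhancement is an inverse-variance weighted average of repeated fixes taken at the same spot. The Python sketch below shows that technique only; it is not a full filtering pipeline. It assumes that several fixes with a reported accuracy in meters are available per capture, that their errors are independent and roughly Gaussian, and that the fixes are close enough together for latitude and longitude to be averaged directly. The function name and tuple layout are hypothetical.

```python
import math


def fuse_fixes(fixes):
    """Combine repeated GPS fixes of one spot into a single estimate.

    fixes: list of (lat, lon, accuracy_m) tuples with accuracy_m > 0.
    Returns (lat, lon, fused_accuracy_m).
    """
    if not fixes:
        raise ValueError("at least one fix is required")
    # Weight each fix by 1 / accuracy^2, so precise fixes count for more.
    weights = [1.0 / (accuracy ** 2) for _, _, accuracy in fixes]
    total = sum(weights)
    lat = sum(w * fix[0] for w, fix in zip(weights, fixes)) / total
    lon = sum(w * fix[1] for w, fix in zip(weights, fixes)) / total
    # Under the independence assumption the fused variance is 1 / sum(weights).
    return lat, lon, math.sqrt(1.0 / total)


if __name__ == "__main__":
    fixes = [
        (8.46570, -13.23170, 10.0),
        (8.46574, -13.23166, 10.0),
        (8.46572, -13.23171, 5.0),
    ]
    lat, lon, accuracy = fuse_fixes(fixes)
    print(f"{lat:.5f}, {lon:.5f} (about {accuracy:.1f} m)")
```

Even in the best case the gain grows only with the square root of the number of fixes, so averaging alone is unlikely to reach the meter-level spacing between trees.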

  1. "Pre-match" as many captures as possible before showing them to users, and put in place a machine learning process that will
  2. Create algorithms that automatically match the captures.
  3. Utilize other layers of data - track files, species data.
  4. Scrub data prior to evaluation (adjusting inaccurate GPS data).
  5. Statistically match based on total number of possibilities.
  6. Use image-based attributes to match (such as background rocks and unique environmental attributes).
  7. Redesign the UX of the capture matching tool in the admin panel.
  8. Enhancements to GPS accuracy (see issue)
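
Several of the ideas above (pre-matching, using species data, and statistical matching over the set of possibilities) can be combined in a small scoring step. The Python sketch below is one possible shape for it, under stated assumptions: a Gaussian distance model with a 10 m sigma, a fixed down-weighting for species mismatch, and a 0.9 confidence threshold. All parameter values and function names are hypothetical and would need tuning against matched data.

```python
import math


def match_probabilities(distances_m, species_match,
                        gps_sigma_m=10.0, species_mismatch_weight=0.1):
    """Turn per-candidate evidence into probabilities that sum to 1.

    distances_m: distance from the new capture to each candidate tree.
    species_match: one bool per candidate, True if the species tags agree.
    """
    weights = []
    for distance, same_species in zip(distances_m, species_match):
        # Gaussian likelihood of seeing this offset given GPS noise.
        weight = math.exp(-0.5 * (distance / gps_sigma_m) ** 2)
        if not same_species:
            weight *= species_mismatch_weight
        weights.append(weight)
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]


def pre_match(candidate_ids, probabilities, threshold=0.9):
    """Return the best candidate id if it is confident enough, else None.

    None means the capture should still go to a human operator.
    """
    if not probabilities:
        return None
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return candidate_ids[best] if probabilities[best] >= threshold else None


if __name__ == "__main__":
    ids = ["t1", "t2", "t3"]
    # Three trees 1 to 3 m away: GPS alone cannot separate them.
    by_gps = match_probabilities([1.0, 2.0, 3.0], [True, True, True])
    print(pre_match(ids, by_gps))
    # Species data rules out two candidates and sharpens the result.
    with_species = match_probabilities([1.0, 2.0, 3.0], [True, False, False])
    print(pre_match(ids, with_species, threshold=0.8))
```

Returning `None` below the threshold keeps the operator in the loop, so a step like this would augment the manual process rather than replace it.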

Supporting ideas include:

Barriers to completion:

Resources:

Links

Suggested data sets

There are data sets of trees with repeat captures that include sticks painted with colored stripes (in Haiti), which can provide an extra layer of support for creating curated sets.

The Freetown City data has been mostly manually matched (although there has not been much quality control on that data set).

Greenstand respects our users' privacy. For more data needs, please contact the issue lead and articulate why you need it, and be prepared to provide a government-issued ID and sign a legally binding data privacy policy.

Related Projects/tools:

Related Issues:

https://github.com/Greenstand/treetracker-admin-client/issues/568
https://github.com/Greenstand/treetracker-admin-client/issues/1029
https://github.com/Greenstand/treetracker-admin-client/issues/949
https://github.com/Greenstand/treetracker-admin-client/issues/781
https://github.com/Greenstand/Greenstand-Overview/issues/54
https://github.com/Greenstand/Greenstand-Overview/issues/52
https://github.com/Greenstand/treetracker-android#197
https://github.com/Greenstand/Greenstand-Overview/issues/75

Contacts on this issue

Primary: Xinyi Hu xinyi.hu@greenstand.org

Secondary: Info (at) Greenstand.org

To do:

ahs0katan0 commented 10 months ago

A couple of questions from the ML team.

One approach @shubhomb pointed out is to have humans match captures at large scale to create training data; we may then be able to use that dataset to train an automated matching algorithm. In the absence of that data, the algorithm can only generate a defined list of recommended matches.

https://github.com/Greenstand/Greenstand-Overview/issues/54 Does it show the probability of duplicate images based on timestamp and location sequence? If this exists, it holds more promise, as those images will be grouped by tree>>user.

ahs0katan0 commented 10 months ago

Update from Ezra: The duplicate issue has never been solved; only the admin panel has been released (duplicate detection and capture matching are the same thing in his mind). The trained set is a viable idea, and we have some data for it from Freetown and from Haiti (although our experience last time was that a trained professional is required).

ahs0katan0 commented 6 months ago

Further discussion with @shubhomb indicated that more data is needed to validate the system that will be created. Shubhom examined the linked CSV and is exploring creating a simulator to approximate planter movement. However, his assessment is that the data may not be adequate.