ADAPT / Standard

ADAPT Standard data model issue management
https://adaptstandard.org
MIT License

Store SpatialRecord in Section Polygons #74

Open zwing99 opened 1 year ago

zwing99 commented 1 year ago

I humbly propose we consider changing how Spatial Records are stored in ADAPT. Out of tradition, we have stored these as sets of data correlated to offsets from the GPS sensor reading. We only keep the GPS sensor reading at the sampling frequency, use the offsets to calculate where to draw the "point," and then use additional metadata like sensor width and speed to construct a "covering" polygon for that sensor reading. This way of storing things has the benefit of being very space efficient and allows the reader to correct errors in how the data was produced, as others like @strhea have pointed out to me.
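
For anyone less familiar with the current approach, here is a minimal sketch of the offset math I'm describing. It is not the ADAPT API; the names and the flat, projected-coordinate math are simplifying assumptions on my part:

```python
import math

# A minimal sketch, NOT the ADAPT API, of how a covering polygon is
# typically reconstructed from a logged GPS point plus implement metadata.
# The names (easting/northing, lateral/inline offsets) and the flat-earth,
# projected-coordinate math are assumptions for illustration only; real
# implementations also deal with heading sources, articulation, and latency.
def covering_polygon(easting, northing, heading_rad,
                     lateral_offset, inline_offset,
                     section_width, speed, sample_interval):
    # Distance travelled during one sampling interval.
    length = speed * sample_interval

    # Unit vectors along (forward) and across (right of) the travel direction,
    # with heading measured clockwise from north.
    fx, fy = math.sin(heading_rad), math.cos(heading_rad)
    rx, ry = math.cos(heading_rad), -math.sin(heading_rad)

    # Shift from the GPS antenna to the section's reference point.
    cx = easting + rx * lateral_offset + fx * inline_offset
    cy = northing + ry * lateral_offset + fy * inline_offset

    # Rectangle swept by the section since the previous sample.
    half_w = section_width / 2.0
    return [
        (cx - rx * half_w, cy - ry * half_w),
        (cx + rx * half_w, cy + ry * half_w),
        (cx + rx * half_w - fx * length, cy + ry * half_w - fy * length),
        (cx - rx * half_w - fx * length, cy - ry * half_w - fy * length),
    ]
```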

That said, I think the benefits do not outweigh the costs. First and foremost, this is "tricky" math whose interpretation can be both OEM- and FIS-specific. This storage technique leads to many conditionals in the code that processes ADAPT data from different OEMs. Secondly, it leaves little accountability for who is right in interpreting the data when the standard is inconsistently filled.

I propose we take a leaf (pun intended) from Deere and from us at Corteva, who have internally and independently developed a very similar format, and store the Spatial Records as section polygons. These "new" Spatial Records would have the benefit of being transparent: there is no interpretation error, and the polygon drawn covers the appropriate area. If there is a gap between polygons, or an overlap, that should be considered "correct" or "mishandled" by the OEM, not something the consuming party is responsible for correcting. While this might feel like a "loss of control," it creates an environment of clear accountability to the ADAPT standard. If it is not right, it is not right, and that should be obvious when the polygons are drawn in GIS tooling. The other significant advantage is that it lowers the bar of entry for FIS systems to become users of machine data, which in turn creates healthy economic competition around the value your FIS system can deliver with good data, not whether or not it can read the data correctly.
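
To make the alternative concrete, here is a purely illustrative record shape for a section polygon. The property names are hypothetical, not a proposed schema:

```python
# Purely illustrative: the property names below are hypothetical and are
# not part of the ADAPT Standard schema; they just show the shape of a
# polygon-first record where the provider has already done the geometry work.
section_record = {
    "geometry": {  # GeoJSON-style polygon, WGS84 lon/lat
        "type": "Polygon",
        "coordinates": [[
            [-93.62500, 41.58680],
            [-93.62490, 41.58680],
            [-93.62490, 41.58670],
            [-93.62500, 41.58670],
            [-93.62500, 41.58680],  # closed ring
        ]],
    },
    "timestamp": "2023-01-04T16:20:00Z",
    "values": {  # observed variables for this section over this interval
        "seedRate_seedsPerHa": 84000,
        "downForce_N": 310,
    },
}
```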

I expect this proposal will ruffle some feathers with folks, and I hope I can spark a lively discussion by proposing it. I am not 100% sold on it, but after careful consideration, I think it would be a monumental step forward for ADAPT and FIS systems sharing data.

knelson-farmbeltnorth commented 1 year ago

Thanks @zwing99 for writing up the proposal. As we've discussed, I think it has lots of merits. Adding some notes for discussion in no particular order.

- Foremost, it removes a lot of potential for variation in data modeling. One of the largest challenges with ADAPT today (following from ISOXML) is that it provides the data provider multiple ways to model the data in the interest of staying true to the source. The net effect is that the burden of transforming the data falls on the data consumer, who still needs to anticipate and handle all the variations, often with conditional logic based on data provider. There is much less burden all around if it is the data provider who makes transformation decisions with their own data.

- Removing much of the implement modeling from ADAPT removes ambiguity about the use and purpose of ADAPT vs. ISO11783-10. We've always stated that ADAPT is an FMIS-centric model vs. ISO11783 as a machine-centric model, but we've maintained a lot of the machine modeling due to the state of machine data in the early days of ADAPT.

- As we were spinning up this standardization effort, I recall making the point that ADAPT in its current form risked obsolescence from the processed formats that OEMs were beginning to serve from their cloud APIs. One key goal of the serialization effort was to try to head off a dozen different formats of processed data. While my initial thinking was that we would allow modeling processed polygons by simply stubbing out a machine and providing a polygon instead of a point (SpatialRecord has a polymorphic Geometry property; see the sketch after this list), removing the variability makes things cleaner.

- The impact to storage size here is significant. This is probably the biggest challenge in making this change. Ever-increasing file size, after all, is one of the problems we were trying to solve.
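
To the polymorphic Geometry point in the third bullet, here is a hedged sketch of the branching a consumer carries today when a record may arrive as either a point or a polygon (illustrative names, not the ADAPT class library API):

```python
# Illustrative only, not the ADAPT class library API: with a polymorphic
# Geometry the consumer must branch on the geometry type and, for points,
# reconstruct coverage from device offsets itself. The rebuild function is
# passed in just to keep the sketch self-contained.
def coverage_ring(record, rebuild_from_offsets):
    geom = record["geometry"]
    if geom["type"] == "Polygon":
        # The data provider already did the transformation.
        return geom["coordinates"][0]
    if geom["type"] == "Point":
        # The data consumer does the offset math, which is OEM-specific in practice.
        return rebuild_from_offsets(record)
    raise ValueError(f"unsupported geometry type: {geom['type']}")
```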

zwing99 commented 1 year ago

GREAT summary @knelson-farmbeltnorth of the concerns with this proposal! One thing to consider on the last concern, space, is that we have options to "compact" the data for transfer. They WILL NEVER be as compact as the current ISO and ADAPT model, but I think they can close the gap enough to gain the benefits this proposal delivers. I have no hard evidence for this statement, but it should be testable.
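
As one example of what I mean by compacting (an assumption on my part, not anything specified by ADAPT), coordinates could be quantized and delta-encoded so that adjacent section polygons compress well:

```python
# A rough sketch of one possible compaction approach, assumed for
# illustration and not specified by ADAPT: quantize coordinates to a fixed
# precision and delta-encode consecutive vertices, so the small repeated
# deltas of adjacent section polygons compress well downstream.
def compact_ring(ring, scale=1e7):  # 1e-7 degrees is roughly 1.1 cm
    quantized = [(round(lon * scale), round(lat * scale)) for lon, lat in ring]
    deltas, prev = [], (0, 0)
    for x, y in quantized:
        deltas.append((x - prev[0], y - prev[1]))
        prev = (x, y)
    return deltas  # small integers: friendly to varint/zlib encodings

def expand_ring(deltas, scale=1e7):
    ring, x, y = [], 0, 0
    for dx, dy in deltas:
        x, y = x + dx, y + dy
        ring.append((x / scale, y / scale))
    return ring
```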

knelson-farmbeltnorth commented 1 year ago

Open questions for discussion/review in the modeling effort (as opposed to the Serialization Work Group) are:

  1. Do we allow implementers to model data either in points or polygons? While Field Operations data points may be representable as coverage polygons, does that work for all types of data we wish to represent in ADAPT?

  2. How much of the vehicle/machine/implement do we want to maintain in the ADAPT model?

knelson-farmbeltnorth commented 1 year ago

@strhea asks a 3rd question: How to handle data logged at different, overlapping levels. E.g., data on the implement and data on the row.

knelson-farmbeltnorth commented 1 year ago

Discussion in 4 January 2023 meeting:

Re question 1 above, there is agreement that the use of points or polygons will depend on the context, but only one is acceptable for any given use case. E.g., reporting seeding, application, harvest, etc. will necessarily be done via polygons. We will be removing the data constructs that specify device offsets, so there will be no viable way to present this data as points. For tractor telematics data, soil samples, and most other observations, points should be used.

Question 2 remains open for ongoing discussion.

There was agreement re the 3rd question that ADAPT can present only one level of detail at a time. E.g., a data producer may choose to report planting section data. If so, any data points recorded at a higher level across the implement will need to be copied to each section. Alternatively, the data producer may report data as a single polygon for the width of the implement, and in this case lower-level data will need to be averaged/summarized across all sections. Since ADAPT intends to be a data transfer format, and not an exhaustive storage format, there is agreement that the data fidelity and specific contents of any ADAPT dataset are defined by each data producer.
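
As a hedged illustration of those two choices, assuming a simple dict-based record shape that is not the ADAPT schema:

```python
# A hedged illustration of the two choices, assuming a simple dict-based
# record shape that is not the ADAPT schema: either duplicate implement-level
# values down onto each section, or summarize section values up to one record.
def copy_down(implement_values, section_records):
    # Report at section resolution: stamp implement-wide values onto every section.
    for rec in section_records:
        rec["values"].update(implement_values)
    return section_records

def average_up(section_records):
    # Report at implement resolution: average each variable across all sections
    # (assumes every section carries the same variable keys).
    keys = section_records[0]["values"].keys()
    n = len(section_records)
    return {k: sum(r["values"][k] for r in section_records) / n for k in keys}
```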

knelson-farmbeltnorth commented 1 year ago

With the resolution of question 2 in #76 & #77, this item is resolved. To recap:

  1. Will ADAPT store points or polygons? It depends on the type of data reported. Data representing a covered area is expected to be reported as polygons.
  2. How much of the implement data do we maintain? See #77
  3. Do we allow overlapping polygons for different data resolutions? No, the spatial data in any one ADAPT polygon dataset will contain all data in a single layer of geometries at the same resolution. If the implementer and consumer do not wish to summarize lower-level or duplicate higher-level data on the geometries, the data must be reported in separate datasets.

knelson-farmbeltnorth commented 1 year ago

Reviewed and agreed in 8 Feb 2023 meeting