Open roll opened 6 months ago
The distinction between physical representation and logical representation is known under many names, e.g. lexical space vs. value space in XSD. Any name may be confusing without explanation. The current form is ok but it might be better to switch to other names. In this case I'd also change "representation" because "representation of data" is confusing as well. My current best suggestion is to use lexical value and logical value instead of physical representation and logical representation.
The current spec also uses "physical contents", this should be changed as well.
Thanks @nichtich, I agree
I think, currently, confusion might occur because `physical` implies being textual:

The physical representation of data refers to the representation of data as text on disk
Although, in general, I guess for the majority of people `physical` regarding data storage means something different.
BTW `lexical` is actually already used in the spec - http://localhost:8080/specifications/table-schema/#number

The lexical formatting follows that of decimal in XMLSchema

This sentence I think is very easy to understand, so I guess `lexical` is a good choice.
Hmm, I think a danger of replacing `physical` with `lexical` or `textual` here is that a given `logical` value can have many different `lexical`/`textual` representations... The `textual` representation of a date is an easy example. What we're wanting to refer to here is specifically the particular `lexical`/`textual` form being stored in the actual source data file.
So I actually prefer the current term `physical` here for that reason, provided we repeatedly emphasize that physical here implies textual as @roll noted.
Although reading through the standards again I'm also now realizing that's not quite the case because we're allowing type info to be associated with JSON source data... so it's actually not purely textual/lexical in a strict sense, which complicates things. Does this mean we throw an error or warn if a numeric field finds numeric values as strings (e.g. "0", "1", "2") in JSON source data? What if a string field schema gets numeric values? etc.
It'd simplify these cases if all "raw" data was just guaranteed to be parsed by the field schema as pure lexical/textual/string, and field props referencing `physical` values always used strings. If we're including / allowing type info other than string to come from the underlying source data representation, I may reconsider my position on #621, because it makes a case for props referencing `physical` values to be allowed to be any JSON type.
In the spirit of brainstorming to get more ideas flowing:
Other possible terms for `physical`/`lexical`/`textual` value: `raw value`, `source value`, `underlying value`...
Other possible terms for `logical` value: `typed value`, `parsed value`, `conceptual value`, ... (I actually like the term `conceptual value` quite a bit; `logical` has always sounded like a boolean to me...)
Ok, the issue needs more work. The whole section on Concepts needs to be rewritten to better clarify what is meant by "tabular data". Because we also have two levels of description:
There are "raw" tabular data formats (TSV/CSV) and there are tabular data formats with typed values (Excel, SQL, JSON, Parquet... limited to non-nested cells...). I'd say a Table Schema only refers to the former. A SQL Table can be converted to a raw table (just export as CSV) plus a Table Schema (inferred from the SQL Table definition) but SQL Tables are not directly described by Table Schema, nor is any JSON data as wrongly exemplified in the current specification.
There are "raw" tabular data formats (TSV/CSV) and there are tabular data formats with typed values (Excel, SQL, JSON, Parquet... limited to non-nested cells...). I'd say a Table Schema only refers to the former.
Agreed!
Perhaps it would clear some of the confusion if we renamed "Table Schema" to "Textual Table Schema" or "Delimited Table Schema" to reflect that the schema definition is specifically designed for textual data.
It would also pave the way for future frictionless table schema standards for other types of physical data, e.g. "JSON Table schema", "Excel Table Schema", "SQL Table Schema", which would be designed around the particularities of the types found in those formats.
In that case, we'd have:
- The physical values of Textual Table Schemas are all strings
- The physical values of JSON Table Schemas are all JSON data types
- The physical values of Excel Table Schemas are all Excel data types
- etc.
As you say, it's much easier to think about conversions between formats, rather than type coercions if we try to use a textual table schema to parse an excel file, for example. The latter has a lot of potential complexity / ambiguity.
Although reading through the standards again I'm also now realizing that's not quite the case because we're allowing type info to be associated with JSON source data... so it's actually not purely textual/lexical in a strict sense, which complicates things. Does this mean we throw an error or warn if a numeric field finds numeric values as strings (e.g. "0", "1", "2") in JSON source data? What if a string field schema gets numeric values? etc.
In `frictionless-py`:
The conversation is happening here so I'm adding @pwalsh's comment:
@nichtich @roll the original terminology seems pretty standard, eg
https://aws.amazon.com/compare/the-difference-between-logical-and-physical-data-model/
https://www.gooddata.com/blog/physical-vs-logical-data-model/
Whereas I have never come across using "lexical" to represent what is called "physical" in the current terminology.
I read https://github.com/frictionlessdata/specs/issues/864 but honestly physical vs logical seems the most common terminology for describing this and I am not sure I see a good reason to change it.
First of all, probably I did not understand it correctly but I never thought about `physical` and `logical` in terms described here - https://www.gooddata.com/blog/physical-vs-logical-data-model/. I was thinking that in the case of Table Schema we're talking about basically a data source (like 1010101 on the disc or so-called text in CSV) and a data target (native programming types like in Python and SQL).
So my understanding is that every tabular data resource has a physical data representation (in my understanding of this term). On current computers, it's always just a binary that can be decoded to text in the CSV case or just read "somehow" in the case of a non-textual format, e.g. Parquet. For every format there is a corresponding reader that converts that physical representation to a logical representation (e.g. a pandas dataframe from a CSV or Parquet file).
I think here it's important to note that Table Schema implementors never deal with any physical data representation (again based on my understanding of this term). Table Schema doesn't declare rules for CSV parsers or Parquet readers. In my opinion, Table Schema actually declares only post-processing rules for data that is already in its logical form (read by native readers).
Physical Data -> [ native reader ] -> Source Logical Data -> [ table schema processor ] -> Target Logical Data
For example, for this JSON cell `2000-01-01`:
- physical data -- binary
- source logical data -- string
- target logical data -- date (the point where Table Schema adds its value)
Another note: from an implementor perspective, as said, we only have access to Source Logical Data. It means that the only differentiable parameter for a data value is its source logical data type. For example, a Table Schema implementation can parse the `2000-01-01` string for a `date` field because it knows an input logical type and a desired logical type. There is no access to the underlying physical representation to have more information about this value. We only see that the input is `string`. For example, `frictionless-py` differentiates all the input values into two groups:
- string -> process
- others -> don't process
So for me it feels that Table Schema's level of abstraction is to provide rules for processing "not typed" string values (lexical representation) and that's basically the only thing this spec really can define, while low-level reading can't really be covered. So my point is that `physical` is not a wrong term or whatever but that we really need to describe parsing lexical values e.g. for dates or missing values rather than talking about `physical`.
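This pipeline can be sketched in Python. A minimal illustration under stated assumptions: the stdlib `csv` module plays the native reader, and the `process_row` schema step is a hypothetical stand-in for a Table Schema processor with an integer field and a date field.

```python
import csv
import io
from datetime import date

# Physical data: bytes on disk (here, an in-memory stand-in)
physical = b"id,when\n1,2000-01-01\n"

# [native reader]: decode bytes and parse CSV -> source logical data
# (for CSV, every cell comes out as a Python str)
text = physical.decode("utf-8")
source_rows = list(csv.reader(io.StringIO(text)))

# [table schema processor]: apply field types -> target logical data
# (hypothetical minimal schema: an integer field and a date field)
def process_row(row):
    return [int(row[0]), date.fromisoformat(row[1])]

header, *data = source_rows
target_rows = [process_row(r) for r in data]
print(target_rows)  # [[1, datetime.date(2000, 1, 1)]]
```

Note that the Table Schema stage never sees the bytes; it only sees the strings the native reader produced, which is exactly the point being made above.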
cc @peterdesmet
I tend to agree that we actually have 3 states of data in the spec, as you write.
A few notes, though:
1. You write "Table Schema doesn't declare rules for csv parsers". However, the Data Package spec does have a CSV dialect section and a character encoding setting, which are precisely rules for CSV parsers that interact with the physical layer.
2. 'source logical data' and 'target logical data' are not great names imo as they impose some sort of order between the layers (source and target) which does not apply in many cases (e.g. when writing a data package).
So, I would suggest to follow your lead, and use
Hi @nichtich,
Are you interested in working on the updated version of https://github.com/frictionlessdata/datapackage/pull/17 that incorporates comments from this issue?
After working closely with the specs last month and refreshing in my memory implementation details from `frictionless-py`, I came to the conclusion that we actually don't have a very complex problem here.
For example, for a JSON data file like this:

```json
[
  ["id", "date"],
  [1, "2012-01-01"]
]
```
We have:
- `1` is already a logical value, and `2012-01-01` is still a lexical value
- after applying the schema we get `1` and `Date('2012-01-01')`

I think this tiering is applicable to basically any input data source from `csv` to `parquet` or `sql`.
I guess we need to rename the section to something like Data Processing and mention this workflow. Although we have 3 tiers, I would personally focus the explanation on lexically represented cells, because basically all Table Schema data type descriptions are about how to parse lexically represented data, e.g. date/times, objects, arrays, numbers (basically all the types).
I guess we need to rename the section to something like Data Processing and mention this workflow.
Yes. I'd like to provide an update but I don't know when so it's also ok for me if you come up with an update. To quickly rephrase your words:
We have three levels of data processing:
Table Schema specification defines how to map from level 2 to level 3.
Table Schema specification defines how to map from level 2 to level 3.
I think it's a good wording!
Yes. I'd like to provide an update but I don't know when so it's also ok for me if you come up with an update.
Of course, no hurry at all. Let's just self-assign ourselves to this issue if one of us decides to start working (currently, I also have other issues to deal with first).
I agree but I have an observation here -
In @roll's example, it's mentioned that '1' is already a logical value.
I would claim that it's still a native value - a JSON number with the value of 1. It might represent a table schema value of type integer, number, year, or even boolean (with trueValues=[1]). It might also be converted to None, e.g. in case missingValues=[1].
Therefore I would say that the distinction between native and logical is correct and that all values start out as native values and get processed, cast and validated into logical values - even if they come from a more developed file format such as JSON. Then, in each case where we require a value to be present in the descriptor (e.g. in a max constraint, booleans' trueValues or missingValues) we need to specify whether a native value or a logical value is expected there.
It might also be converted to None, e.g. in case missingValues=[1].
Currently, it cannot because `missingValues` items in v1 have to be strings. So basically, I think we found the root cause and the real decision to make (related to #621 as well): what is our data model:
I guess (2) might be cleaner and easier to explain. In this case it will be something like this, e.g. for `datetime`:

datetime: if on the native-data level a value is represented lexically then it MUST be in a form defined by XML Schema containing required date and time parts, followed by optional milliseconds and timezone parts
Therefore I would say that the distinction between native and logical is correct and that all values start out as native values and get processed, casted and validated into logical values
Good to introduce "native" as a description of values before the logical level. A native boolean `false` from JSON or SQL may end up as a logical boolean value `false`, a logical string `"false"`, or a logical missing value.
(2) physical/native/logical -- Table Schema processes all the native values.
All native values either have a type that directly maps to a logical type (e.g. JSON Boolean and SQL BOOL both map to the logical boolean type) or they are treated as strings.
datetime: if on the native-data level a value is represented lexically then it MUST be in a form defined by XML Schema containing required date and time parts, followed by optional milliseconds and timezone parts
Yes, except replace "is represented lexically" with "is represented as string". If the native-data level already has a type compatible with datetime, no lexical representation is involved at all.
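That rule can be sketched with a hypothetical `cast_datetime` helper (Python's `datetime.fromisoformat` stands in for full XML Schema lexical validation here, which is an assumption, not the spec's exact grammar):

```python
from datetime import datetime

def cast_datetime(native_value):
    """Hypothetical cast: a native datetime passes through unchanged;
    a native string must be in the ISO 8601 / XML Schema lexical form."""
    if isinstance(native_value, datetime):
        # native type already compatible: no lexical representation involved
        return native_value
    if isinstance(native_value, str):
        # fromisoformat accepts 'YYYY-MM-DD[THH:MM:SS[.ffffff][+HH:MM]]'
        return datetime.fromisoformat(native_value)
    raise TypeError(f"cannot cast {type(native_value).__name__} to datetime")

print(cast_datetime("2000-01-01T12:30:00"))  # parsed from lexical form
print(cast_datetime(datetime(2000, 1, 1)))   # passed through unchanged
```

The point of the sketch: only the string branch involves a lexical representation at all, matching the "is represented as string" wording suggested above.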
I think we are all on the same track but use slightly different terminology for the same idea.
I like the direction :)
If we lean towards 3 distinct layers (physical/native/logical), as an implementor I'm curious what will be the behaviour for this resource, for example:

```yaml
data:
  - [id]
  - [1]
  - [2]
  - [3]
schema:
  fields:
    - name: id
      type: string
```

Will it be considered valid data, and values will be coerced to strings? Currently, `frictionless-py` will raise 3 validation errors as the number type is not compatible with the string type.
Also, I think it's important to check what dataframe parsers (readr/pandas/polars/etc.) do in this case so we don't end up with a non-implementable solution.
I like where this is going too, it's really clarifying the decision at hand:
a) do we parse fields with a 2-layer physical / logical distinction or
b) do we parse fields with a 3-layer physical / native / logical distinction
The spec is currently written / defined as (a) a 2-layer scheme. This is why `missingValues` is `string[]`, and why `trueValues`/`falseValues` are `string[]`: everything should hit the TableSchema as a `string` physical value type, no matter its native origin type. I think this fits with TableSchema being billed as a description of textual data. (I realize the way the implementation currently handles JSON data is inconsistent with this; I'm referring to the broad intent of the spec in my reading here.)
Supporting JSON in the `data` field throws a wrench into the works for the 2-layer approach, because it has its own native type definitions. Now, this could be resolved by just reading each JSON element as string and ignoring the native type info. But retaining JSON type info requires the 3-layer distinction.
An advantage of the 3-layer distinction is that in addition to JSON, it allows us to consider other intermediate typed sources (like SQL, ODF, etc.), rather than being forced to convert all of the native types to `string` before reaching the TableSchema.
The disadvantage of the 3-layer distinction is that I think it opens a can of worms of complexity. With 2 layers, we only have to define our `Fields` parsers as mappings from `string` -> `FieldType`. But with 3 layers the TableSchema would need the capability to define mappings / validation rules from all possible `JSONType` -> `FieldType`, `SQLType` -> `FieldType`, `ODFType` -> `FieldType`, etc., depending on the native type being used.
Furthermore, with 3 layers we also need a way to losslessly represent native values in the TableSchema. For JSON types, this is easy, because the spec is JSON. But if we're envisioning support for other native types, we'd need ways to represent their native values in JSON. As @akariv said:
in each case where we require a value to be present in the descriptor (e.g. in a max constraint, booleans' trueValues or missingValues) we need to specify whether a native value or a logical value is expected there.
This is also apparent in the issue @roll describes re: numeric data. A JSON `number` is not an exact type (like SQL's `DECIMAL`), but a string representation of a number is an exact `decimal` type. With 3-layer parsing, a numeric field parser has to think about validation for both exact and non-exact inputs, but with 2-layer parsing we can always handle the source as an exact `decimal` type because everything is being parsed from `string`.
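Python's `decimal` module illustrates this exact-vs-inexact point directly: parsing from the lexical string form preserves the exact value, while going through a native JSON `number` (a binary float) does not.

```python
import json
from decimal import Decimal

# 2-layer: the parser receives the lexical form and gets an exact value
exact = Decimal("1.1")

# 3-layer: the native JSON number arrives as a binary float first
native = json.loads("1.1")   # float 1.1 (inexact binary representation)
inexact = Decimal(native)

print(exact)             # 1.1
print(inexact)           # 1.100000000000000088817841970012523233890533447265625
print(exact == inexact)  # False
```

(A JSON reader can sidestep this with `json.loads(text, parse_float=Decimal)`, but that is effectively re-reading the lexical form, i.e. 2-layer parsing.)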
In addition to the example @roll provided above, 3-layer parsing also creates ambiguity in situations like:

1)

```yaml
data:
  - ["id"]
  - ["1"]
  - ["2"]
  - ["3"]
schema:
  fields:
    - name: id
      type: integer
```

2)

```yaml
data:
  - ["id"]
  - [0]
  - [true]
schema:
  fields:
    - name: id
      type: integer
```

3)

```yaml
data:
  - ["id"]
  - ["1"]
  - [0]
  - [true]
  - ["true"]
schema:
  fields:
    - name: id
      type: boolean
```

4)

```yaml
data:
  - ["id"]
  - ["0"]
  - ["1"]
  - [0]
  - [1]
schema:
  fields:
    - name: id
      type: boolean
      trueValues: ["1"]
      falseValues: [0]
```
If we have 2-layer parsing, that is, where all JSON native cell values are received by the TableSchema parser as `string` types (ignoring the native JSON type info), the expected behavior is very straight-forward:
1) No validation errors, because the integer field type parses all the strings successfully.
2) One validation error, on boolean `true`, because it is passed as string `"true"` to the TableSchema parser (not as a native `boolean`!).
3) No validation errors, because each `string` cell value is in the default `trueValues` or `falseValues` arrays.
4) No validation errors, because the TableSchema only receives string `"0"` and `"1"` values. (edit: well actually technically a schema parse error if `falseValues` must be `string[]`)
(I understand that our current implementation may slightly differ right now because it currently conflates the two- and three-layer parsing approaches)
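This expected 2-layer behavior can be sketched with toy helpers (not the frictionless-py API): every native JSON cell is first rendered in its textual form, then the field parser only ever sees strings. The default true/false value lists below follow the Table Schema v1 defaults as I recall them, so treat them as an assumption.

```python
import json

def as_string(cell):
    # Toy 2-layer reading: keep strings as-is, render other JSON
    # scalars in their JSON lexical form ("true", "0", "1.5", ...)
    return cell if isinstance(cell, str) else json.dumps(cell)

def parse_integer(s):
    return int(s)  # raises ValueError on non-integer strings

def parse_boolean(s, true_values=("true", "True", "TRUE", "1"),
                  false_values=("false", "False", "FALSE", "0")):
    if s in true_values:
        return True
    if s in false_values:
        return False
    raise ValueError(f"not a boolean: {s!r}")

# Example 1: integer field over ["1", "2", "3"] -> no errors
print([parse_integer(as_string(c)) for c in ["1", "2", "3"]])  # [1, 2, 3]

# Example 2: integer field over [0, true] -> one error, on native true
print(parse_integer(as_string(0)))  # 0
try:
    parse_integer(as_string(True))  # arrives as string "true"
except ValueError:
    print("validation error on native true")

# Example 3: boolean field over ["1", 0, true, "true"] -> no errors
print([parse_boolean(as_string(c)) for c in ["1", 0, True, "true"]])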
By contrast, 3-layer parsing creates a lot of questions:
1) Should we parse the native `string` into a `numeric` type here, or error because it is not a native `numeric` type going into a numeric field type?
2) Can a native `boolean` be silently coerced to a numeric field type?
3) Can a native `number` be silently coerced to a boolean field type, even though the default `trueValues` and `falseValues` are all `strings`?
4) Do native types have to match the type specified in `trueValues` and `falseValues`? If so, then we have errors on cells `string` `"0"` and `numeric` `1`.
3-layer parsing also creates problems for schema-sharing:

data.csv:

```
myField
true
false
true
```

csvResource:

```json
{
  "name": "csvResource",
  "format": "csv",
  "path": "data.csv",
  "schema": "schema.json"
}
```

jsonResource:

```json
{
  "name": "jsonResource",
  "format": "json",
  "data": [["a"], [true], [false], [true]],
  "schema": "schema.json"
}
```

schema.json:

```json
{
  "fields": [
    {
      "name": "myField",
      "type": "boolean",
      "trueValues": ["true"],
      "falseValues": ["false"]
    }
  ]
}
```
With 2-layer parsing, this isn't a problem; the JSON and CSV files are interpreted exactly the same (as textual values). With 3-layer parsing, however, this may fail because the native values `true` and `false` are not listed in `trueValues` and `falseValues`. This is why I like 2-layer parsing: when everything is parsed by TableSchema as `string` (ignoring the native type info), the expected validation behavior is 1000x clearer.
...And this is just for JSON… we'd have to go through the same exercise for 3-layer parsing of SQL types, ODF types, etc., and for those it'd be further complicated by a need to losslessly express their native values as JSON types. Much easier to stick to the original 2-layer scope of frictionless being for textual tabular data, where by definition `physical` values are always `string`, and TableSchema fields define mappings from physical `string` -> `FieldType` (not `NativeType` -> `FieldType`).
I like the idea of 3-layer parsing, but I think to support native types properly in the spec, TableSchema would have to be rebuilt from the ground up with support for lossless representations of native values, or we'd need to create additional versions of TableSchema to map the subtleties of different native values of a specific format to frictionless fields, e.g. `SQLSchema`, `ODFSchema`... So I'm against it for V2. Instead, I think we'd be better off making the implementation's JSON behavior consistent with the original scope of 2-layer textual parsing (read each JSON array cell as a string, and ignore the native type info).
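For instance, the schema-sharing case above behaves identically for both resources under 2-layer parsing. A toy sketch (not the frictionless-py API; `read_cell_as_text` is a hypothetical helper):

```python
import json

schema = {"trueValues": ["true"], "falseValues": ["false"]}

def read_cell_as_text(cell):
    # 2-layer reading: take the textual form, ignoring native type info
    return cell if isinstance(cell, str) else json.dumps(cell)

def parse_boolean(text):
    if text in schema["trueValues"]:
        return True
    if text in schema["falseValues"]:
        return False
    raise ValueError(f"not a boolean: {text!r}")

csv_cells = ["true", "false", "true"]   # cells as read from data.csv
json_cells = [True, False, True]        # cells from jsonResource's data

csv_parsed = [parse_boolean(read_cell_as_text(c)) for c in csv_cells]
json_parsed = [parse_boolean(read_cell_as_text(c)) for c in json_cells]
print(csv_parsed == json_parsed)  # True: both resources parse identically
```

Under 3-layer parsing, the `json_cells` branch would instead have to decide whether native `True` matches the string `"true"` in `trueValues`, which is exactly the ambiguity described above.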
The spec is currently written / defined as (a) a 2-layer scheme. This is why missingValues is string[], and why trueValues/falseValues are string[]: everything should hit the TableSchema as a string physical value type, no matter its native origin type. I think this fits with TableSchema being billed as a description of textual data. (I realize the way the implementation currently handles JSON data is inconsistent with this; I'm referring to the broad intent of the spec in my reading here).
Note that it's not only about JSON; `frictionless-py` supports a dozen formats and in-memory data. It never worked like this, at least in Python and JavaScript; the parsers get an input cell and forward it as-is if it's not a string, and process it if it is a string. So, currently, these implementations are based on the (1) model from above.
I think it will be simple and correct to say that, regarding the data model, Table Schema is no more than an extension of a native data format (all of them). This concept is quite simple; for example, we have JSON and there is SUPERJSON that adds support for `date/time`, `regexp`, etc. It's achieved via an additional layer of serialization and deserialization for lexical values. If we think about Table Schema that way then it's still the (1) data model and missing/false/true values need to stay strings only. But this model doesn't imply that all the input data needs to be strings or that it's only for textual data sources, not at all; it just means that Table Schema comes into play only when additional serialization/deserialization is needed.
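The SUPERJSON analogy can be sketched as a thin serialize/deserialize layer over plain JSON that kicks in only for values the native format cannot carry (hypothetical `serialize_cell`/`deserialize_cell` helpers, not any real API):

```python
import json
from datetime import date

def serialize_cell(value):
    # Types JSON carries natively pass through untouched;
    # others get an extra (de)serialization layer via a lexical form
    if isinstance(value, date):
        return value.isoformat()
    return value

def deserialize_cell(value, field_type):
    # The schema layer comes into play only when deserialization is needed
    if field_type == "date" and isinstance(value, str):
        return date.fromisoformat(value)
    return value

row = [1, date(2012, 1, 1)]
wire = json.dumps([serialize_cell(v) for v in row])
print(wire)  # [1, "2012-01-01"]

restored = [deserialize_cell(v, t)
            for v, t in zip(json.loads(wire), ["integer", "date"])]
print(restored == row)  # True: the round trip is lossless
```

The integer rides through JSON natively; only the date needs the extra lexical layer, mirroring the "extension of a native data format" framing.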
PS. Thought a little bit more about it and I would say that on the Table Schema level, there are basically only two relevant concepts (while Data Resource and Table Dialect deal with the physical representation):
That might be confusing though.
E.g. a JSON file with -1 denoting an empty value, we would say missingValues="-1". That's reasonable.
But what if 'n/a' is the empty value? Would we say missingValues="n/a" or "\"n/a\"" (as is the physical representation of the value)?
What if there is no natural string representation of the value (if the file format is not text based)?
But what if 'n/a' is the empty value? Would we say missingValues="n/a" or "\"n/a\"" (as is the physical representation of the value)?
I'm getting to thinking that we actually need to isolate Table Schema from any physical data representation and let it operate only on the logical level. On the logical level it's `n/a` no matter how it's stored.
It's 3 layers but we only have to think about two levels:
Furthermore, with 3 layers we also need a way to losslessly represent native values in the Table Schema.
We should aim to be able to represent common data types in the type system of Table Schema but we don't have to ensure lossless mappings of native type systems. We define a set of data types such as string, number types, boolean, n/a... and either types of native format X directly map to one of these Table Schema types or implementations must downgrade their values, e.g. by serialization to string type values.
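That "directly map or downgrade" rule could look like the following sketch (the mapping table is illustrative, not from the spec, and `map_native_value` is a hypothetical helper):

```python
# Direct native-to-Table-Schema type mappings for a hypothetical format X;
# anything not listed must be downgraded by serializing to a string value.
DIRECT_MAPPINGS = {
    bool: "boolean",
    int: "integer",
    float: "number",
    str: "string",
}

def map_native_value(value):
    field_type = DIRECT_MAPPINGS.get(type(value))
    if field_type is not None:
        return field_type, value        # native type maps directly
    return "string", str(value)         # downgrade: serialize to string

print(map_native_value(True))           # ('boolean', True)
print(map_native_value(complex(1, 2)))  # ('string', '(1+2j)')
```

This keeps the Table Schema type set small while still accepting any native value: either it maps, or it arrives as a string and is handled lexically.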
P.S: Maybe this table of common native scalar data types helps to find out what is needed (also for #867).
I bootstrapped a new specification called "Terminology" - https://datapackage.org/specifications/glossary/ - I think it will be great to define everything we need there and then refer to it across the specs. Lately I encountered that e.g. `physical` and `logical` data are also needed to define descriptor serialization. And e.g. Tabular Data defined in the Table Schema spec is really needed in other places as well. Also, we often mention implementations, data publishers/producers, consumers, etc., so it will be good to define them.
It's 3 layers but we only have to think about two levels:
I agree. It's always technically (at least) 3 layer, in that the source format needs to be parsed to get at the value cells. What I'm trying to get at is how we define the type signature of our field parsers.
Right now the spec defines field / schema parsers as mappings from `string` -> `FieldType`.
If we promote this to `NativeType` -> `FieldType`, then we introduce a lot of validation ambiguity in the form of type coercion rules in field definitions.
We define a set of data types such as string, number types, boolean, n/a... and either types of native format X directly map to one of these Table Schema types or implementations must downgrade their values, e.g. by serialization to string type values.
I think I agree. As a textual format, the TableSchema should be defined (as it currently is) in terms of always parsing serialized `string` values, no matter the source. In the special case where the native format directly maps, we can take a shortcut and directly import the data.
This way we keep `missingValues: string[]` (match missing value cells on serialized value strings) and can avoid `missingValues: (string | NativeType)[]` (match missing value cells on serialized strings or their NativeType, and have to think about type precedence / coercion rules).
I'm getting to thinking that we actually need to isolate Table Schema from any physical data representation and let it operate only on the logical level. On the logical level it's n/a no matter how it's stored
This is another good approach worth exploring. The challenge will be to keep it backwards compatible...
Dear all,
Here is a pull request based on @akariv's data model - https://github.com/frictionlessdata/datapackage/pull/49
I think this simple 3-layered model highly improves the quality of the building blocks on which Data Package stands and simplifies field types a lot conceptually. Initially, I was more in favour of thinking about Table Schema as a string processor (serialize/deserialize), but having a `native` data representation makes things way easier and more consistent internally.
An interesting fact is that after separating the native representation sections for field types, we can see that field types basically don't have any description on the logical level - something to improve in the future, I guess, as currently we mostly define only serialization/deserialization rules.
Please take a look!
Great work @roll! I reviewed the PR and left a few minor comments.
Overview

This paragraph - https://datapackage.org/specifications/table-schema/#physical-and-logical-representation

I think the `physical` term might be confusing (see #621) as it seems to really mean `lexical` or `textual`, while `logical` sounds easy to understand in my opinion, though it might still need to be brainstormed.

Subissues: