Closed RogerTangos closed 5 years ago
Every row of the dataset is a unique entity to make predictions for. But this "entity" can in essence be a compound primary key of other logical entities.
For example, it is totally possible to have a dataset like
person_id | date | class |
---|---|---|
0 | 2018-01-01 | red |
0 | 2018-01-02 | blue |
so multiple predictions can be made for person 0.
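To make the compound-key idea concrete, here is a minimal sketch using pandas (not ATM itself) that checks whether a chosen set of columns identifies each row uniquely. The `key_columns` name and the toy data are assumptions for illustration only.

```python
import pandas as pd

# Toy data mirroring the table above: one row per (person_id, date) pair.
df = pd.DataFrame({
    "person_id": [0, 0],
    "date": ["2018-01-01", "2018-01-02"],
    "class": ["red", "blue"],
})

# Hypothetical choice of key columns; together they should identify a row.
key_columns = ["person_id", "date"]

# person_id alone repeats, but the compound key does not.
person_id_is_unique = not df["person_id"].duplicated().any()
compound_key_is_unique = not df.duplicated(subset=key_columns).any()

print(person_id_is_unique)     # False
print(compound_key_is_unique)  # True
```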
Thank you @micahjsmith !
Am I right in thinking that if I wanted to predict a single event using a time-series, then I'd want to add all of my observations into a single row, as such?
person_id | date1 | data@date1 | date2 | data@date2 | ... | class |
---|---|---|---|---|---|---|
0 | 2018-01-01 | 0.0 | 2018-01-02 | 0.1 | ... | red |
And a followup:
Is there a way to exclude columns like `person_id` from the model? Otherwise, I should probably remove the index columns from the example datasets.
> Am I right in thinking that if I wanted to predict a single event using a time-series, then I'd want to add all of my observations into a single row, as such?
Yes
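For anyone landing here later, a rough sketch of that flattening step with pandas, done before the file is handed to ATM. The column names (`data`, `obs`) and the sample values are made up for illustration, the output columns are named `date1`, `data1`, ... rather than `data@date1`, and the sketch assumes each person has the same number of observations and a single class label.

```python
import pandas as pd

# Hypothetical long-format data: several observations per person.
long_df = pd.DataFrame({
    "person_id": [0, 0, 1, 1],
    "date": ["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02"],
    "data": [0.0, 0.1, 0.5, 0.7],
    "class": ["red", "red", "blue", "blue"],
})

# Number the observations within each person: 1, 2, ...
long_df = long_df.sort_values(["person_id", "date"])
long_df["obs"] = long_df.groupby("person_id").cumcount() + 1

# Spread date and data across columns, giving one row per person.
wide = long_df.set_index(["person_id", "obs"])[["date", "data"]].unstack("obs")

# Flatten the (name, obs) MultiIndex columns into "date1", "data1", ...
wide.columns = [f"{name}{obs}" for name, obs in wide.columns]

# Attach one label per person (assumes the class is constant per person).
labels = long_df.groupby("person_id")["class"].first()
wide = wide.join(labels).reset_index()

print(wide)
```

If people have different numbers of observations, the missing slots come out as NaN, so some padding or truncation policy would be needed first.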
> Is there a way to exclude columns like person_id from the model? Otherwise, I should probably remove the index columns from the example datasets.
No, not currently. The data file is ultimately loaded via https://github.com/HDI-Project/ATM/blob/master/atm/model.py#L87. No custom options can be passed to `read_csv`, though it would be a nice feature to allow arbitrary `read_csv` kwargs in the `run.yaml` config file.
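In the meantime, one possible workaround is to drop the identifier columns yourself and give ATM the cleaned file. This is a sketch with plain pandas, not an ATM feature; the in-memory CSV stands in for a real file on disk, and `person_id` is just the example column from this thread.

```python
import io

import pandas as pd

# Stand-in for a CSV on disk; an in-memory buffer keeps the sketch self-contained.
raw_csv = io.StringIO(
    "person_id,date,class\n"
    "0,2018-01-01,red\n"
    "0,2018-01-02,blue\n"
)

df = pd.read_csv(raw_csv)

# Drop identifier columns that should not be used as features.
features = df.drop(columns=["person_id"])

# index=False avoids writing an extra unnamed index column.
# With no path given, to_csv returns the CSV text; pass a filename to save it.
cleaned_csv = features.to_csv(index=False)
print(cleaned_csv)
```

Writing with `index=False` also sidesteps the unnamed index column that shows up when a DataFrame is saved with its default index.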
Hey @micahjsmith , etc, I apologize if this is a bit vague. I figured it'd be better to ask about this in a public forum so that it was well documented.
Is ATM able to handle multiple rows of the same entity? Or do samples need to be flattened into a single row?
As an example use-case, a timeseries dataset might have single entities with multiple observations.
If ATM can handle this, it seems like the `entity_id` would be contained in an unnamed index column, as shown in the `pitchfork_genres.csv` example dataset. However, none of the example datasets have multiple rows of the same entity.