Hello! My background is in biology, so I'm a beginner in this field, and I'm confused about how to put the whole training dataset into a single dataframe variable for loading. I will first give a general introduction to what we want to do, then describe two ways of stacking the data that I came up with. Please tell me which is the right way to do it, and please correct anything I have misunderstood.
In our projects, we have accumulated long-term records for many samples, and we hypothesize that several parameters reflect the chance of an 'event' of interest. Most likely, the parameters from the 2 months before the 'event' are useful for indicating it, with the last ~20 days carrying a higher weight. But we don't know which model is best for fitting these parameters, so I think AutoML is the most suitable approach for finding out, right?
OK, now to the data. Table 1 is an example showing what our data look like. I shortened the total time span and set the 'prediction period' to 5 days so that the tables won't be too long; this applies to all the example tables in this issue.
Table 1. Example of data from one sample

Date | parameter1 | parameter2 | … | parameter8 | EventDetection
-- | -- | -- | -- | -- | --
20221001 | xx | xx | xx | xx | no
20221002 | xx | xx | xx | xx | no
20221003 | xx | xx | xx | xx | no
20221004 | xx | xx | xx | xx | no
20221005 | xx | xx | xx | xx | no
20221006 | xx | xx | xx | xx | no
20221007 | xx | xx | xx | xx | yes
20221008 | xx | xx | xx | xx | no
20221009 | xx | xx | xx | xx | no
20221010 | xx | xx | xx | xx | no
20221011 | xx | xx | xx | xx | yes
20221012 | xx | xx | xx | xx | yes
20221013 | xx | xx | xx | xx | yes
20221014 | xx | xx | xx | xx | no
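
To make the layout concrete, here is a rough sketch of how I imagine Table 1 would sit in a pandas DataFrame. The dates and `EventDetection` values are copied from Table 1; the parameter values are left as NaN placeholders because they are only shown as "xx". The `sample_id` column and the `event_in_next_5_days` label are my own guesses (not something I know AutoML requires), and I am assuming that "prediction period" means "was an event detected on any of the following 5 days".

```python
import numpy as np
import pandas as pd

PREDICTION_PERIOD = 5  # days, shortened for the example as in Table 1

# Dates and EventDetection values copied from Table 1.
dates = pd.date_range("2022-10-01", "2022-10-14", freq="D")
events = ["no"] * 6 + ["yes"] + ["no"] * 3 + ["yes"] * 3 + ["no"]

df = pd.DataFrame({"Date": dates, "EventDetection": events})

# Table 1 only shows "xx" for the parameters, so use NaN placeholders here.
for i in range(1, 9):
    df[f"parameter{i}"] = np.nan

# Hypothetical column: identifies which sample each row belongs to,
# so that several samples could later be stacked in one dataframe.
df["sample_id"] = "sample_1"

# Assumed label: 1 if an event is detected on any of the next
# PREDICTION_PERIOD days (the current day itself is not counted).
# Rows near the end have fewer future days available; the last row has none
# and therefore gets 0.
flags = (df["EventDetection"] == "yes").to_numpy()
df["event_in_next_5_days"] = [
    int(flags[i + 1 : i + 1 + PREDICTION_PERIOD].any())
    for i in range(len(flags))
]

print(df[["Date", "EventDetection", "event_in_next_5_days"]])
```

If this label definition is not what is normally meant by a prediction period, please tell me; only the last few lines would need to change.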
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">