AutoViML / Auto_TS

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Created by Ram Seshadri. Collaborators welcome.
Apache License 2.0
723 stars 113 forks

Error: 'could not convert string to float' on datetime column #103

Closed emobs closed 10 months ago

emobs commented 1 year ago

This error is thrown when fit is initiated: 'could not convert string to float' on the Time column of my data. This column is of the datetime data type and there are no missing or incorrect values in it. What can be done to fix this?

AutoViML commented 1 year ago

Can you just change the Time column to the pandas datetime dtype and then run the fit? That might avoid this error.


emobs commented 1 year ago

Thank you for the quick reply!

I did that this way already: data['Time'] = pd.to_datetime(data['Time'], format='%Y.%m.%d %H:%M:%S') Then I checked the data type of the 'Time' column after conversion which was of the datetime64[ns] type then. However, the issue persists.
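For readers following along, here is a minimal sketch of how that conversion can be double-checked. The two-row frame is a hypothetical stand-in for the real data; only the column name `Time` and the format string come from this thread. Passing `errors='coerce'` turns any value that does not match the format into `NaT`, so unparseable rows can be counted rather than silently carried along.

```python
import pandas as pd

# Hypothetical two-row sample using the timestamp layout shown in this thread.
data = pd.DataFrame({
    "Time": ["2023.09.07 22:10:00", "2023.09.07 22:15:00"],
    "Data1": [1.06958, 1.06948],
})

# errors="coerce" maps values that do not match the format to NaT.
parsed = pd.to_datetime(data["Time"], format="%Y.%m.%d %H:%M:%S", errors="coerce")
unparseable_rows = int(parsed.isna().sum())

data["Time"] = parsed
time_is_datetime = pd.api.types.is_datetime64_any_dtype(data["Time"])

print(data.dtypes)
print("unparseable rows:", unparseable_rows)
print("Time is datetime:", time_is_datetime)
```

If `unparseable_rows` is zero and the dtype check passes, the `Time` column itself is probably not the source of the string-to-float error, although that is not confirmed anywhere in this thread.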

AutoViML commented 1 year ago

Ok, then I would need you to do the following: Post a snippet of your data (100 rows), if possible in a zip file, and attach it to this reply. Then post a code snippet of how you are calling Auto_TS. I can then find out what the problem is. Thanks for trying out Auto_TS.
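One possible way to prepare such a sample is sketched below. The frame contents, the archive member name `sample.csv`, and the choice of a semicolon-delimited UTF-16 file are assumptions for illustration (the latter two mirror how the file is read later in this thread). The sketch builds the zip in memory and reads it back to confirm that the sample survives the round trip.

```python
import io
import zipfile

import pandas as pd

# Hypothetical 500-row frame standing in for the real data.
data = pd.DataFrame({
    "Time": pd.date_range("2023-09-07 22:10:00", periods=500, freq="5min"),
    "Data1": range(500),
})

# Keep the first 100 rows and serialise them as semicolon-delimited UTF-16 text.
sample = data.head(100)
csv_bytes = sample.to_csv(sep=";", index=False).encode("utf-16")

# Write the CSV into a zip archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.csv", csv_bytes)

# Read the archive back to confirm the sample is intact.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    text = archive.read("sample.csv").decode("utf-16")
restored = pd.read_csv(io.StringIO(text), delimiter=";")

print(restored.shape)
```

To produce an attachable file instead, the bytes in `buffer.getvalue()` can be written to a path such as `sample.zip`.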


emobs commented 1 year ago

Sure thanks, here's the dummy data.zip. And this is the python code initiating the model fit:

    model = auto_timeseries(
        score_type='rmse',
        time_interval=Timeframe_frequency,
        model_type='best',
        verbose=2,
        forecast_period=2,
        non_seasonal_pdq=None,
        seasonality=True
    )

    # Parse the 'Time' column from strings into pandas datetime values
    data['Time'] = pd.to_datetime(data['Time'], format='%Y.%m.%d %H:%M:%S')

    model.fit(
        traindata=data[:-2],  # Excluding the last 2 rows for training
        ts_column='Time',
        target=target_col
    )

Thanks for your support!
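Since the message 'could not convert string to float' does not say which column is involved, one way to narrow it down before calling `fit` is to list every column other than the time column that is not numeric, together with the values that fail numeric conversion. This is a generic pandas check, not a diagnosis of this issue; the frame below, including the stray `'n/a'` in `Data2`, is made up for illustration.

```python
import pandas as pd

# Hypothetical frame: 'Data2' contains a stray string among numeric-looking values.
data = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2023.09.07 22:10:00", "2023.09.07 22:15:00"],
        format="%Y.%m.%d %H:%M:%S",
    ),
    "Data1": [1.06958, 1.06948],
    "Data2": ["1.06947", "n/a"],
})

ts_column = "Time"

# Columns (other than the time column) that pandas does not treat as numeric.
non_numeric = [
    col for col in data.columns
    if col != ts_column and not pd.api.types.is_numeric_dtype(data[col])
]

# For each such column, collect the values that cannot be converted to a number.
bad_values = {}
for col in non_numeric:
    converted = pd.to_numeric(data[col], errors="coerce")
    failed = converted.isna() & data[col].notna()
    bad_values[col] = data.loc[failed, col].tolist()

print(non_numeric)
print(bad_values)
```

An empty `non_numeric` list would suggest that the feature columns are already numeric and that the problem lies elsewhere.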

emobs commented 1 year ago

By the way, the csv file is read using data = pd.read_csv(file_path, encoding='utf-16', delimiter=';') and stored into a pandas data frame and then passed as an argument (data) to the function that creates the model and initiates the fit as in the code above.
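A small round-trip sketch of that `read_csv` call is shown below. It writes a two-row, semicolon-delimited UTF-16 file to a temporary directory and reads it back with the same arguments. The rows reuse values from the sample shown in this thread, but the three-column layout is a simplification of the real 21-column file.

```python
import os
import tempfile

import pandas as pd

csv_text = (
    "Time;Data1;Data2\n"
    "2023.09.07 22:10:00;1.06958;1.06947\n"
    "2023.09.07 22:15:00;1.06948;1.06949\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "python_input.csv")

    # Python's "utf-16" codec writes a byte-order mark at the start of the file.
    with open(file_path, "w", encoding="utf-16", newline="") as handle:
        handle.write(csv_text)

    data = pd.read_csv(file_path, encoding="utf-16", delimiter=";")

print(data.shape)
print(data.dtypes)
```

Right after reading, `Time` holds plain strings, so it still needs `pd.to_datetime` before it can be used as a datetime column.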

emobs commented 1 year ago

Any news on this issue yet? Were you able to reproduce the error and/or pinpoint the cause? If you need more details, please let me know. Thanks.

AutoViML commented 1 year ago

Hey, I am not able to read your python_input.csv file. I tried every variation. I think this is the problem. Can you fix it?

    datapath = '../../downloads/data/'
    filename = 'python_input.csv'
    dft = pd.read_csv(datapath+filename, encoding='utf-16', delimiter=';')
    print(dft.shape)
    dft.head(1)


emobs commented 1 year ago

I just tried it: I downloaded the zip from this thread, unzipped it, and opened the file without problems. Shall I send you a copy by email?

emobs commented 1 year ago

Hello, I sent you 2 emails regarding this issue, but not sure if you read or even received those. Please let me know, thanks in advance.

AutoViML commented 1 year ago

Hello: I gave you the code snippet I used to read your zip file and your file did not come out correctly. Let me put it here again for you to try it and report back: See my screenshot of how badly the file comes out when you use this code.


```
datapath = '../../downloads/data/'
filename = 'python_input.csv'
dft = pd.read_csv(datapath+filename, encoding='utf-16', delimiter=';')
print(dft.shape)
dft.head(1)
```

emobs commented 1 year ago

Hi and thanks for your reply!

Using this code:

    file_path = os.path.join(data_path, 'python_input.csv')
    data = pd.read_csv(file_path, encoding='utf-16', delimiter=';')
    logging.info(f"Data shape: {data.shape}")
    logging.info(f"Data sample:\n{data.head()}")

I get this as the data shape and head after reading the file:

    Data shape: (500, 21)
    Data sample:
                   Time    Data1    Data2  ...      Data3      Data4     Data5
    2023.09.07 22:10:00  1.06958  1.06947  ...  42.835099  18.564245  1.071799
    2023.09.07 22:15:00  1.06948  1.06949  ...  35.744064  18.600451  1.071745
    2023.09.07 22:20:00  1.06948  1.06953  ...  42.293215  20.935565  1.071687
    2023.09.07 22:25:00  1.06954  1.06958  ...  41.152948  23.629927  1.071650
    2023.09.07 22:30:00  1.06958  1.06954  ...  43.177273  22.100989  1.071612

Looks good to me. What do you think?

AutoViML commented 1 year ago

It might be a difference in pandas versions. I am getting a badly formed dataframe when I read the CSV file above with read_csv. See the screenshot below.
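When two machines read the same file differently, it can help to compare the pandas version on each side and to look at the first bytes of the file itself. The sketch below prints the installed pandas version and guesses the encoding from a leading byte-order mark. The helper `describe_bom` is a hypothetical name introduced here, not part of pandas or Auto_TS, and a missing BOM does not by itself prove that a file is not UTF-16.

```python
import codecs

import pandas as pd


def describe_bom(first_bytes: bytes) -> str:
    """Return a rough encoding guess based on a leading byte-order mark."""
    # UTF-32 is checked first because its little-endian BOM begins with the UTF-16 one.
    if first_bytes.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "no BOM found"


print("pandas version:", pd.__version__)

# In-memory stand-in for the first four bytes of a file written as UTF-16.
sample_bytes = "Time;Data1".encode("utf-16")[:4]
print(describe_bom(sample_bytes))
```

For a file on disk, the same four bytes can be obtained by opening it in binary mode (`open(path, "rb")`) and calling `.read(4)`.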
