frli4797 / influxv2tovm

Apache License 2.0

TypeError: string indices must be integers, not 'str' #1

Closed Schtallone closed 4 days ago

Schtallone commented 4 weeks ago

Hello,

Trying to migrate InfluxDB 2.x data to VM, I got the following error messages:

```
Dry run True
Pivot False
Finding unique time series.
Traceback (most recent call last):
  File "/homeassistant/pyscript/influxv2tovm.py", line 343, in <module>
    main(vars(parser.parse_args()))
  File "/homeassistant/pyscript/influxv2tovm.py", line 268, in main
    migrator.migrate()
  File "/homeassistant/pyscript/influxv2tovm.py", line 95, in migrate
    measurements_and_fields = self.__find_all_measurements()
  File "/homeassistant/pyscript/influxv2tovm.py", line 188, in __find_all_measurements
    measurements_and_fields.update(df[self.__measurement_key].unique())
TypeError: string indices must be integers, not 'str'
Exception ignored in: <function InfluxMigrator.__del__ at 0x7f38143d1580>
Traceback (most recent call last):
  File "/homeassistant/pyscript/influxv2tovm.py", line 78, in __del__
    self.__progress_file.close()
AttributeError: 'InfluxMigrator' object has no attribute '_InfluxMigrator__progress_file'
```

I tried to migrate a special bucket with filtered data (time frame) and dropped tag values.

When I try to migrate an actual, actively filled bucket without any filtering or shaping, the dry run works and I get the following end notice:

```
Exception ignored in: <function InfluxMigrator.__del__ at 0x7f9c8b5f9580>ading. Total: 524.3 kB (1/1)
Traceback (most recent call last):
  File "/homeassistant/pyscript/influxv2tovm.py", line 78, in __del__
    self.__progress_file.close()
AttributeError: 'InfluxMigrator' object has no attribute '_InfluxMigrator__progress_file'
All done
```

Thanks for any help on this

Greets

frli4797 commented 3 weeks ago

Hey @Schtallone!

For context, which I'm sure you already understand: this comes with no guarantees whatsoever. It's very hard to foresee what data in InfluxDB2 is going to look like, and thus really hard to test for it.

I developed this, taking a lot of inspiration from other projects, in an effort to migrate my own Home Assistant InfluxDB to VictoriaMetrics, and also to pivot the data to be keyed on entity_id rather than the obvious _measurement.

I'm going in a bit blind here since my original Influx DBs are long gone, so I have no means of testing this. But anyways.

It seems like two things are going on here.

  1. The progress file that I historically used as a means to re-run malformed data is no longer part of the code, but the attempt to close that (now nonexistent) file remained. This is fixed.
  2. Seems like doing this vanilla, i.e. converting _measurements to their equivalents in VM, was broken. (I had only tested with Hassio data.) This could be fixed now.
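The first fix amounts to not assuming the attribute exists when the object is torn down. A minimal sketch of a defensive `__del__` (simplified class and attribute names, not the actual code from the repo):

```python
import io


class InfluxMigrator:
    """Simplified sketch, not the real class: a __del__ that tolerates
    a progress file that was never opened."""

    def __init__(self, use_progress_file: bool = False):
        if use_progress_file:
            # Stand-in for an open file handle on disk.
            self._progress_file = io.StringIO()

    def __del__(self):
        # getattr with a default avoids the AttributeError seen above
        # when __init__ never created the attribute.
        progress_file = getattr(self, "_progress_file", None)
        if progress_file is not None:
            progress_file.close()
```

In the real script the attribute is name-mangled (hence `_InfluxMigrator__progress_file` in the error message), but the same guard applies.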

Tl;dr: Check out the d1 branch to see if that works for you. I'll hold off merging to main until you've tested. If this doesn't work I'd be out of options to help you, but let's try this and see if it works for you. Please comment here with any results.

Schtallone commented 3 weeks ago

Hello and thank your for your reply.

> 1. The progress file that I historically used as means to re-run malformed data is no longer part of the code. Seems like the attempt to close that file (which no longer exists) is in vain. This is fixed.

By this, do you mean a progress file you used (so as not to lose data) that is no longer in the code?

> 2. Seems like doing this vanilla, ie converting _measurements to their equivalent in VM was broken. (Only tested with Hassio data.) This **could** be fixed now.

What do you mean by this?

> Tl;dr Check out branch d1 to see if that works for you. I'll hold off merging to main until you've tested. If this doesn't work I would be out of options to help you. But lets try this and see if it works for you. Please comment here with any results.

Is this a different, updated version of the script? Then I will just copy the code and try again.

For more understanding: I have metrics written from ioBroker via Node-RED to the InfluxDB. The data on which the dry run works looks like the following (taken from the InfluxDB2 frontend):

```
table  _measurement  _field  _value  _start                    _stop                     _time                     aggregate  device    origin                           phase   role      rollup_interval  sensor
0      meterReading  value   233.86  2024-08-15T17:32:06.427Z  2024-08-22T17:32:06.427Z  2024-08-15T22:30:00.000Z  last       Backofen  system.adapter.sourceanalytix.0  Phase1  Electric  1d               Consumption
```

The shaped data with dropped tags, which won't work with the dry run, looks like this:

```
table  _measurement  _field  _value   _start                    _stop                     _time
0      meterReading  value   3838.54  2024-03-03T18:35:02.000Z  2024-08-22T17:35:02.045Z  2024-03-07T01:27:10.000Z
0      meterReading  value   3885.7   2024-03-03T18:35:02.000Z  2024-08-22T17:35:02.045Z  2024-03-10T09:42:00.000Z
```

Perhaps you can sort something out, but to me it looks identical, just with less data.

Thanks!

frli4797 commented 3 weeks ago

> What do you mean by this?

I'm just assuming that your data is structured in a way that is not specific to Home Assistant, which may be the case looking at your examples.

> Is this a different, updated version of the script?

Yes. I made some updates based on my assumptions about what I think is going on, whilst also fixing a quite obvious defect in my code. This could fix your problem.

The above might not work though, as the data in the examples you provided seems somewhat weirdly formatted. Usually you see the measurement in Influx as _measurement and fields like _field, but in your case it seems to be _measurementgroupstring and _fieldgroupstring, which looks rather odd. But then again, I'm hardly an expert on InfluxDB.

I guess it comes down to trial and terror.

Schtallone commented 3 weeks ago

Hello,

sorry for the confusion. My data is formatted correctly, I think; the garbling in my first sample came from copy and paste.

Here is a screenshot of one of my prepared metrics without all the tags, which still won't work. Sorry, but I haven't had the time to test the updated code yet. Will do shortly.

[screenshot: prepared metric without tags]

Schtallone commented 3 weeks ago

I tested the updated code:

Here is what I got from a (formerly also) working bucket:

```
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.339393 1719698400000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.345618 1720044000000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.345618 1720303200000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.345618 1720648800000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.371817 1722463200000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.375863 1722722400000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.38205700000000004 1723068000000000000
meterReading,aggregate=last,device=Wohnzimmer_West,origin=system.adapter.sourceanalytix.0,phase=Phase3,role=Shutter,rollup_interval=1w,sensor=Consumption value=0.38205700000000004 1723327200000000000
All done9 lines bytes to VictoriaMetrics db=meterReadings_1w for meterReading. Total: 153.6 kB (1/1)
```

And here is still the "shaped" bucket with this error:

```
Dry run True
Pivot False
Finding unique time series.
Traceback (most recent call last):
  File "/homeassistant/pyscript/influxv2tovm.py", line 344, in <module>
    main(vars(parser.parse_args()))
  File "/homeassistant/pyscript/influxv2tovm.py", line 269, in main
    migrator.migrate()
  File "/homeassistant/pyscript/influxv2tovm.py", line 96, in migrate
    measurements_and_fields = self.__find_all_measurements()
  File "/homeassistant/pyscript/influxv2tovm.py", line 189, in __find_all_measurements
    measurements_and_fields.update(df[self.__measurement_key].unique())
TypeError: string indices must be integers, not 'str'
```

The funny thing is that the working buckets are "downsizing-task" buckets, which all work. But when I create the shaped metrics out of these downsizing buckets, also via a task, I get the error!

frli4797 commented 6 days ago

It would be interesting to debug this using your data, just to see how it's formed and how my code behaves when trying to find all the measurements in the dataset. Any chance you can create a small example file (like an InfluxDB backup package) that also causes this behavior? Trying to debug this becomes pure guesswork without data that produces this bug.

Also, I merged some additional bugfixes to main just recently, as @maxlyth found a few problems with escaping characters correctly. It might be worth it for you to test the newest version, even though I don't believe it would solve your problem, tbh.

frli4797 commented 6 days ago

I did some additional thinking and, guessing from experience, I've learned that the influxdb client sometimes returns a list of DataFrames and sometimes just a single DataFrame. Hence I added some code to determine this at the point of failure. Please @Schtallone check out the d1 branch and try with your data again. I'm just guessing at this point, but this might be worth a try.
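That list-or-single-DataFrame ambiguity can be handled with a small normalization step. A sketch under that assumption (hypothetical helper names, not the code actually on the d1 branch):

```python
import pandas as pd


def iter_dataframes(result):
    """Yield DataFrames whether the client returned one frame or a list.

    influxdb-client's query_data_frame() may return either a single
    pandas DataFrame or a list of them, depending on the query result.
    """
    if isinstance(result, pd.DataFrame):
        yield result
    else:
        yield from result


def find_all_measurements(result, measurement_key="_measurement"):
    """Collect unique measurement names across all returned frames."""
    measurements = set()
    for df in iter_dataframes(result):
        if measurement_key in df.columns:
            measurements.update(df[measurement_key].unique())
    return measurements
```

Indexing a plain string (or a list element that isn't a DataFrame) with `df["_measurement"]` is exactly what raises `TypeError: string indices must be integers`, so normalizing the result type first avoids that failure mode.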

Schtallone commented 4 days ago

> I did some additional thinking and just guessing from experience I've learned that the influxdb client sometimes returns a list of DataFrames and sometimes just a single DataFrame. Hence I added some code to determine this at the point of failure. Please @Schtallone check out the d1 branch and try with your data. Again. I'm just guessing at this point, but this might be worth a try.

Hello,

now with the new d1 branch, it seems to work so far. With the dry run, I get this result:

```
Dry run True
Pivot False
Finding unique time series.
Found 1 unique time series
All done
```

Is there a debug file in which I can see the output? Because my next task is then to "rename" the data so that it fits the HA naming.

Thanks!

frli4797 commented 4 days ago

There used to be an insane amount of debug logging in this code, but I removed most of it, as it became very difficult to interpret due to the massive amounts of data.

It should be fairly simple to add some logging back to this code. Potentially one could add code to output every line written to VM, and/or log every metric being written.

Line 200 might be a good point to log every metric that is about to be migrated, and around line 127 to output every line being written.
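As a rough illustration (hypothetical function and logger names, nothing from the actual script), per-line debug output could look like:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("influxv2tovm")


def write_lines(lines):
    """Log every line-protocol line before it would be sent to VM."""
    for line in lines:
        log.debug("writing: %s", line)
    # ... the actual HTTP POST to VictoriaMetrics would happen here ...
    return len(lines)
```

Running with `level=logging.INFO` instead would silence the per-line output again, so the verbosity stays switchable.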

You are also most welcome to contribute with a PR to that effect. :)

frli4797 commented 4 days ago

Closed by #6.

Deficiencies with debug logging to be managed separately, by either opening a new issue or submitting a PR.