dragoshenron / solaredge-webscrape

Bash script to automatically download power optimizer data from Solaredge web portal.
Apache License 2.0
8 stars, 1 fork

HTTP_CODE 500 Error #1

Closed inckagan closed 2 years ago

inckagan commented 3 years ago

Hi, I just tried this on my site for a single optimizer. Here is the error message I am getting:

[root@ns1 solaredge_scrape]# sh solaredge_webscrape_influxdb.sh
usage: solaredge_webscrape_influxdb.sh [date]
  date    date to download in format "YYYY-MM-DD"
          if no date is specified, it will download the 24h data starting yesterday midnight

Downloading 119A3759-3B(Current) for Mon Nov 8 00:00:00 +03 2021: HTTP_CODE:500
solaredge_webscrape_influxdb.sh: line 92: jq: command not found
 -> jq skipped (JSON empty)
Downloading 119A3759-3B(Energy) for Mon Nov 8 00:00:00 +03 2021: HTTP_CODE:500
solaredge_webscrape_influxdb.sh: line 92: jq: command not found
 -> jq skipped (JSON empty)
Downloading 119A3759-3B(Voltage) for Mon Nov 8 00:00:00 +03 2021: HTTP_CODE:500
solaredge_webscrape_influxdb.sh: line 92: jq: command not found
 -> jq skipped (JSON empty)
Downloading 119A3759-3B(PowerBox%20Voltage) for Mon Nov 8 00:00:00 +03 2021: HTTP_CODE:500
solaredge_webscrape_influxdb.sh: line 92: jq: command not found
 -> jq skipped (JSON empty)
Downloading 119A3759-3B(Power) for Mon Nov 8 00:00:00 +03 2021: HTTP_CODE:500
solaredge_webscrape_influxdb.sh: line 92: jq: command not found
 -> jq skipped (JSON empty)
Line protocol file does not exist (all JSON response empty?). Nothing to do. Exiting. (/tmp/SEscrape2influx/211109123619/211108_solaredge_webscrape.lp)

dragoshenron commented 3 years ago

[Attachment: Screenshot_2021-11-09_at_10_46_05]

The requesterId is something else. See the attachment as a guideline on where to find it. It is a simple 8-digit number.

There is no relation with the reCaptcha.

PS: Do not run the script as root.
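
The log above also reports `jq: command not found`, so independently of the requesterId the JSON step could not have run. A minimal pre-flight sketch that could sit at the top of such a script follows. The function names `missing_tools` and `preflight` and the tool list in the usage line are assumptions for illustration, not part of the repository's script.

```shell
# Hypothetical pre-flight helpers (names are illustrative, not from the repo).

# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
    for pf_tool in "$@"; do
        command -v "$pf_tool" >/dev/null 2>&1 || printf '%s\n' "$pf_tool"
    done
}

# Return non-zero (with a message on stderr) if running as root
# or if any of the listed tools is missing.
preflight() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "refusing to run as root" >&2
        return 1
    fi
    pf_missing=$(missing_tools "$@")
    if [ -n "$pf_missing" ]; then
        echo "missing tools:" $pf_missing >&2
        return 1
    fi
}
```

It could be called as `preflight curl jq date || exit 1` before any download starts.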

inckagan commented 3 years ago

Thanks, I cannot locate the requesterId in the page source. Any other ideas?

dragoshenron commented 3 years ago

I think you need an installer account, not a simple user account. This can be requested from SolarEdge directly. A LONG while ago, I did it via email. I don't know if they have changed the procedure in the meantime.

inckagan commented 3 years ago

Thanks again, what should be the given date if I want to retrieve only the last data? Also is it possible to retrieve data at 5mins interval?

dragoshenron commented 3 years ago

Thanks again, what should be the given date if I want to retrieve only the last data?

Uhm... good question. However, the upload from the inverter is asynchronous, so the idea of polling "the last datapoint" doesn't really work. In other words, if you want real-time assessment you have to use jbuehl's script.

Also is it possible to retrieve data at 5mins interval?

No, SolarEdge doesn't store any sub-15-minute values. As said in https://github.com/jbuehl/solaredge/issues/164, if you want sub-15-minute intervals, you have to use something else.

inckagan commented 3 years ago

If we stick to 15 mins interval, can we get the last datapoint in the last 15 mins or so? What should then be the startDate endDate parameters?

dragoshenron commented 3 years ago

For us it is not possible to physically be in the field now, so that's why we are looking for an alternative solution.

If we stick to 15 mins interval, can we get the last datapoint in the last 15 mins or so? What should then be the startDate endDate parameters?

I'll have a look at the timestamp format to perform a query like the one you mentioned.

dragoshenron commented 3 years ago

I have not fully debugged this, but you would have to adapt the following lines

startDate=$(($(date -d "$beginning" '+%s')*1000))
endDate=$(($(date -d "$beginning + 1day" "+%s")*1000))
archiveDate=$(date -d "$beginning" "+%y%m%d")

to

startDate=$(($(date "+%s")*1000-900000)) # now - 15 minutes (900000 ms)
endDate=$(($(date "+%s")*1000)) # now
archiveDate=$(date -d "$beginning" "+%y%m%d-%H:%M:%S")

archiveDate is only used for local storage. startDate and endDate define the interval actually queried, in epoch format (milliseconds).

Once again, I'm afraid that even if the query is formally correct, you may end up with a result different than what you'd expect.
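
To make the arithmetic explicit: the window is expressed in epoch milliseconds, and 15 minutes is 15 × 60 × 1000 = 900000 ms. A small sketch, assuming a hypothetical helper `window_ms` (not in the script) that takes a reference time in epoch seconds and a window length in minutes:

```shell
# Hypothetical helper: print "startDate endDate" in epoch milliseconds for a
# window of $2 minutes ending at $1 (epoch seconds).
window_ms() {
    end_ms=$(( $1 * 1000 ))
    start_ms=$(( end_ms - $2 * 60 * 1000 ))
    printf '%s %s\n' "$start_ms" "$end_ms"
}

# Example: the 15 minutes ending now.
read -r startDate endDate <<EOF
$(window_ms "$(date '+%s')" 15)
EOF
echo "startDate=$startDate endDate=$endDate"
```

Taking the reference time once and deriving both bounds from it keeps the window exactly 15 minutes wide, which two separate `date` calls do not guarantee.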

inckagan commented 3 years ago

Thanks a lot. When we load data from yesterday 00:00 until now, can we tell which time index stands for which time exactly?

It is okay to get them in 15 mins but I need to be sure which data belongs to what time frame (at least in 15 mins resolution)?

dragoshenron commented 3 years ago

The values are returned as {date, value} pairs. The first element is the timestamp in epoch format (milliseconds), for example:

{"dateValuePairs":[
  {"date":1630566000000,"value":3.8645834922790527},
  {"date":1630566900000,"value":4.965169429779053},
  {"date":1630567800000,"value":5.408463478088379},
  {"date":1630568700000,"value":5.353841304779053},
  {"date":1630569600000,"value":6.074023723602295},
  {"date":1630570500000,"value":4.8768229484558105},
  {"date":1630571400000,"value":4.866796970367432},
  {"date":1630572300000,"value":2.258593797683716}
 ],"intervalTime":15,"name":"Str1.0 P","reporterName":"String 1.0","timeUnitId":2,"unit":"W"}

so you can be sure of the timestamp associated with each value.
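
Since the script already depends on jq, the pairs can be turned into readable UTC timestamps with a single filter. This is a sketch, not code from the repository: `pairs_to_utc` is a hypothetical name, and it assumes the response has the shape shown above.

```shell
# Hypothetical helper: read one JSON response on stdin and print
# "<UTC timestamp><TAB><value>" for each datapoint. Requires jq.
pairs_to_utc() {
    # .date is in milliseconds; jq's todate expects seconds since the epoch.
    jq -r '.dateValuePairs[] | "\(.date / 1000 | todate)\t\(.value)"'
}
```

For example, `pairs_to_utc < response.json` (with `response.json` being a saved response) would print one line per pair. The first timestamp in the sample, 1630566000000, corresponds to 2021-09-02T07:00:00Z, and consecutive entries are 900000 ms (15 minutes) apart.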

inckagan commented 3 years ago

Ok, now I see that. When I run it from now to now-15mins I get an empty array, and I see that for my site the latest recorded data is from last night. So it seems this is not updated even every 15 minutes. Is this what you meant?

Have you ever looked into other web scraping options such as selenium?

Thanks

dragoshenron commented 3 years ago

Ok, now I see that. When I run it from now to now-15mins I get an empty array, and I see that for my site the latest recorded data is from last night. So it seems this is not updated even every 15 minutes. Is this what you meant?

Yes, indeed. The upload frequency of the inverter (or at least the moment at which the SE server makes the data available) is not consistent.

Have you ever looked into other web scraping options such as selenium?

Yes, once. I didn't see any benefit and I saw many unneeded complications. Hence, I stopped looking into selenium and other headless browsers.

Thanks

Thanks to you

inckagan commented 3 years ago

I found that there is a huge difference in the Current values for similar timestamps.

endDate in ms: 1636552877000
startDate in ms: 1636551390000

Any idea?

dragoshenron commented 3 years ago

Yes. Time zone :) Epoch is in UTC (forget the leap second bazaar) while the web interface accounts for the time zone difference, depending on where your plant is located. Hint: forget everything that is local time. It's a bottomless pit of problems. Stick with UTC.
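
To see the effect, the same epoch value can be rendered in UTC and in a local zone. A sketch using GNU date (which the script already relies on for `-d`): `render_ms` is a hypothetical helper, and `Europe/Istanbul` is only an assumed example zone matching the +03 offset in the log above.

```shell
# Hypothetical helper: render epoch milliseconds ($1) in time zone $2.
# Relies on GNU date's "-d @SECONDS" syntax.
render_ms() {
    TZ="$2" date -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S'
}

render_ms 1636552877000 UTC              # the endDate quoted above, in UTC
render_ms 1636552877000 Europe/Istanbul  # same instant as local wall-clock time
```

The two lines describe the same instant. Only the rendering differs, which is why comparing raw epoch values against what the web interface shows can look like a data mismatch.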

inckagan commented 3 years ago

It seems the data is not always synchronized. What was the issue you saw?

dragoshenron commented 3 years ago

In my experience, the data are not always shown on a regular basis. The last available datapoint was sometimes hours old. Later, all the data were present, maybe uploaded in a batch. But maybe in the meantime SE has changed the inverter firmware and/or server routines to be more consistent.

Feel free to perform some tests and contribute to the repo with the option to get "last available datapoint" :)
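
One possible starting point for such an option, offered as a sketch rather than the repository's approach: query a wide window (for example the last 24 hours) and keep only the newest pair that carries a value. `last_datapoint` is a hypothetical name, and the null check is a defensive assumption about the response, not something observed in this thread.

```shell
# Hypothetical helper: read one JSON response on stdin and print
# "<epoch ms> <value>" for the newest datapoint with a non-null value.
# Prints nothing if there is no such datapoint. Requires jq.
last_datapoint() {
    jq -r '[.dateValuePairs[]? | select(.value != null)]
           | max_by(.date)
           | if . == null then empty else "\(.date) \(.value)" end'
}
```

Because uploads can arrive late and in batches, the timestamp printed here may be hours old. A caller would need to compare it with the current time before treating the value as "live".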

inckagan commented 3 years ago

Sure, I will try.

Do you also manage to get the inverter AC power and AC voltage?

dragoshenron commented 3 years ago

That's available from the documented APIs: https://www.solaredge.com/sites/default/files/se_monitoring_api.pdf (page 35). It should be possible to get those values with the approach used in this script as well, but I never tried.

inckagan commented 3 years ago

It would be great to get it done with that script. Could you let me know if you accomplish that?

Thanks a lot

dragoshenron commented 3 years ago

Nope :) I'm not interested in the AC side of the story. I did this work in my spare time, as you can imagine. If you are very serious about it and you need it (for commercial or personal purposes), then we can agree separately on an extra development step, tailored exactly to your needs and specs. You can reach me at info (at) zippe (dot) it

I hope you understand :)