eyeonus / Trade-Dangerous

Mozilla Public License 2.0

The server seems to be no longer updating. #177

Closed Tromador closed 1 month ago

Tromador commented 1 month ago
          The server seems to be no longer updating.

Last updates shown are 21st and 22nd May.

Originally posted by @rmb4253 in https://github.com/eyeonus/Trade-Dangerous/issues/148#issuecomment-2129859192

Tromador commented 1 month ago

I have 3 days' worth of logfiles containing millions (literally, several million) of lines from a spansh update:

        |  @MANHARI/T9Q-41N                                    |  Skipping station due to age: 322 days, 5:03:33.920899, ts: 2023-07-07 21:49:16
        |  @MANHARI/V8J-6KJ                                    |  Skipping station due to age: 438 days, 18:13:17.920899, ts: 2023-03-13 08:39:32
        |  @MANHARI/RZH-18N                                    |  Skipping station due to age: 825 days, 3:55:42.920899, ts: 2022-02-19 22:57:07
        |  @MANHARI/K4Y-35N                                    |  Skipping station due to age: 98 days, 8:19:37.920899, ts: 2024-02-16 18:33:12
        |  Sarmiento de Gamboa City                            |  Updated station
        |  @MANHARI/Hill Vision                                |  Skipping station due to age: 33 days, 15:10:53.920899, ts: 2024-04-21 11:41:56

And no clue why, really. In the above example I do notice carrier information from some years ago; those carriers have likely moved on. But there are also tons of fixed stations, and I suspect the majority are Odyssey bases.

It hasn't taken this long before. Has the spansh data suddenly grown by some large factor, or has the spansh plugin gone insane? I don't think it's the listener, as I saw it call TD normally, but TD doesn't seem to have finished. It's possible it had finished reading the spansh data, as it seems to be just sitting there in a kind of "processing, please wait" manner.

Notice the station that did get updated? It has no system name logged. That could be a separate bug in the logging code of course, but as far as I could see checking some of the files all the updated stations are like that.

Finally note that log size (compressed) increased by two orders of magnitude. Uncompressed from 10s of megabytes, to just short of 2gig.
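For anyone wanting to get a handle on logs this size without reading them, here is a rough sketch of a tally script. The line pattern is inferred only from the sample lines above, and the names `summarize`, `LINE_RE` and `AGE_RE` are made up for illustration; they are not part of TD or the listener.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines in this thread (an assumption);
# the real log may contain other line shapes, which are counted as "other".
LINE_RE = re.compile(r'^\s*\|\s{2}(?P<name>.+?)\s*\|\s{2}(?P<msg>.+)$')
AGE_RE = re.compile(r'Skipping station due to age: (?:(?P<days>\d+) days?, )?')

def summarize(lines):
    """Count skipped vs. updated/added stations and track the largest age in days."""
    counts = Counter()
    oldest_days = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            counts['other'] += 1
            continue
        msg = m.group('msg').rstrip()
        if msg.startswith('Skipping station due to age'):
            counts['skipped'] += 1
            age = AGE_RE.match(msg)
            # Ages under one day have no "N days," part, so the group may be None.
            if age and age.group('days'):
                oldest_days = max(oldest_days, int(age.group('days')))
        elif msg.endswith('station'):
            counts['updated_or_added'] += 1
        else:
            counts['other'] += 1
    return counts, oldest_days
```

Since it only iterates over lines, it can be fed a compressed file directly, e.g. `summarize(gzip.open(path, 'rt'))`, without unpacking 2GB to disk first.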

I am kinda loath to restart until I get some feedback.

Tromador commented 1 month ago

Sorry if this generated a load of notifications. I did this on the tablet and the cat got involved...

eyeonus commented 1 month ago

So, the listener detects a Spansh update and calls TD, TD performs the spansh import and just sits there, it never exits?

I don't know what would be causing that, but if it's literally doing nothing, kill TD and let the listener get control back.
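As a rough illustration of doing that kill automatically, a caller could put a time limit on the child process. This is only a sketch: how the listener actually invokes TD is not shown in this thread, `run_with_timeout` is a hypothetical helper, and the command in the example is a stand-in, not the real import command.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed after timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so control is back with the caller at this point.
        return None

if __name__ == "__main__":
    # Stand-in for a hung import: a child process that just sleeps.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(hung, timeout_s=1))
```

A sensible limit would have to be well above the normal import time, which only the people running the server can judge.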

Tromador commented 1 month ago

Well no. It didn't just sit there. I had 3 whole days' worth of spansh logs prior to it "sitting there", at which time it's entirely possible it was trying to deal with all that data. I will restart and keep an eye on it. I have archived the logs in case we need them, but at several GB of data, t'will be a pain to send them anywhere.

eyeonus commented 1 month ago

Indeed. I just looked, and the Spansh source file is 9.0GB, which is approximately the same as when we first got the Spansh plugin, so it definitely isn't a sudden huge growth of data.

eyeonus commented 1 month ago

It definitely shouldn't be taking 3 days to process the file.

Tromador commented 1 month ago

"It definitely shouldn't be taking 3 days to process the file."

Good. So it definitely did go mad for some reason.

Did you note this in my post above: "Notice the station that did get updated? It has no system name logged. That could be a separate bug in the logging code of course, but as far as I could see checking some of the files all the updated stations are like that."

eyeonus commented 1 month ago

It's not a bug per se, that particular message just doesn't include the system:

        note = "Updated" if self.known_stations.get(station.id) else "Added"
        if self.tdenv.detail > 1:
            self.print(f'        |  {station.name:50s}  |  {note} station')

Tromador commented 1 month ago

Maybe it should so it's consistent?
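One way that could look, as a sketch only: build the same `@SYSTEM/Station` label that the "Skipping station" lines show and pad it to the same 50-character column. The helper name `station_note` and its parameters are hypothetical, and uppercasing the system name is an assumption based on how the skip lines look in the log; how the plugin actually gets at the system name isn't shown here.

```python
def station_note(system_name, station_name, note):
    """Format an 'Updated'/'Added' line with the system included, like the skip lines."""
    # Assumption: skip lines render the system as '@' + upper-cased name.
    label = f'@{system_name.upper()}/{station_name}'
    return f'        |  {label:50s}  |  {note} station'

print(station_note('Manhari', 'Hill Vision', 'Updated'))
```

With that, every line in the log would carry the system, which also makes the logs easier to grep per system.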


Tromador commented 1 month ago

Turns out the root partition was filling up (this is bad, mmkay) and causing TD to restart its import over and over ad infinitum.

Presently actioning a plan to move /var to its own partition, which will fix the problem (and give it more space to grow if required). I'll do something similar on the new host as well to make sure we don't repeat this. Root partition to be monitored for the next while.
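For the monitoring side, a minimal sketch of a free-space check that could run before an import is started, using Python's standard `shutil.disk_usage`. The helper name `has_free_space` and the 20 GiB threshold in the example are assumptions for illustration, not anything TD or the listener does today.

```python
import shutil

def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding path has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_bytes

if __name__ == "__main__":
    # Hypothetical threshold: refuse to start an import with under 20 GiB free on /.
    if not has_free_space('/', 20 * 1024**3):
        print('Root partition low on space; skipping import')
```

Failing loudly up front like this would be easier to spot than an import that quietly restarts itself.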

Server will be back up later today after I've synced /var into the partition and got the host machine running nicely.

Tromador commented 1 month ago

Might end up being tomorrow. There's a major problem at the hosting company, and sorting out free stuff for ex-employees is very low priority atm.