Slimefun / Slimefun4

Slimefun 4 - A unique Spigot/Paper plugin that looks and feels like a modpack. We've been giving you backpacks, jetpacks, reactors and much more since 2013.
GNU General Public License v3.0
960 stars 546 forks source link

Input nodes forget what they are after server restart. #621

Closed mrcoffee1026 closed 5 years ago

mrcoffee1026 commented 6 years ago

Description

I thought this was fixed like 20 builds ago, but it's back... where virtually every input node in use is a worthless skull. On rare occasion, placing a new Input node in the previous one's place will magically replace the items that were in the whitelist, but not always... it NEVER remembers what channel # was being used before, however... so there's a LOT of resetting up to do with every server restart... to the point that this part of the plugin is so unreliable and troublesome as to be nearly entirely unuseable. Please fix ASAP.

Server Log

There are never any errors in the log or any mention of the data just being dumped during startup/shutdown, but this is a pretty consistent occurrence, enough so that it should probably be saying SOMETHING about all the data it is just discarding as trash. ,

Evidence / Proof

If you've not witnessed this happen yet, then you don't use cargo.

Environment

CraftBukkit v1_12_R1 CS-CoreLib v1.5.16 Slimefun vDEV 4-22 ExoticGarden v1.6.3

AtomicScience commented 6 years ago

If you've not witnessed this happen yet, then you don't use cargo.

Well, I made a lot of different cargo systems and'd never occurred this problem, so please share with us your server logs and video evidence of this bug

mrcoffee1026 commented 6 years ago

A comment from someone who has used SlimeFun for longer than 3 seconds and so is well aware that it occasionally blanket replaces large crops of ExoticGarden tree fruits with "CSCoreLib" head blocks, has had to replace dozens of Androids and GPS nodes for the same reason... or anyone who read the original post at all to know that I'm already irritated that the plugin logs nothing about dumping this data, or has the intellect to realize that CSCoreLib skulls when right or left clicked do not provide any sort of useable information which will reveal its self in a video - would be greatly appreciated.

AtomicScience commented 6 years ago

Well, stock and well-configured Slimefun doesn't have this problem at all. Usually it happens because of errors while saving blocks, so your server log is essential.

P.S. Instead of replacing all Slimefun blocks you can load last backup from /data-storage/Slimefun/block-backups

comfuzzled commented 6 years ago

This happens on all servers.

mrcoffee1026 commented 6 years ago

@AtomicScience "stock and well-configured"... look, if there's a "don't randomly dump data" option in the config file for slimefun, I can't find it. Also your assertion that this does not happen contradicts your statement that "usually it happens because...". However, I can find no evidence in the log that either an error or data loss occurred. Your mentioning of the backup location - however... somewhat useful. As there seems to be a fully in tact text file available of every Input node on the server, I will just begin making my own manual copy before shutting down or starting up. I do not agree that the "when" would be during the routine saving of these files... as input nodes do not routinely glitch. Restarts - however - do seem to result in a tremendous quantity of node loss on random occasion. Again, I did think this problem had already been resolved in an earlier build. Its recurrence has caught me off guard. The very existence of the flat text files loaded with JSON is very likely the very root cause of the problem, it is extremely bizarre this information would not be stored in a more efficient manner (like mySQL). There could be additional open file limits and - pretty likely - size constraints for text files that would not be an issue for a database, besides the obvious reduction in overhead. In the mean time I will try to make my own copies... I could attempt backup usage but I'm not really sure at what point the loss occurs, and until something is logged it would be impossible to know which would be more data or less data than what survived already. Again, it is pretty clear that these things are happening either on start up or on shut down since everyone's nodes function right up to the point when I shut down. I would welcome anything in the log towards the end of locating the problem, there just hasn't been anything up to now. The next time it happens, I'll search again, but so far there has been nothing.

@comfuzzled while I appreciate the agreement, having someone chime in as if they've just learned to live with regularly scheduled data loss is extremely limited in its usefulness. Thanks anyway.

meiamsome commented 6 years ago

@mrcoffee1026

If you right click the skulls do they show the regular GUI? I've experienced that one (and breaking them dropped the right item). I've yet to have them turn into pure skeleton heads, though.

Could you check the folder plugins/SlimeFun/error-reports/ And attach any that occurred on/around restarts that this happened with. If you want you can attach a zip of the whole folder, but that might be quite large.

mrcoffee1026 commented 6 years ago

No, no GUI, no nothing. They still have the proper graphical skin, but they are nothing but CSCoreLib head blocks, there is nothing that can be done with them other than to break and replace them. We manually replaced everything the last time this happened, and I'm making backups of the data folder before doing restarts anymore. Oddly the error logs mention the input nodes and plants not at all. It does seem to regularly report each and every time it has replaced my GPS nodes with worthless skulls, but that's all it's letting me know about. I might later like to try experimenting with some running and moving androids to see if it reports when it replaces those, but I've just been in the habit of ensuring they're off and stationary before restarting to avoid the issue, I really have no idea how far back the last time that happened was.

Anyway, SOME of the data is kept as I can break the skull block and replace it with a node... if there were any items in the list inventory they will still be there when I reopen the new node. However... this is limited in its usefulness since the White/black list setting is lost, the channel #, as well as any addition options (lore, round robin), so usually needs to be studied and reconfigured after each replacement. It seems to me the bulk of the data being lost is the proper NBT data on the block itself. With that missing, GUI access is not an option.

meiamsome commented 6 years ago

@mrcoffee1026 Thanks for the thorough response. I'm sorry you're having this issue - I'm trying to get my head around this codebase, so I'll keep my eyes out for anything that might cause that.

mrcoffee1026 commented 6 years ago

very nice. Yeah I can't find any record of it having had happened at all, just after a restart we had a bunch of nodes to replace. I mean it happens a a LOT with the plant stuff but I just didn't expect this and it's really time consuming to need to do that very often... As I said our last server it was an ongoing issue with EVERY restart, it's at least not doing that - so far, anyway. RIP plants I'm pretty sure those are unsalvageable, but at this point I really wish there was a configurable tool to restore lost NBT data on blocks so I could just tell it what the blocks I'm about to click are and it would magically restore them... possibly region select with worldguard for the crazy fruit trees or something... or even a radium 5 setting so I can just make the oranges oranges again. Anyway thanks for looking into this... it hadn't happened to a bunch of nodes till really recently so I thought they'd sorted this out way back. Thanks again.

mrcoffee1026 commented 6 years ago

More information on this... the input nodes are losing data from THIS file only: data-storage\Slimefun\stored-blocks\world\CARGO_NODE_INPUT.sfb

My futile attempts to back up files before and after restarts are not helpful since the missing data is not being written to the file, therein the problem lies. There is no trace that the data ever existed at any point, ever. This includes all backups which are completely devoid of any trace of blocks that were placed in the locations where skulls are now. Clearly, these items do exist and did once as there is corresponding data in the stored-inventory folder. It is as though slimefun selectively - retroactively - deleted all traces of many blocks from one file while leaving traces elsewhere.

meiamsome commented 6 years ago

Hmm, are these nodes created recently before the restart? It may just be that a final save of the data files is not being correctly run in that case. I assume it isn't that, though because I wouldn't expect it to be so prevalent.

What happens if you use the debug fish on the nodes? ( /sf debug_fish)

mrcoffee1026 commented 6 years ago

Nothing right now... actually I'm not getting any information out of the debug fish at all on ANYTHING anymore, i think they broke it with the last release is my guess. And no, these were previously working nodes that worked on prior restarts and suddenly stopped being anywhere. I take that back about the fish i guess i don't know the perm node for it and was using it with OP before. Anyway i'll have to try another one when i find one.

mrcoffee1026 commented 6 years ago

Found another glitched one... remembered to OP, the fish does nothing on the block, just acts like it's not there (give me info on a nearby connector just fine though).

John000708 commented 6 years ago

Well it makes sense that the debug fish does nothing on the block, for some reason it cannot find any data on that block. Could you provide us with the node storage file before and after the restart?

mrcoffee1026 commented 6 years ago

That was kind of what I was trying to locate when searching through backups and what not... a real "before" and "after" that I could compare to each other, but i was not successful, it seems the damage was already done by the time I had any copies of my own. Next time this happens I'm hoping to have something like that.

Interesting... my constantly being replaced gps transmitters... are in a file named "null.sfb" no wonder every time I replaced them it would just happen again. However these were always corresponded with the crash reports pointed out, the input nodes I got nothing on, it's as if they were never placed. Anyway I"m going to keep making copies before doing restarts in case this is something I can restore without manually replacing every block. I had some more to replace the last day or two but it seems like just blocks we forgot to replace from when this last happened. I tried finding any trace of the blocks in copies I made and the older backups Slimefun made and there was nothing in any of them... which is kind of impossible since at one point they were all working nodes. I did notice when replacing one... it is saved to the file but in no particular order that I can decypher. I'd just replaced 3 nodes, they were not the last 3 lines of the file... the nodes aren't in order by coordinate location... so do not seem to be in any particular order at all, and adding new nodes seems to place new ones randomly within the file... perhaps data is being lost by whatever magical shuffling is going on each time the file is changed.

BlackBeltPanda commented 6 years ago

I reported this twice in the past but my reports were both closed without an explanation. I've seen several other closed reports describing the same issue. It's been going on for a long time now and it's very difficult to track down any errors or indications as to what's causing it. I'll continue to keep an eye open on my server, as well, in the hopes that I can find any useful information.

mrcoffee1026 commented 6 years ago

I actually haven't seen it happen again since the last time, but without some sort of error or crash report happening I'm not sure where to start... however the odd, seemingly random reordering of the data in the file is where I'd probably try to start if the entire issue can't be resolved by simply using a database instead of flat files for this information. But since the items aren't being added to the file in the order the blocks are being placed on the server... this suggest some "other" order is being used, and since that order is obviously not coordinate value... maybe whatever is reading them is serializing the file and then simply calling up the wrong line numbers (serial numbers) to make changes/saves... thereby overwriting some things and discarding others... which... is kind of what we experienced. A database solves this by having everything pre-serialized and giving every entry a fixed ID. I haven't dug into the scheduled saved routines, but... wouldn't simply using a database prevent there even being a need for these scheduled saves? I'm kind of thinking transitioning from one to the other should be a rather simple thing... and if we have any usable data left in these flat text files, getting those ported into a database should be simple as well since it's all JSON already. Anyway I hope someone with more know-hoe than me takes a further look into this soon, particularly with 1.13 coming up when a lot of the item ids are going to need a good bulk replace done to them anyway.

Momshroom commented 6 years ago

I'd just like to add that this happens on our server as well. Spigot 1.12, Slimefun 4.1.14, CS Corelib 1.5.15. Input nodes randomly become just properly textured CS Corelib heads.

It's not just input nodes, though: This morning a nuclear reactor became a block of obsidian. http://prntscr.com/j1siph Placed as a skull 88 hours ago. This morning it was obsidian.

meiamsome commented 6 years ago

@Momshroom This is an intended feature - your reactor either wasn't surrounded by water or you ran out of coolant cells in the reactor.

Momshroom commented 6 years ago

Thank you. Except it WAS surrounded by water: http://prntscr.com/j1swpl And core protect confirms the players assertion that he had not changed the water.

meiamsome commented 6 years ago

@Momshroom in which case it would be because it ran out of coolant cells that are also required to avoid the explosion

Momshroom commented 6 years ago

I can't verify what it had as it wasn't mine, but the player assures me he keeps it full and checks regularly and he's been a reliable and intelligent player. This is what his replacement reactor already looked like: http://prntscr.com/j1t1cm

This same player has needed about a dozen cargo nodes replaced over the last week as they turned into heads. He's not the only one with that issue.

I will, however, assume the reactor issue is unrelated to the node one unless it repeats. Thanks.

mrcoffee1026 commented 6 years ago

Hard to say without like constantly monitoring the coolant levels... but that is a normal result of running out of coolant rods. If he also had an access port in place and that device had glitched to a skull... then that might have been the origin of the coolant shortage... making for a the lava source / obsidian result.

I read right in the wiki this just being blamed on "data corruption", but I don't think it's alright to just leave it there. I do think this data should be more carefully handled, placed in a a proper database that is unlikely to cause enormous chunks of file to mysteriously vanish. It might require a person with a lot more code experience or at least experience with the internal workings of this plugin to really implement this properly, but I think this really needs to be done, sooner than later. For many reasons... besides that I'd just like to exponentially increase the size of my world come 1.13... and really need data writes to be able to keep up when I request a map fill or ExoticGardens is going to skull grief EVERYWHERE.

blackeyedgamer commented 6 years ago

On the server I am currently playing on we have all of these issues with heads turning in to skulls or CS corelib heads I honestly agree saving to mySQL instead of flat file would benifit this plugin in the longrun. even if someone else were to take on the torch it would be easier for them. I do not have the coding knowledge to make this happen nor enough dedication to make it happen. But if I was dedicated enough. I would learn and do it myself. With help of course.

Another very recent one also is energy regulators turning in to heads causing reactors to blow up and we even had items being deleted out of chests in chunks because of data corruption of the flat file.

VoidAngel commented 6 years ago

Yeah data corruption seems to happen all the time on production servers. I can confirm that restarts "cause" it in a way. Or more like the data in virtual memory never makes it to storage on shut down. Its seemly semi random, usually occurring to specific sf block at a time. Sometimes a large portion of androids will break, other times cargo input nodes will break. I should also note that most of the time the block will lose the normal block data but not the inventory data. It could be the result of an error while saving an individual block, as I believe atomicscience was trying to say, but would this really cause the rest of the pending saves for that sf item type to abort?

I also agree with changing how the data is saved, but the question is: Who here knows their way around different data storage techniques?

Also, is there possibly a reason for the data saves to be how they are now?

creator3 commented 6 years ago

It does seem to regularly report each and every time it has replaced my GPS nodes

Can you share that report with us, @mrcoffee1026 ?

it is saved to the file but in no particular order

The shuffling is because a HashMap is iterated over.

simply calling up the wrong line numbers (serial numbers) to make changes/saves.

From what I can tell, the block files are written entirely from scratch each time.

Maybe it's some sort of race condition? The auto save task is run in its own thread.

Here's a hypothetical example (I'm not saying this is what actually happens): 1) Something triggers a change to an input node's data 2) When the input node's data is being changed, it is temporarily cleared (I believe this part is actually true) 3) At the same time, the auto save task is running. When it saves CARGO_NODE_INPUT.sfb, the associated Config is missing an input node.

It looks like the "index" block info is set in matchesFilter, which is kind of a big thing for input nodes... That handles #1 and would explain why input nodes are uniquely affected

Map<String, Config> cache_blocks is cleared when saving, so the Config for CARGO_NODE_INPUT.sfb will be reloaded from disk with the input nodes missing. However, if the input nodes set their data again, they'll be added back. There's a lot going on, and every time I think I can rule this theory out, I realize something else needs to be considered.

mrcoffee1026 commented 6 years ago

https://pastebin.com/Y7gdTAJB

Like here's one about a GPS node but... this isn't the same thing happening here since later I discovered a NULL.sfb in my stored-blocks/world folder which listed the locations of the affected gps nodes and explained why every restart glitched them in that location... I don't know what happened the first time, but each residual time certainly did.

Comments got me thinking as did some interesting coincidences... the data backups being made virtually never have the missing data in them. Like often enough to be worthless for that purpose. I prefer an "always" here but... maybe "most of the time" is enough to be suspicious... maybe the backup procedure is actually when the data dump is happening. In any case getting some sort of error report to actually happen instead of "surprise, everything is broken again" on a restart is probably a good first step. Even if this were handled by a nice SQL database tomorrow... data loss could and likely would still happen if whatever method or routine that was handling data suddenly stops and drops everything without logging anything about it.

mrcoffee1026 commented 6 years ago

This probably doesn't help much since I now realize that error is only produced in the case that the data has already been moved to a file "null.sfb". Prior to it being placed in the file there might have been an exception that was thrown... at least in the case I could cause an a null file to be created, that was the case. But generally the magically disappearing input node information... I've never seen an exception in the logs nor a null.sfb created.

creator3 commented 6 years ago

Ping @TheBusyBiscuit

I think I may have solved this. Basically, other threads can try to load the CARGO_NODE_INPUT.sfb while the auto save task is still writing data to it. I've got a log that I think supports this. It shows a YAML error indicating the file was incomplete. Presumably, some attempts to load the partially-written data do not cause errors and instead successfully load a fraction of the input nodes.

I'm going to test saving to CARGO_NODE_INPUT.sfb.tmp and then renaming it to CARGO_NODE_INPUT.sfb on my server.

[21:38:32] [Craft Scheduler Thread - 1819/INFO]: [Slimefun] Saving Blocks for World "Survival1" (183411 Changes queued)
[21:38:32] [Server thread/ERROR]: Cannot load data-storage/Slimefun/stored-blocks/Survival1/CARGO_NODE_INPUT.sfb
org.bukkit.configuration.InvalidConfigurationException: while scanning a quoted scalar
 in 'string', line 110, column 27:
    Survival1;-1282;100;6878: '{"owner":"d14caa78-6697-4a89-83e9
                              ^
found unexpected end of stream
BlackBeltPanda commented 6 years ago

This also seems to be happening with Carbanado Edged Capacitors. I have one that breaks every server restart. The only error seems to be from an Energy Regulator that's attached to it. It throws this error each restart:

Slimefun Data:
  ID: ENERGY_REGULATOR
  Inventory: false
  Data: {"id":"ENERGY_REGULATOR"}

Stacktrace:

java.lang.NullPointerException
    at me.mrCookieSlime.Slimefun.api.energy.EnergyNet.tick(EnergyNet.java:146)
    at me.mrCookieSlime.Slimefun.Setup.SlimefunSetup$47.tick(SlimefunSetup.java:3272)
    at me.mrCookieSlime.Slimefun.api.TickerTask.run(TickerTask.java:200)
    at org.bukkit.craftbukkit.v1_12_R1.scheduler.CraftTask.run(CraftTask.java:63)
    at org.bukkit.craftbukkit.v1_12_R1.scheduler.CraftAsyncTask.run(CraftAsyncTask.java:52)
    at com.destroystokyo.paper.ServerSchedulerReportingWrapper.run(ServerSchedulerReportingWrapper.java:22)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
mrcoffee1026 commented 6 years ago

I personally just use the cargo_input_node.sfb file size as an indicator for when I should be restoring a backup I made previously. This happens to a handful of the files though... as mentioned in the post above, most of the errors get logged. The one you're describing likely gets a corresponding error report in this folder: Spigot/Spigot-Server/target/plugins/Slimefun/error-reports/ and you'll find that the culprit is the existence of - either that energy_regulator's block address or the capitor's address being in the file: Spigot/Spigot-Server/target/data-storage/Slimefun/stored-blocks/world/null.sfb. If you delete the null.sfb file, your issue will stop happening on restarts... at least until the next time the "initial event" occurs again which somehow reclassifies certain blocks as "null" instead of their appropriate type. While this does appear in the error report it happens so randomly I can't even begin to guess what the actual source of the issue truly is. This maybe related to the things like the truncation of data that often happens with the Input Node - and other files such as nearly every Exotic Garden plant bearing fruit - but is unlikely the SAME issue since it at least gets logged... and the locations are all kept track of in the "null.sfb" (although losing track of the original item name does make this information nearly worthless) file which cannot be said for the data dump that occurs in those other files.