DragonAPI causes Mekanism laser beams to not render at certain angles in 1.7.10

jonnyawsom3 commented 5 years ago

using the DragonAPI and Mekanism in minecraft 1.7.10 causes the mekanism laser beams to stop rendering at certain angles, including when looking directly towards them (in/near the path of the beam) happens on DragonAPI 1.7.10 V22d and Mekanism-1.7.10-9.1.1.1031 (and Mekanism-1.7.10-9.1.0.281) (it may happen on other versions of the mods, but because other versions aren't available from the mod pages, there's no way for me to test) and i'm using forge-10.13.4.1614

Barhandar commented 5 years ago

There is a very large amount of versions of all of Reika's mods for 1.7.10, they're just hidden so only latest is available. State the versions.

jonnyawsom3 commented 5 years ago

There is a very large amount of versions of all of Reika's mods for 1.7.10, they're just hidden so only latest is available. State the versions.

there, i fixed it, took me a while to remember that modpacks could have old versions of the mods... for a while i thought you were contradicting yourself, by saying i can only get the latest version, so therefore i need to say what version i have...

ReikaKalseki commented 5 years ago

I fail to see how I could have this effect.

jonnyawsom3 commented 5 years ago

Now I've confused myself, my past message was about me editing the issue description. Those i's weren't directed at you reika.

Or if you meant that you don't see how your mod made the mekanism lasers stop rendering at certain angles. All you need to do is go into a creative world with DragonAPI and mekanism, then place a mekanism laser with some power, then look at the laser beam from different angles around the laser block. If I'm right it should disappear if the laser block is in line of sight.

Or if that isn't what you meant then I've just wasted some time....

jonnyawsom3 commented 5 years ago

here's a video of it, i don't know why the aspect ratio is weird but it works, and after i fly up you can see how the effect seems to reverse direction. (i know the video has other mods too but this happens when only Mekanism and DragonAPI are installed too) https://youtu.be/MjTf6Gg3ANY

OvermindDL1 commented 5 years ago

Hmm, well it's obvious 'what' is happening but not 'why'. Perhaps a culling test screwed up? Does DragonAPI touch anything related to culling, TE rendering order, etc...?

If you (anyone here that is) know how to run minecraft in a debugger then it would not be hard to figure out in that case then.

ReikaKalseki commented 5 years ago

Perhaps a culling test screwed up? Does DragonAPI touch anything related to culling, TE rendering order, etc...?

I add some ASM hooks, but the "default" result of those hooks is unchanged behavior. That is, the normally-hardcoded values are instead returns from a function which by default returns the original value.

These come to mind: https://github.com/ReikaKalseki/DragonAPI/blob/master/ASM/Patchers/Hooks/Event/Render/FrustrumDispatch.java https://github.com/ReikaKalseki/DragonAPI/blob/master/ASM/Patchers/Hooks/Event/Render/FarClipEvent.java https://github.com/ReikaKalseki/DragonAPI/blob/master/ASM/Patchers/Hooks/Event/Render/EntityRender.java

But they are not often used, and only the FarClippingPlane allows for "behavioral changes": https://github.com/ReikaKalseki/DragonAPI/search?utf8=%E2%9C%93&q=renderFrustrum&type= https://github.com/search?q=FarClippingPlaneEvent&type=Code

OvermindDL1 commented 5 years ago

@jonnyawsom3 Do you know where the version of mekanism's code that you are using is hosted at?

jonnyawsom3 commented 5 years ago

sorry for the slow reply, i'm not sure if this is what you mean (probably isn't) but here https://www.curseforge.com/minecraft/mc-mods/mekanism/files/2475797 then there's also the github https://github.com/mekanism/Mekanism/tree/1.7.10

OvermindDL1 commented 5 years ago

The github is it yep!

Which TileEntity is it, the TileEntityLaser? I'll guess it's that one for now:

So it renders using: https://github.com/mekanism/Mekanism/blob/1.7.10/src/main/java/mekanism/client/render/tileentity/RenderLaser.java

Which is unconditional rendering the model, but not the laser beam (omg their model rendering is garbage, so many stateful calls! not as bad as, say, rotarycraft or so but still, I'd have issues rendering the client with a lot of those machines like how rotarycraft kills my FPS (though not UPS, so it's fun having the game progress normally when you are getting 1fps as is my world when playing with reika's big mods)...), so where is that drawn from...

Here we go, the TileEntityLaser calls to the LaserManager to do this stuff, which is at: https://github.com/mekanism/Mekanism/blob/1.7.10/src/main/java/mekanism/common/LaserManager.java

And on line 125 it calls to Mekanism.proxy.renderLaser, which is at: https://github.com/mekanism/Mekanism/blob/1.7.10/src/main/java/mekanism/client/ClientProxy.java#L647-L651

Which creates an EntityLaser (ugh, MC's entities are so garbagely coded >.>), which is: https://github.com/mekanism/Mekanism/blob/1.7.10/src/main/java/mekanism/client/entity/EntityLaser.java

Which is a subclass of MC's normal EntityFX, which does not cull drawn particles (as they are efficiently drawn as a single batch, one of the only 'mostly' efficiently drawn things in all of MC), so if it's not being drawn than something is definitely screwing with effect rendering, like maybe the EffectRenderer.java class in normal MC.

So let's look through dragonapi and see what's futzing with the EffectRenderer...

Well, this is interesting and horrifying both... https://github.com/ReikaKalseki/DragonAPI/blob/34f1be50b93be65a2aecccba839c7e700926bcee/APIProxyClient.java#L70

        Minecraft.getMinecraft().effectRenderer = new ThrottleableEffectRenderer(Minecraft.getMinecraft().effectRenderer);

So, let's see what ThrottleableEffectRenderer does: https://github.com/ReikaKalseki/DragonAPI/blob/34f1be50b93be65a2aecccba839c7e700926bcee/Extras/ThrottleableEffectRenderer.java

Aww, this bodes poorly.. https://github.com/ReikaKalseki/DragonAPI/blob/34f1be50b93be65a2aecccba839c7e700926bcee/Extras/ThrottleableEffectRenderer.java#L149

                    if (fx != null && isParticleVisible(fx)) {

Let's see what isParticleVisible does then: https://github.com/ReikaKalseki/DragonAPI/blob/34f1be50b93be65a2aecccba839c7e700926bcee/Extras/ThrottleableEffectRenderer.java#L253-L255

    public static boolean isParticleVisible(EntityFX fx) {
        return ReikaRenderHelper.renderFrustrum.isBoundingBoxInFrustum(getBoundingBox(fx));
    }

And boom, don't even need to check isBoundingBoxInFrustum, it's name seems obvious...

Particles in MC are rendered in a single batch everywhere all the time, they are not inefficient to do this (unless someone layers too much crap on top), and they are not designed to be culled (they can cull individually and internally in their own render function if they really need, except in exceptionally rare cases this should never be done though). This is just outright wrong, and thus this is the bug.

jonnyawsom3 commented 5 years ago

holy jesus, i send my regards for you having typed all that out and looked through all the code, but of course now there's one question on my mind

can it be fixed?

OvermindDL1 commented 5 years ago

can it be fixed?

By just returning truefrom isParticleVisible, then it should fix it. However, if reika does anything bad with particles then it might cause a slight FPS hit (although in some other cases it could actually speed up the rendering due to the driver being able to cache the call better), but I doubt it would be at all noticeable.

Hmm...

jonnyawsom3 commented 5 years ago

so i guess we'll wait for reika's opinion on this

OvermindDL1 commented 5 years ago

@jonnyawsom3 I'm at work so I can't test, but I built a new dragonapi with the fix (just returning true from isParticleVisible is all, though I guess the whole check could be removed now, feel free to diff them if you want to confirm my change), unsure if it's security will complain or anything, but feel free to try it: https://overminddl1.com/minecraft/fixes/DragonAPI+1.7.10+V22d-fixed.jar

If it doesn't load because security stuff, gimme the log and I'll fix that too.

jonnyawsom3 commented 5 years ago

welp... DragonAPI+1.7.10+V22d-fixed.jar is likely not a valid compiled copy of the mod. Invalid mod jarsign fingerprint! If you are attempting to make a custom build of the code, consult Reika.

jonnyawsom3 commented 5 years ago

at the very least i've got to say, good work on the anti tamper reika

OvermindDL1 commented 5 years ago

@jonnyawsom3 Nah it's not fancy, it's just Forge's built-in key check that he enables (forgot about it), I disabled it on this one: https://overminddl1.com/minecraft/fixes/DragonAPI+1.7.10+V22d-fixed1.jar Try it again?

I could just resign it with my key, but I'm too lazy, easier to just recompile, lol.

EDIT: If you want faster turn-arounds, feel free to join us in #gt-dev on irc.esper.net IRC server and ping my name.

jonnyawsom3 commented 5 years ago

well now it gives this Manifest does not contain hash! If you are attempting to make a custom build of the code, consult Reika.

and thank's for the offer but i'm not in a rush, and i know i'd forget about it so i don't think i'll join the server

OvermindDL1 commented 5 years ago

Lol, now that's a custom check, hold on. ^.^;

OvermindDL1 commented 5 years ago

Oh, I see, instead of making a hash I'll just comment out that check too in case I need to make further changes, I should add that hash generation to my build system sometime, try this: https://overminddl1.com/minecraft/fixes/DragonAPI+1.7.10+V22d-fixed2.jar

jonnyawsom3 commented 5 years ago

took me a while to get round to it but it works

OvermindDL1 commented 5 years ago

@jonnyawsom3 So just to make sure, it fixed the issue with the laser vanishing, yes? :-)

jonnyawsom3 commented 5 years ago

yes

OvermindDL1 commented 5 years ago

Have you noticed any FPS change in busy areas with this change? I don't think you would, but who knows without testing...

jonnyawsom3 commented 5 years ago

sorry for the slow reply again, as far as i could tell from the few minutes i spent flying around the average was about 30fps (normal because of the modpack i was using to test it) and at 60fps whenever i looked upwards, however i didn't have any busy areas to test in but so far it's looking good

ReikaKalseki commented 5 years ago

This was done shortly after the release of v19 (early Nov 2017) as a result of extreme FPS lag due to particle overload (in a big part due to an insane amount of CC stuff and many CC bees) I was having in my base at the time.

It is massively helpful of FPS and I am not willing to remove it.

Also, that checks whether the particle is "on screen"; the laser beams presumably are not offscreen, and so should not be affected.

Sample particle screenshots for the above reference:

Or, with depth testing of particles off, allowing them to show their true counts:

jonnyawsom3 commented 5 years ago

Hmm, i can understand that and my first thought is having an option/setting to turn it on and off instead...

OvermindDL1 commented 5 years ago

as a result of extreme FPS lag due to particle overload (in a big part due to an insane amount of CC stuff and many CC bees) I was having in my base at the time.

That sounds like they need their own wrapped renderer, it shouldn't be touching other mods things.

It is massively helpful of FPS and I am not willing to remove it.

For your things sure, but it still shouldn't touch other mods things.

Also, that checks whether the particle is "on screen"; the laser beams presumably are not offscreen, and so should not be affected.

That's not how particle systems work. The laser beam has a point where it comes from, particles are 'points', and that point being in view does have the particles rendered, however the particle extends well beyond it's point, which is also not uncommon for particles in minecraft (or most particle systems for that matter), hence why particle systems generally don't have any form of culling in most engines (as it would then make it more expensive than the particles themselves rendering would be).

However, the crutch is that the rendering of particles in something like, say, Chromaticraft, is using way Way WAY too many EntityFX's, what they should be doing is the TE's that initially create the particles should actually be setting up state in another system that aggregates all the information together and what the particles should be doing then rendering those via anything from a single EntityFX via a nice fast and direct render call with no math needing to be done (as that should update specific array elements in the OGL array in a single call), or just render it in the forge render hook where you can control it entirely with no worries about fitting it into MC's normal particle handling so you can batch it fully efficiently. This would allow you to, for example, batch all the lines of all the CHC energy beams together into a single efficient draw call where each entire line is a single quad, and if you wanted to be extra fancy you could use a shader to make the beams do some rather more impressive things then they are doing now. Repeat that for each 'type' of particle system aggregating distinct ones together when they have similar paths and you get down to a total of, say, 8 draw calls in full. That scene in those images could easily be rendered with a total of a dozen or so thousand quads in a mere literal couple of draw calls, which would be faster to execute than just a dozen or so visibility checks alone. The current implementation is like the model rendering, extremely inefficient in about the most inefficient way possible. In general these mods don't do any form of render batching whatsoever that not only needs to but has to be done for efficiency reasons, this is why every single render engine in existence does this batching for you, but since MC doesn't have a rendering engine and just uses raw OGL calls everywhere means that you have to do this batching instead. This specifically is the reason why I and many many others can't get far in any of your big mods with a 4.2ghz 6-native-core computer where the UPS remains at 20 but the FPS tanks to <1fps with a barely medium sized base. As it is coded now it will work on a very limited set of GPU's because they do a lot of driver batching (where that is not the drivers job as it applies a rather extreme overhead thus is why most drivers don't do it) and it just absolutely stomps the command buffers on any efficiently implemented driver.

To put this in perspective, you should see what the equivalent Vulkan/DX12 code would look like (not saying to write it, and certainly not in MC, just understand it), that removes all of the 'magic' that OGL does and shows you precisely just how expensive and why it is so expensive to do the rendering that these mods do in this form.

This 'Throttleable' Particle dispatcher is just a crutch, a very poorly made band-aid for proper rendering, where if the rendering was done correctly then you would have all the special effects and styles you want with no real performance cost. GPU's work VERY differently from CPU's.

I do apologize if there was any odd tone, just tired of rehashing the same thing over and over. Particle rendering especially is one of the most easily optimized of all rendering, but the way it is being done currently is quite literally the worst way to render particles period. Let's look at the cost of things:

MC's normal Particle Handling:

Assumes a low amount of fxLayers.
Assumes a high amount of EntityFX's within each fxLayer.
For each fxLayer it first binds a texture, sets up the GL state, and creates a manual batcher (Tessellator).
For each EntityFX it does the following:
- Set the brightness for the next injected quads, this is an efficient call that changes no render state, only in 'how' the quad is inserted.
- Calls the EntityFX's renderParticle function with the tessellator (The tessellator really should be called RenderBatch or something, probably is in the original source).
- Each EntityFX's renderParticle function then just calculates the UV's, the scale, and the position, then injects those into the manual batcher/tessellator via addVertexWithUV calls.
When all particles in the given fxLayer are batched, it then calls .draw() of the batcher to send all the batched calls in a single GPU command.

DragonAPI's ThrottleableEffectRenderer:

Does the same as the above, except for each EntityFX it also:
- Iterates via iterators instead indices, this is slightly slower because Java iterators are garbage, this is an extra cost, though a minor one.
- Tests if isParticleVisible, which is performing 144 multiplications and 144 additions with with 48 conditionals, 36 comparisons, and 36 indirect jumps, each of which are wildly random on each call so the CPU won't be able to cache their information, blowing the entire opcode pipeline of the CPU (why do you think MC doesn't DOES NOT PERFORM FRUSTRUM CHECKS EXCEPT ON GROUPS, like chunks, you absolutely DO NOT perform frustrum checks on each individual particle emitter, wtf there...)
- Tests if isEntityCloseEnough, which does an instanceOf check (not free, but decently cheap), and a few piddly other checks, this is decently fine though why is it being done at all when the TE on the client side should be doing this by either creating the EntityFX or not, so there is a whole bit of WTF'ery here.

So in summary, MC's normal method efficiently batches them (though it doesn't hold on to the batch and update individual entities from frame to frame, this is stupid, but eh it's Notch, this whole thing could effectively become free in a normal particle system setup), but ThrottleableEffectRenderer is doing a huge amount of math that will near certainly blow the CPU opcode pipeline every-single-time it's called, multiple times, changing the 'shape' of the rendered batch sent to the GPU each time as well (making it so the driver can't cache it for the ones that do, they shouldn't though, but nvidia's drivers are known for being particularily garbage, if only they had driver quality that matched their hardware quality...).

What really 'should' be done, not-withstanding MC's usual crap code:

With shaders (this is how literally every modern rendering engine does particles):
- Put all particles into a single packed array of emitter location, seed, physics properties (I.E. what it does like velocity, randomness, rotation, etc...).
- Then send that entire single packed array to the GPU, keep it synced with changes.
- Then let the shader perform the steps on the particles to calculate their current state and render them
Without shaders (if you want OGL 1.2 to run on 30 year old GPU's or some garbage, really, just use shaders and require a GPU made in the past 20 years):
- Put all particles into a couple of packed arrays depending on update type.
- For each array update the particles based on their update type.
- For each array update the render batch (VBO or whatever you want) with the changes and send to the GPU to draw them

As you can see, the MC world already 'kind' of does this, an fxLayer is an update type (although it just distinguishes based on texture, it could do it better, but this is the minimum), and instead of having a packed array for efficient updating it instead has the array hold pointers to objects (EntityFX's) and have them add to the batch themselves, which is a dynamic call, which is entirely needless overhead when each EntityFX could be a singular global type that things register into for efficient updating.

This kind of code is extremely friendly to the CPU opcode pipeline and GPU command queue both and you can process, quite literally millions of particles at full 60FPS with shaders, and easily hundreds of thousands of particles at full 60fps without shaders.

MC's method is 'mostly' there, some minor changes would make it a lot more efficient but it's still not bad (one of the better parts of MC's rendering in general at least).

ThrottleableEffectRenderer on the other hand absolutely destroys the efficiency. Instead of isParticleVisible and isEntityCloseEnough the client-side TE that creates the EntityFX instance (should not be created on the server! Ever!), should only do so if the TE is both loaded (within 10 chunks or so by default) and if it thinks the particle is close enough to be visible (all depends on the particle). Most particle usages in MC rely on being within 16 blocks of the emitter, very few are 'long range' (definitely there though). Visibility tests should be handled by the GPU or by batched by chunk (or even larger).

Honestly though, for things like CHC's glowing luman lines and all among others I'd not use particles at all, I'd use the forge render hook to render them all in a single massive efficient batch that is only updated/regenerated when the state changes, that would make it essentially free, a single draw call with all of them in the cached batch, updated by the TE's when necessary. If you didn't mind shaders you could make even the 'updating' of them essentially free as well.

And proper git practices really should be used, all of this kind of stuff I'd have PR'd fixes for years ago if so.

Hmm, i can understand that and my first thought is having an option/setting to turn it on and off instead...

Especially if you don't use CHC but just rotorycraft and so forth, this option would be very useful. Really the check needs to be removed entirely though and the TE's should handle whether they are 'close' enough to spawn the EntityFX or not (and if you really want things like the lines visible from outside render range then the forge render hook).

jonnyawsom3 commented 5 years ago

So essentially you figured out a way to remake the render to get a massive performance boost... Something tells me reika either won't like that idea, or if he actually agrees and decides to remake it, that would be a lot of work (presumably)

Skyler-Altol commented 5 years ago

So essentially you figured out a way to remake the render to get a massive performance boost... Something tells me reika either won't like that idea, or if he actually agrees and decides to remake it, that would be a lot of work (presumably)

If he does decide to remake it, it will be a bloody miracle.

@OvermindDL1 I would love to see a custom build of DAPI and ROC (I left out the other mods because optimising CRC in particular would be a massive task) with your optimisations + any others you come up with during your perusal of the code base, and compare the frame rate of that to the frame rate of the current available public release, which is V22d.

Because if the differences are as drastic as you say (and I am certainly inclined to agree with you that they will be), then perhaps Reika will at least consider making changes.

jonnyawsom3 commented 5 years ago

i feel like with all the custom versions we're talking about, reika might have something to say about it....

OvermindDL1 commented 5 years ago

I won't distribute changes beyond simple trivial things like the above as tests only. I do have a gradle build system wrapping reika's mods that builds everything correctly, in addition to a few changes within it in forked code (mostly relating to reika's mods using some old mod api's from MC 1.6 or 1.5 that don't work on the modern mods, though the use of the fork is optional and I didn't here, I don't even know 'how' reika manages to use old versions of mod API's unless they are doing something outright wrong like copying mod API's into the filesystem instead of using proper java/gradle/maven/sbt/ivy/whatever dependency handling, which would have caught these bugs), in addition to a model change on large turbine I did as a test to see if it solved the FPS issues around those (it did, the existing model handling is quite literally the exact wrong way to hand models to the GPU, this fork is from the old version as well) but I didn't touch any other models or the lag they create.

If reika used git actually correctly I would have fixed these issues quite literally years ago via PR's, but currently I don't play MC anymore, so eh...

jonnyawsom3 commented 5 years ago

Fair enough

Skyler-Altol commented 5 years ago

@OvermindDL1 would you be willing to publicise the code related to the turbine model fix?

OvermindDL1 commented 5 years ago

It's still based on reika's code, just significantly cleaned up, so no. It's not hard to fix though, you just need to batch each part transformed via the same per-frame transformation into singular batches, for the turbine I used 3 batches, though I could have easily used only 1 if I used shaders.

jonnyawsom3 commented 5 years ago

Now I'm unsure if/when to close this, as reika hasn't seen half of it yet

OvermindDL1 commented 5 years ago

Usually best to leave it open unless the root issue is resolved.

MachiavelliSeraphim commented 5 years ago

I would not close this. Let Reika make the call on what to do.

It is interesting that the rendering system can be so optimized. Ive always wondered because of how demanding ChC and ReC were whenever I used them.

jonnyawsom3 commented 5 years ago

I won't

OvermindDL1 commented 5 years ago

It is interesting that the rendering system can be so optimized. Ive always wondered because of how demanding ChC and ReC were whenever I used them.

The specific issue I detailed in another post, but in essence the model rendering in RoC/ChC/Etc are using GL transformation calls to position, essentially, boxes around, which for something with a turbine with 500 fins in it is a TON of calls. On high-end enough GPU's with a fast enough PCI bus then the excess calls won't really be noticeable until it gets to become 'too' excessive, but on others the bus transfer can't keep up.

Think of it this way, there is a set of CommandQueue's, OGL exposes 'one' at a time (to work around that you have to do context transfers between threads and such annoyances, it's much easier in Vulkan). A CommandQueue batches up a set of calls into a single Command to run on the GPU over the set of data. In general there are a few hundred CommandQueue transfers per frame in an average high-end game (or maybe tens at most in simple games), Reika's model rendering code is causing literal thousands to be transferred (in some cases per model like the large turbine) on such cards/systems. They have simple fixes as well interestingly enough, though it would require rewriting the models (or use a model format) as right now they are very inefficiently generated in code.

My personal thoughts on it is that since the models are written in a way that is efficient for a CPU (serial instruction pipelining on serial data), but not for a GPU as unfortunately that is precisely the backwards way that it needs to be commanded on the GPU (data needs to be combined and worked on in parallel via parallel executed identical commands), then that is why I think the model author(s) just didn't have the knowledge at the time when they were written and reika followed the same pattern.

It's actually very easy to send data to a GPU efficiently, every single one of reika's models could be rendered with a single draw call if shaders were used, even with their multitude of distinct rotating and moving parts, without shaders you can still reduce it to a single state transformation call per 'group', so even on the large turbine that would be just 3 or so state transformation calls per model. Then you can increase the efficiency further by instead of rendering each model individually you'd batch up the models into a single render call as well, a shader taking the single model, a list of 4x4 matrix transformations (to know 'where' to render them) combined with a per-model state (to know where to place the rotations and so forth, a single scalar so it's trivial) and it just renders out all the models in the world in parallel. This is the traditional render pipeline (that MC is lacking, but isn't hard to make yourself) that was common many decades ago. Nowadays you can get even far more efficient as well.

MachiavelliSeraphim commented 5 years ago

Ive always had a dedicated GT640 or better because of the load.
If I could have gotten away with Integrated Graphics, Id have played more often.

Makes sense why it loads down. Same issues with SSDs and using ZFS. Give it a million tiny transactions and they make your eyes bleed waiting. Give it a few large transactions and its done before you even notice.

OvermindDL1 commented 5 years ago

Give it a million tiny transactions and they make your eyes bleed waiting. Give it a few large transactions and its done before you even notice.

Lol, what an accurate and wonderfully graphic and short description of what is going on. ^.^

ReikaKalseki / Reika_Mods_Issues

DragonAPI causes Mekanism laser beams to not render at certain angles in 1.7.10 #2172