FPS drop when looking at High Pressure Turbines and/or Turbine Generators

raelik commented 7 years ago

I have a fusion plant build where I built one large turbine room directly above the reactor room, and have managed to cram 16 HP Turbines (half with generators, half without) in a 4x4 chunk area. If the turbines are not running, my FPS drops from a solid 60, down to about 20 if I'm in the reactor room and look towards the ceiling where the turbines are. This is closer to 17-18 if they are running. These numbers drop to 13-16 if I actually go into the turbine room and try to get as many of the turbines in my field of vision as possible.

I'm assuming this is just simply because of the renderer limitations of Minecraft. Would I be able to mitigate this by separating my turbines, putting them deep under the reactor close to y=0, and splitting them up into more separated chunks?

ReikaKalseki commented 7 years ago

Their TESR has no load difference between when they are running and when they are not. However, running HP turbines do spawn a large number of particles.

As for the generators, there is literally no difference between an on-state and an off-state one.

raelik commented 7 years ago

The particles explain the marginal FPS difference between running and not running then. I had suspected that was the case, and it isn't really much of an issue. The larger issue is the FPS drop in general (running or not), and whether putting them near bedrock and separating them by a couple of chunks is a workable mitigation strategy.

raelik commented 7 years ago

As of now, I can only assume that it's just due to the sheer number of them I have crammed into pretty much as tight a space as they can fit. Also, 4 of them are butted right up against the turbine generators of 8 of the other turbines, as you can see in this screenshot:

Screenshot

ReikaKalseki commented 7 years ago

Your issue is probably part "lots of TESRs" and part because the HP turbine model is one of the few that is legitimately high-poly.

raelik commented 7 years ago

That's what I figured. I don't see much or any of a drop looking at one or two of them, likely because I've capped my FPS at 60 (vsync, 60hz monitor) and I'm using a GTX 970. It's pretty linear as I look at more of them at once.

ReikaKalseki commented 7 years ago

> I'm using a GTX 970

For MC, this is irrelevant; all rendering is CPU-based.

raelik commented 7 years ago

It obviously uses it to some degree; my GPU speed and temps spike while I'm running MC.

ReikaKalseki commented 7 years ago

Well, your GPU still ultimately handles the data sent to your display, but that amount of data is small enough and simple enough that even integrated graphics perform similarly in MC.

OvermindDL1 commented 7 years ago

> Well, your GPU still ultimately handles the data sent to your display, but that amount of data is small enough and simple enough that even integrated graphics perform similarly in MC.

Actually a GPU can be hit really hard in MC. It depends on how many state changes are done. A sheer number of polygons (like in the turbines) may upload a lot of data, but once sent it will not really hit hard at all unless something incredibly inefficient is done, like re-sending it to GPU memory every frame (which RotaryCraft does, one of two big things that contribute to its significant client-side rendering performance issue). The big thing that will hit rendering performance is the number of state changes done; for example, popping the rendering stack will cause a flush.

But yes, I'd rank this issue as one of the cases where re-sending a comparatively massive rendering call is what is causing the performance hit. It is not that the GPU is too slow to render it, but rather that sending that information every frame is absolutely saturating the GPU bandwidth, and if @raelik hooked up an OpenGL debugger then I'd bet it would confirm that.

Let's take this one (I'm unsure which turbine specifically this is for, but this is a great example): https://github.com/ReikaKalseki/RotaryCraft/blob/2fe52b194a377c48d35b3861dfdab07077c07cb7/ModInterface/Conversion/ModelSteamTurbine.java#L202-L222

        int j = 0;
        for (double k = -0.375; k <= 0.375; k += 0.09375) {
            GL11.glTranslated(0, 0, k);
            GL11.glTranslated(0, dd, 0);
            GL11.glRotatef(j, 0, 0, 1);
            GL11.glTranslated(0, -dd, 0);
            for (int i = 0; i < 360; i += 22.5) {
                GL11.glTranslated(0, dd, 0);
                GL11.glRotatef(i+phi, 0, 0, 1);
                GL11.glTranslated(0, -dd, 0);
                blade.render(te, f5);
                GL11.glTranslated(0, dd, 0);
                GL11.glRotatef(-i-phi, 0, 0, 1);
                GL11.glTranslated(0, -dd, 0);
            }
            GL11.glTranslated(0, dd, 0);
            GL11.glRotatef(-j, 0, 0, 1);
            GL11.glTranslated(0, -dd, 0);
            GL11.glTranslated(0, 0, -k);
            j += 15;
        }

So this is a nested loop, each level of which makes a large number of state calls. These would not normally flush; however, if we look inside the blade.render(TileEntity, float) call then we see: https://github.com/ReikaKalseki/DragonAPI/blob/master/Instantiable/Rendering/LODModelPart.java#L138-L145

    public final void render(TileEntity te, float pixelSize)
    {
        double d = this.calcAndCacheRenderDistance(te);

        if (!te.hasWorldObj() || MinecraftForgeClient.getRenderPass() == -1 || this.shouldRender(d)) {
            super.render(pixelSize);
        }
    }

So each invocation of that render calls calcAndCacheRenderDistance(TileEntity), which returns a double and is:

    private double calcAndCacheRenderDistance(TileEntity te) {
        EntityPlayer ep = Minecraft.getMinecraft().thePlayer;
        long time = Minecraft.getMinecraft().theWorld.getTotalWorldTime();
        if (te == lastLocation && time == lastTime) {
            return lastDistance;
        }
        else {
            double rx = ep.posX;
            double ry = ep.posY;
            double rz = ep.posZ;
            double dx = rx-te.xCoord-0.5;
            double dy = ry-te.yCoord-0.5;
            double dz = rz-te.zCoord-0.5;
            double d = dx*dx+dy*dy+dz*dz;
            lastLocation = te;
            lastTime = time;
            lastDistance = d;
            return d;
        }
    }

Not too costly, though it is more complex than I'd probably go for (and is very serialized in the calls considering how many times this will be called for just one turbine). However, the render function then calls super.render(pixelSize), which calls into Minecraft's ModelRenderer.render, which does:

    public void render(float p_78785_1_) {
        if (!this.isHidden && this.showModel) {
            if (!this.compiled) {
                this.compileDisplayList(p_78785_1_);
            }

            GlStateManager.translate(this.offsetX, this.offsetY,
                    this.offsetZ);
            int var2;

            if (this.rotateAngleX == 0.0F && this.rotateAngleY == 0.0F
                    && this.rotateAngleZ == 0.0F) {
                if (this.rotationPointX == 0.0F
                        && this.rotationPointY == 0.0F
                        && this.rotationPointZ == 0.0F) {
                    GlStateManager.callList(this.displayList);

                    if (this.childModels != null) {
                        for (var2 = 0; var2 < this.childModels.size(); ++var2) {
                            ((ModelRenderer) this.childModels.get(var2))
                                    .render(p_78785_1_);
                        }
                    }
                } else {
                    GlStateManager.translate(this.rotationPointX
                            * p_78785_1_, this.rotationPointY * p_78785_1_,
                            this.rotationPointZ * p_78785_1_);
                    GlStateManager.callList(this.displayList);

                    if (this.childModels != null) {
                        for (var2 = 0; var2 < this.childModels.size(); ++var2) {
                            ((ModelRenderer) this.childModels.get(var2))
                                    .render(p_78785_1_);
                        }
                    }

                    GlStateManager.translate(-this.rotationPointX
                            * p_78785_1_,
                            -this.rotationPointY * p_78785_1_,
                            -this.rotationPointZ * p_78785_1_);
                }
            } else {
                GlStateManager.pushMatrix();
                GlStateManager.translate(this.rotationPointX * p_78785_1_,
                        this.rotationPointY * p_78785_1_,
                        this.rotationPointZ * p_78785_1_);

                if (this.rotateAngleZ != 0.0F) {
                    GlStateManager.rotate(this.rotateAngleZ
                            * (180F / (float) Math.PI), 0.0F, 0.0F, 1.0F);
                }

                if (this.rotateAngleY != 0.0F) {
                    GlStateManager.rotate(this.rotateAngleY
                            * (180F / (float) Math.PI), 0.0F, 1.0F, 0.0F);
                }

                if (this.rotateAngleX != 0.0F) {
                    GlStateManager.rotate(this.rotateAngleX
                            * (180F / (float) Math.PI), 1.0F, 0.0F, 0.0F);
                }

                GlStateManager.callList(this.displayList);

                if (this.childModels != null) {
                    for (var2 = 0; var2 < this.childModels.size(); ++var2) {
                        ((ModelRenderer) this.childModels.get(var2))
                                .render(p_78785_1_);
                    }
                }

                GlStateManager.popMatrix();
            }

            GlStateManager.translate(-this.offsetX, -this.offsetY,
                    -this.offsetZ);
        }
    }

Now what this utter and intense decompiled horror is saying is that it is indeed building a DisplayList (whooo!), however it is popMatrix()'ing Every-Single-Danged-Call, potentially multiple times.

Looking back at the ModelSteamTurbine function where it builds up the model, it has a lot of things like:

        blade = new LODModelPart(this, 0, 19);
        blade.addBox(-0.5F, -6F, -0.5F, 1, 6, 1);
        blade.setRotationPoint(0F, 16F, 0F);
        blade.setTextureSize(128, 128);
        blade.mirror = true;
        this.setRotation(blade, 0F, 0F, 0F);

So it is just adding a box is all, so probably just 8 vertices and so forth. So 8 vertices are being stored into a DisplayList internally by the minecraft ModelRenderer class, which is great, however then this DisplayList is getting called umpteen number of times, each of which is utterly killing/flushing the render pipeline.

So in general, this was programmed to be almost as perfectly inefficient as possible (turning the ton o' displaylist batch calls into direct model sending would be barely more inefficient overall, inconsequential), and by not using the horribly programmed ModelRenderer in Minecraft itself then you could even get better performance if the programmer knows how OpenGL really works.

What 'should' have been done is that, for each and every combination of turbine sizes (as long as it is something reasonable like <50 or so), a ModelRenderer (or in Reika's case, a LODModelPart) should be made that completely and entirely represents a single entire moving component (and anything that never moves should be in the world renderer, not the TESR), and in the render it should be rendered once (with a given rotation for however it is rotating) for each rotating part instead of each fin individually. This would drop the number of batch flushes from a couple hundred/thousand PER turbine down to, oh, probably 4-8 for a big turbine depending on how well it is implemented.
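To put a rough number on the "couple hundred/thousand" figure, here is a standalone tally that replays the two loops from the ModelSteamTurbine snippet quoted earlier, with counters in place of the GL calls. It is only a sketch: the class and field names are made up for illustration, it assumes the loop bounds exactly as quoted, and it does not count the extra translate and callList calls that happen inside blade.render.

```java
// Hypothetical tally of the calls the quoted nested loop issues for one model in one frame.
final class TurbineCallTally {
    int matrixCalls;   // glTranslated + glRotatef calls made directly by the loop
    int bladeRenders;  // blade.render(...) invocations

    static TurbineCallTally count() {
        TurbineCallTally t = new TurbineCallTally();
        for (double k = -0.375; k <= 0.375; k += 0.09375) {
            t.matrixCalls += 4;           // translate(z), then translate/rotate/translate by j
            // Same header as the quoted code: "i += 22.5" on an int truncates each step to 22.
            for (int i = 0; i < 360; i += 22.5) {
                t.matrixCalls += 3;       // rotate the blade into place
                t.bladeRenders += 1;      // one display-list call per blade
                t.matrixCalls += 3;       // undo the rotation again
            }
            t.matrixCalls += 4;           // undo the j rotation and the z translation
        }
        return t;
    }

    public static void main(String[] args) {
        TurbineCallTally t = count();
        System.out.println(t.matrixCalls + " matrix calls, " + t.bladeRenders + " blade renders");
    }
}
```

With the bounds as quoted this comes to 990 matrix calls and 153 blade renders per model per frame, before anything ModelRenderer adds on top. A single pre-built part per rotating stage would replace each ring's worth of those with one call.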

For note, it is little things like this all over Reika's mods that cause the huge (client) performance issues, and I really, really hope that by being more detailed in my descriptions of why, these issues will get fixed so I and others can actually run the mod instead of hitting <1fps after getting a couple dozen little machines. Death by a thousand paper cuts is what is happening here; some GPUs handle them better than others, but it will bring down even the mightiest of beasts with enough of these cuts...

raelik commented 7 years ago

@OvermindDL1 I'll see if I can profile MC with NVIDIA Nsight this evening, to confirm or deny your hypothesis.

ReikaKalseki commented 7 years ago

I am not willing to manually make a model with 300 blades.

OvermindDL1 commented 7 years ago

> I am not willing to manually make a model with 300 blades.

Oh god no, that would suck. Instead of rendering, say, each blade individually, you just need to keep addBox'ing to a single LOD model to build up one big one (one for each type of rotation). :-)

@raelik Any results?

OvermindDL1 commented 7 years ago

To be specific, I doubt addBox would work since it doesn't look like you can rotate the boxes; instead, just feed the coordinates in directly. That is easily done by taking a box and applying a transformation matrix to it as you do in the render body, but doing it only once and baking the transform into the coordinates. Dead simple, and even less code than you have now. :-)
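As a minimal sketch of what "baking" means here, assuming plain double arrays instead of any Minecraft class: the helper below applies the same translate/rotate/translate-back pattern from the render loop (a rotation about Z around a pivot on the Y axis) to a template blade's corners once, up front, and keeps the resulting coordinates. BladeBaker, rotateAboutZ and bakeRing are hypothetical names, not part of DragonAPI or Minecraft.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes every blade's corner positions once on the CPU,
// so the per-frame render no longer needs per-blade glTranslate/glRotate calls.
final class BladeBaker {
    /** Rotates (x, y, z) by angleDeg about the Z axis, around the pivot (0, pivotY, 0). */
    static double[] rotateAboutZ(double x, double y, double z, double angleDeg, double pivotY) {
        double a = Math.toRadians(angleDeg);
        double dy = y - pivotY;                       // move the pivot to the origin
        double rx = x * Math.cos(a) - dy * Math.sin(a);
        double ry = x * Math.sin(a) + dy * Math.cos(a);
        return new double[] { rx, ry + pivotY, z };   // move the pivot back
    }

    /** Returns the baked corners of one ring: bladeCount rotated copies of the template. */
    static List<double[]> bakeRing(double[][] templateCorners, int bladeCount,
                                   double pivotY, double zOffset) {
        List<double[]> baked = new ArrayList<>();
        for (int b = 0; b < bladeCount; b++) {
            double angle = 360.0 * b / bladeCount;    // evenly spaced blades
            for (double[] c : templateCorners) {
                baked.add(rotateAboutZ(c[0], c[1], c[2] + zOffset, angle, pivotY));
            }
        }
        return baked;
    }
}
```

The baked list would then be handed to whatever builds the display list, so the only per-frame transform left is the one that spins the whole ring.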

raelik commented 7 years ago

@OvermindDL1 Not yet. The difficulty there is that VS has to be able to launch the Minecraft java command line to start tracing Minecraft's OpenGL calls. I didn't quite get that working last night (I think I was close, it was complaining about not being able to find lwjgl in the library path).

OvermindDL1 commented 7 years ago

> @OvermindDL1 Not yet. The difficulty there is that VS has to be able to launch the Minecraft java command line to start tracing Minecraft's OpenGL calls. I didn't quite get that working last night (I think I was close, it was complaining about not being able to find lwjgl in the library path).

It can't follow the forking the JVM does on startup through the launcher? The AMD tools for OGL on Linux can do that just fine.

raelik commented 7 years ago

@OvermindDL1 I tried that, and it didn't seem like it followed it properly. Maybe I missed something there, I'll have to try it again later tonight.

raelik commented 7 years ago

@OvermindDL1 Also, the model code of interest here would actually be https://github.com/ReikaKalseki/ReactorCraft/blob/master/Models/ModelBigTurbine.java and https://github.com/ReikaKalseki/ReactorCraft/blob/master/Models/ModelTurbine.java

OvermindDL1 commented 7 years ago

@raelik Ahh, I never managed to get in to ReactorCraft since by that point RotaryCraft made my game a slideshow...

And yes, those files you linked are intensely inefficient, like so so much worse than that previous code I linked at larger turbine sizes... I've no doubt that is the problem you are experiencing.

raelik commented 7 years ago

@OvermindDL1 To be fair, @ReikaKalseki has done a pretty good job optimizing just about everything else. I don't get any FPS drops from the basic RotaryCraft stuff, nor from most of ReactorCraft. Just the turbines. Not sure when the last time you played with RotaryCraft was, I don't have any issues with it.

OvermindDL1 commented 7 years ago

> @OvermindDL1 To be fair, @ReikaKalseki has done a pretty good job optimizing just about everything else. I don't get any FPS drops from the basic RotaryCraft stuff, nor from most of ReactorCraft. Just the turbines. Not sure when the last time you played with RotaryCraft was, I don't have any issues with it.

I've played v17 a bit to see if there were any client-side improvements; there were none.

And yes, the back-end is quite efficient, and even the CPU part of the front-end is quite efficient; it is the OpenGL calls that are horribly inefficient. Depending on your GPU and drivers you may not experience it very much until you get to some of the extreme cases (like yours), but with my very old AMD Radeon 5770 it utterly kills my system, when no other mod in any massive mod pack does anything of the sort. My desktop gets a constant 60fps in every setup in all the mod packs I've tried, except for anything containing mods based on DragonAPI, and it is solely due to the sheer number of OpenGL calls, going by when I last bench'd it (which I should do again with a dump, though that ends up being gigabytes in size in just a few seconds due to how many OpenGL calls it made in my last world, and I was not anywhere near higher technology yet, just about to get bedrock ingots, probably around 40 RC machines and dozens of pipes and so forth).

raelik commented 7 years ago

@OvermindDL1 Ok, I see your point, suffering through an old GPU (I've been there).

raelik commented 7 years ago

@OvermindDL1 Here's a screenshot of my OpenGL API call summary (the top 10 or so), sorted by number of calls:

This was after looking at the turbines for probably 10-15 seconds. I think the total capture time was 19 seconds or so.

OvermindDL1 commented 7 years ago

Not exactly the full call trace I was expecting (I'm used to AMD's tools, which apparently give a lot more information than nVidia's), but over 4.7 million translations taking a combined total of half a second is definitely a little high regardless. However, the 'max' is showing over 3.2ms for at least one single translate call? The entire allocated time for a single rendered frame at 60fps is only ~16.6ms, and the average for a callList call is almost a full millisecond (when it is averaging over 500 batched calls per frame), so yes, your FPS definitely dropped because of excessive OGL flushes. Welcome to the very tip of a very large performance issue that I and many others experience. ;-)
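For a sense of scale, the arithmetic behind those figures can be sketched as below. It uses only the numbers mentioned in this thread (roughly 4.7 million translate calls over a capture of about 19 seconds) and the frame rate is an input, since the actual rate during the capture was not recorded; FrameBudget and its method names are made up for illustration.

```java
// Hypothetical back-of-the-envelope helper for the figures quoted in the thread.
final class FrameBudget {
    /** Time available per frame, in milliseconds, at the given frame rate. */
    static double frameBudgetMs(double fps) {
        return 1000.0 / fps;
    }

    /** Average number of calls issued per frame, given a total over a capture window. */
    static double callsPerFrame(double totalCalls, double captureSeconds, double fps) {
        return totalCalls / captureSeconds / fps;
    }

    public static void main(String[] args) {
        System.out.printf("budget at 60 fps: %.1f ms%n", frameBudgetMs(60));
        // 15 fps is an assumed figure in the range reported at the top of the thread.
        System.out.printf("translate calls per frame at 15 fps: %.0f%n",
                callsPerFrame(4.7e6, 19, 15));
    }
}
```

Under those assumptions the capture averages on the order of sixteen thousand translate calls per frame, which is the kind of call volume being described here.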

raelik commented 7 years ago

@OvermindDL1 I do have the full call trace dump, but it's about 1 GB. I can also get the OpenGL Draw Call dump, which also has some interesting information in it, namely the latency between the draw calls and the GPU actually processing them, which is indicative of how much bandwidth those calls are using.

OvermindDL1 commented 7 years ago

> @OvermindDL1 I do have the full call trace dump, but it's about 1 GB. I can also get the OpenGL Draw Call dump, which also has some interesting information in it, namely the latency between the draw calls and the GPU actually processing them, which is indicative of how much bandwidth those calls are using.

Heh, that sounds more normal.

I'm not sure they are using a lot of bandwidth from what I see in the code; rather, it looks like there are just so many calls that the natural latency of the northbridge bus on the motherboard actually becomes important, in addition to the natural processing time of the GPU, which now must spool up a whole chunk of cores to work on less than a chunk's worth of data, thus wasting a significant amount of processing ability as a lot of them are forced to do no-op work.

To elaborate on the cores: a GPU has a number of effectively concurrent 'cores' (in modern, traditionally CPU terms), except they are multi-data-single-call, as in they all process the exact same instructions over different data (usually vertices and pixels). If a card has, say, 2048 cores on it then they are batched into groups of, oh, 256 cores, so when you send a displaylist of a single box, which has 8 vertices, a whole set of 256 cores has to spool up to work on those 8 vertices, and thus 248 cores are utterly wasted for this cycle (plus if the rendering is only using, say, 8 pixels for such a box then the cores on the pixel processing step are equally wasted). This is why batching as many vertices as possible into a single display call is so important: the latency of sending the information 'to' the card over the motherboard northbridge plus the number of cores that can work on the data on the GPU are hard considerations that mandate that you reduce the number of calls and maximize the amount of data operated on at once.
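The wasted-cores arithmetic can be written down as a tiny sketch. The group width of 256 is just the illustrative figure used above, not a claim about any particular GPU, and BatchUtilization is a made-up name.

```java
// Hypothetical illustration of how much of a fixed-width core group does useful work.
final class BatchUtilization {
    /** Fraction of lanes doing useful work when 'items' are spread over groups of 'groupWidth'. */
    static double utilization(int items, int groupWidth) {
        int groups = (items + groupWidth - 1) / groupWidth;   // round up to whole groups
        return (double) items / ((long) groups * groupWidth);
    }

    public static void main(String[] args) {
        System.out.println(utilization(8, 256));    // one 8-vertex box per call
        System.out.println(utilization(1024, 256)); // many boxes batched into one call
    }
}
```

With those illustrative numbers a lone 8-vertex box keeps about 3% of a group busy, while a batch that fills whole groups keeps all of it busy.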

raelik commented 7 years ago

@OvermindDL1 Ok, that makes sense. All that said, it isn't clear to me how you would implement what you suggested (@ReikaKalseki may have a clearer idea):

> To be specific, addBox I doubt would work since it doesn't look like you can rotate them, just feed the coordinates in to it straight, which is easily done just by having a box that you translate with a translation matrix as you do in the render body, but do it only once and bake the translation into the coordinates, dead simple and even less code than you have now.

I get the concept, you're advocating doing the base transforms once to build up the model during initialization, instead of doing them hundreds of times during rendering. ~~I don't quite see how you would do this, as you can only call addBox once on a ModelRenderer in Minecraft, and said box is the only shape it supports.~~

I see that you CAN call addBox() multiple times; each time it adds a new ModelBox to the cubeList for that ModelRenderer. I'm assuming that you're suggesting using this to add each blade as its own box to a single LODModelPart representing the entire set of blades, and that he would do a matrix transform on the box coordinates of each blade (on the CPU) to establish their sizes and positions relative to each other (i.e. blade angle and twist).

ReikaKalseki commented 7 years ago

> @ReikaKalseki may have a clearer idea

In fact, I have none at all, mainly because I have no idea what has even been suggested.

raelik commented 7 years ago

@ReikaKalseki I edited my response after I realized I was wrong about addBox. I'm pretty sure he's suggesting that you add (in a loop) each blade as an individual box (instead of transforming and re-rendering the same blade box multiple times), and do a manual matrix transform to get the correct vertex coords to put the blade at the right angle and twist. This should work; the only disadvantages I can see are the higher CPU load when initializing (i.e. completing the multiblock), the fact that you'd have to call reset() on the stages if they took damage (if you wanted to keep the "missing blades" appearance when the turbines get damaged), and higher memory consumption since you'd be storing all of these vertices as ModelBox objects in the cubeList. I'd take that trade though (memory for better FPS).

ReikaKalseki commented 7 years ago

There is a bigger problem with that: Models are singletons, not per-turbine. So if you damaged one turbine, they would all start to look damaged. Unless you want to re-instantiate models every tick - which would nullify all of the above benefits and add a great deal of new load - you can only pass conditionals into the actual render method.

raelik commented 7 years ago

Actually, I think there's an even bigger problem than that. addBox doesn't take a set of coordinates for each vertex. It takes an origin point, 3 dimensions, and optionally a scaling factor. There is no way to establish any kind of rotated position for the boxes. They are always established cardinally with regards to the rotation point.

OvermindDL1 commented 7 years ago

> I don't think what he's getting at is possible in 1.7.10

It is entirely possible and I was doing it way back in Alpha 1.2 days when the nether was still new.

Honestly, ModelRenderer is an entire load of rubbish and they were idiots for putting it in as it is.

LODModelPart in DragonAPI could be awesome, and could entirely reimplement ModelRenderer (instead of deriving from it) and do it properly.

Because right now LODModelPart is just implementing and delegating to ModelRenderer, it is inheriting the bad design of ModelRenderer (which is built to handle boxes and boxes only).

I wish I could find my old code (pretty sure dead drive) that handled model rendering efficiently. I had a class that did it right long before Minecraft's ModelRenderer class ever existed, but it would not be too hard to re-create it at all; technically you would not even need a LODModelPart either.

For a stop-gap measure you could even make a class like ReifiedLODModelPart that takes a list of LODModelParts and transformations for them and then just reifies it all into a single displaylist, but that involves calculating all the intermediates too, which is extra work for the GPU for no reason.

In some pseudo-code (eh, most of it might compile once you get the right imports), I might make something like this as just a stopgap since it can re-use most of the existing code:

import net.minecraft.client.model.PositionTextureVertex;
import net.minecraft.client.renderer.GLAllocation;
import net.minecraft.client.renderer.GlStateManager;
import net.minecraft.client.renderer.Tessellator;
import net.minecraft.client.renderer.VertexBuffer;
import net.minecraft.client.renderer.vertex.DefaultVertexFormats;
import net.minecraft.util.math.Vec3d;
import org.lwjgl.opengl.GL11;
import org.lwjgl.util.vector.Matrix4f;
import org.lwjgl.util.vector.Vector4f;

// Still a sketch: class, package and field names follow the imports above and
// may differ between Minecraft/MCP versions.
public class BetterModelRenderer {
  // 'scale' was left undefined in the first draft; 1.0 keeps coordinates unchanged.
  private static final double SCALE = 1.0;

  private final int displayList;
  private VertexBuffer vertexBuffer; // set to null once compile() has been called

  public BetterModelRenderer() {
    // Open a display list now; every addPoly() call records into it until compile().
    this.displayList = GLAllocation.generateDisplayLists(1);
    GlStateManager.glNewList(this.displayList, GL11.GL_COMPILE); // GL_COMPILE == 4864
    this.vertexBuffer = Tessellator.getInstance().getBuffer();
  }

  public final BetterModelRenderer addPoly(Matrix4f mat, PositionTextureVertex v0, PositionTextureVertex v1, PositionTextureVertex v2) {
    return this.addPoly(mat, v0, v1, v2, false);
  }

  public final BetterModelRenderer addPoly(Matrix4f mat, PositionTextureVertex v0, PositionTextureVertex v1, PositionTextureVertex v2, boolean invertNormal) {
    // Bake the transform into the vertex positions here, once, on the CPU.
    Vec3d v0v = transform(mat, v0.vector3D);
    Vec3d v1v = transform(mat, v1.vector3D);
    Vec3d v2v = transform(mat, v2.vector3D);

    // Face normal from two edges of the triangle.
    Vec3d vec3d = v1v.subtractReverse(v0v);
    Vec3d vec3d1 = v1v.subtractReverse(v2v);
    Vec3d vec3d2 = vec3d1.crossProduct(vec3d).normalize();

    float f = (float)vec3d2.xCoord;
    float f1 = (float)vec3d2.yCoord;
    float f2 = (float)vec3d2.zCoord;

    if (invertNormal) {
      f = -f;
      f1 = -f1;
      f2 = -f2;
    }

    // Three vertices per call, so draw triangles (mode 7 would be GL_QUADS).
    this.vertexBuffer.begin(GL11.GL_TRIANGLES, DefaultVertexFormats.OLDMODEL_POSITION_TEX_NORMAL);
    putVertex(v0v, v0, f, f1, f2);
    putVertex(v1v, v1, f, f1, f2);
    putVertex(v2v, v2, f, f1, f2);
    Tessellator.getInstance().draw();
    return this;
  }

  // Applies the matrix to a position (w = 1 so that translations take effect).
  private static Vec3d transform(Matrix4f mat, Vec3d v) {
    Vector4f in = new Vector4f((float)v.xCoord, (float)v.yCoord, (float)v.zCoord, 1.0F);
    Vector4f out = Matrix4f.transform(mat, in, null);
    return new Vec3d(out.x, out.y, out.z);
  }

  private void putVertex(Vec3d pos, PositionTextureVertex src, float nx, float ny, float nz) {
    this.vertexBuffer
      .pos(pos.xCoord * SCALE, pos.yCoord * SCALE, pos.zCoord * SCALE)
      .tex((double)src.texturePositionX, (double)src.texturePositionY)
      .normal(nx, ny, nz)
      .endVertex();
  }

  public BetterModelRenderer compile() {
    if (this.vertexBuffer == null) throw new IllegalStateException("compile() may only be called once");

    GlStateManager.glEndList();
    this.vertexBuffer = null;
    return this;
  }

  public final void render() {
    GlStateManager.callList(this.displayList);
  }
}

I think MC's Vec3d is not as capable as lwjgl's Vector3f, so you may need to convert as necessary; I don't remember precisely. You can of course add helpers, like an addBox to add a box. You pass in a matrix for how to transform the given renderable data. But this is the minimum necessary to be able to render anything efficiently. For each 'model', just make one of these per game and use it for all things that need it. You can use it like:

BetterModelRenderer myTurbineModel = new BetterModelRenderer();

Matrix4f mat = new Matrix4f(); // Identity right now
mat.translate(new Vector3f(1.0F, 0.0F, 0.0F));  // Let's translate this part along x by 1.0 units
mat.rotate((float)(Math.PI * 0.5), new Vector3f(0.0F, 0.0F, 1.0F)); // Then let's rotate this a quarter turn around the Z axis

myTurbineModel.addPoly(mat,
  new PositionTextureVertex(0.0F, 0.0F, 0.0F, 0.0F, 0.0F), // This is just the normal minecraft PositionTextureVertex class
  new PositionTextureVertex(0.0F, 1.0F, 1.0F, 1.0F, 1.0F), // It takes x, y, z of the coordinate in local space, then the
  new PositionTextureVertex(0.0F, 1.0F, 0.0F, 1.0F, 0.0F)  // u/v texture coordinate of the texture from 0.0 to 1.0
  );

mat.setIdentity(); // Set it to the identity matrix again to reset it
mat.translate(new Vector3f(1.0F, 0.0F, 0.0F)); // Move it out along X by 1.0 units
for (double k = -0.375; k <= 0.375; k += 0.09375) { // Now doesn't this look familiar
  myTurbineModel.addPoly(mat,
    new PositionTextureVertex(0.0F, 0.0F, 0.0F, 0.0F, 0.0F),
    new PositionTextureVertex(0.0F, 1.0F, 1.0F, 1.0F, 1.0F),
    new PositionTextureVertex(0.0F, 1.0F, 0.0F, 1.0F, 0.0F)
    );
  myTurbineModel.addPoly(mat, // Calling addPoly twice to make a quad/square
    new PositionTextureVertex(0.0F, 0.0F, 0.0F, 0.0F, 0.0F),
    new PositionTextureVertex(0.0F, 0.0F, 1.0F, 0.0F, 1.0F),
    new PositionTextureVertex(0.0F, 1.0F, 1.0F, 1.0F, 1.0F)
    );
  mat.rotate((float)((Math.PI * 2.0) * (1.0 / 10.0)), new Vector3f(0.0F, 0.0F, 1.0F)); // Rotating in 10 parts around the Z axis to make fins around
}
// You could of course make an `addQuad` helper to make quads, then an `addBox` helper to make a box, etc...

myTurbineModel.compile(); // Call this to finalize the model and cache it to the GPU's memory, instead of each individual tiny box like the current renderers do

myTurbineModel.render(); // Then call this whenever you are ready to render it, you should have already translated it into the world as necessary at this point

Or something like that, it is not hard at all to do.

OvermindDL1 commented 7 years ago

Ah, more messages!

> There is a bigger problem with that: Models are singletons, not per-turbine. So if you damaged one turbine, they would all start to look damaged. Unless you want to re-instantiate models every tick - which would nullify all of the above benefits and add a great deal of new load - you can only pass conditionals into the actual render method.

What about just replacing the texture that you render them with to one that looks more damaged in various steps? Same model, different texture then.

Or make a few model instances of various damage state.

> Actually, I think there's an even bigger problem than that. addBox doesn't take a set of coordinates for each vertex. It takes an origin point, 3 dimensions, and optionally a scaling factor. There is no way to establish any kind of rotated position for the boxes. They are always established cardinally with regards to the rotation point.

Correct, this is one of multiple ways the stock ModelRenderer is horrible. My pseudo-code replacement above does not have the issue. And with more code you could make it wonderfully easy to use, the code above is very basic to build on.

But really, it should just take a matrix. Passing in translation, rotation, and scaling individually is painful, matrices are just easier, plus it is only done once at program load.

ReikaKalseki commented 7 years ago

> What about just replacing the texture that you render them with to one that looks more damaged in various steps? Same model, different texture then.
>
> Or make a few model instances of various damage state.

All infinity of them?

raelik commented 7 years ago

> LODModelPart in DragonAPI could be awesome, and could entirely reimplement ModelRenderer (instead of deriving from it) and do it properly.

@OvermindDL1 I thought you might be suggesting this, but I wasn't sure so I went down the ModelRenderer.addBox() rabbit hole first.

@ReikaKalseki Yeah, having separate "damaged" model instances probably wouldn't work for the turbines, unless you had less granular damaged "states" (like 25%, 50%, 75% damaged versions)

OvermindDL1 commented 7 years ago

> All infinity of them?

> @ReikaKalseki Yeah, having separate "damaged" model instances probably wouldn't work for the turbines, unless you had less granular damaged "states" (like 25%, 50%, 75% damaged versions)

Exactly this, a set of pre-damaged models with special textures can be a great effect without killing the GPU. ^.^

raelik commented 7 years ago

All infinity of them?

@ReikaKalseki Yeah, having separate "damaged" model instances probably wouldn't work for the turbines, unless you had less granular damaged "states" (like 25%, 50%, 75% damaged versions)

Exactly this, a set of pre-damaged models with special textures can be a great effect without killing the GPU. ^.^

In the case of the ReactorCraft Turbines, blades get skipped during rendering as it gets more damaged. If he wanted to keep that (and use a more optimized renderer), he'd have to have entirely different models, as they would have fewer and fewer vertices the more damaged they became.

OvermindDL1 commented 7 years ago

In the case of the ReactorCraft Turbines, blades get skipped during rendering as it gets more damaged. If he wanted to keep that (and use a more optimized renderer), he'd have to have entirely different models, as they would have fewer and fewer vertices the more damaged they became.

I'd say make, oh, 16 variants of the model. 16 states is probably much more than enough states to tell the user "This thing is about to blow-up". Even 8 or 4 might be more than enough.

raelik commented 7 years ago

It's still a lot: each stage of the turbine is actually its own model (as it spins up, each stage rotates at a different speed, until it maxes out), so it would be however many states times the number of stages (which I think is 7, so it would be 28 different models if there were only 4 states). That said, this only applies to the standard turbines. The high pressure turbine (ModelBigTurbine) can't be damaged in that fashion.

raelik commented 7 years ago

Also, the LODModelPart is still needed, as that is where the "LoD" (level of detail) calculations are done, to determine (based on the client chunk render distance vs. the volume of the part) whether or not to even render the model. This is also where I got the idea that ModelRenderer only allows one call to addBox. It's LODModelPart that does that, to avoid overly complicating the volume calculation. That said, a new LODModelPart based on a matrix-based renderer would likely need a new method for calculating said distance.
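As a rough illustration of that kind of check, here is a sketch. The `LodPart` class and its cutoff formula (cube root of the volume, times render distance, times an arbitrary constant) are assumptions made up for this example; this is not DragonAPI's actual LODModelPart calculation:

```java
final class LodPart {
    /** Volume of the part's single box: width * height * depth. */
    final double volume;

    LodPart(double w, double h, double d) {
        this.volume = w * h * d;
    }

    /**
     * Hypothetical rule: bigger parts stay visible from farther away, and a
     * larger client render distance (in chunks) pushes the cutoff out too.
     */
    boolean shouldRender(double distanceToCamera, int renderDistanceChunks) {
        double size = Math.cbrt(volume);                   // rough linear size of the part
        double cutoff = size * renderDistanceChunks * 4.0; // 4.0 is an arbitrary tuning constant
        return distanceToCamera <= cutoff;
    }

    public static void main(String[] args) {
        LodPart smallBlade = new LodPart(1, 1, 1);
        LodPart bigHousing = new LodPart(8, 8, 8);
        System.out.println(smallBlade.shouldRender(40, 8));
        System.out.println(bigHousing.shouldRender(40, 8));
    }
}
```

Keeping one box per part is what makes the volume a single multiplication here; a matrix-based part with arbitrary vertices would need something like a bounding-box volume instead.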

ReikaKalseki commented 7 years ago

unless you had less granular damaged "states" (like 25%, 50%, 75% damaged versions)

Which is not acceptable, the same way that ten "fill states" for a reservoir would not be.

raelik commented 7 years ago

Which is not acceptable, the same way that ten "fill states" for a reservoir would not be.

We just meant the visuals associated with the damage. The actual damage is what it is.

OvermindDL1 commented 7 years ago

We just meant the visuals associated with the damage. The actual damage is what it is.

This, purely rendering state, not internal state. Like how Doom had a liferange of 0-200 for the player but his portrait only changed in 6 increments over that range.

ReikaKalseki commented 7 years ago

We just meant the visuals associated with the damage. The actual damage is what it is.

This, purely rendering state, not internal state. Like how Doom had a liferange of 0-200 for the player but his portrait only changed in 6 increments over that range.

I know that. Visuals must match the internal values as closely as is realistically possible.

raelik commented 7 years ago

Agreed, but sometimes you gotta be willing to compromise a bit when it has a major performance impact, which this definitely does. That said, I don't think it's that big of a compromise, you'd be giving up some granularity that honestly most people wouldn't even notice, and the amount of granularity is entirely your call. The tradeoff is memory, as each different turbine stage would have to have a model built for each possible level of damage. You'd do this programmatically, akin to what you're doing during rendering currently, but you'd store those models, and swap them in and out as the turbine took damage, instead of just dynamically rendering them with different numbers of blades based on the current level of damage. There is almost certainly a level of granularity there where you'd be the only one who could tell the difference between the dynamic damage rendering and a more static version, that doesn't require an absurd amount of memory.
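A minimal sketch of that store-and-swap idea, with the internal damage staying continuous and only the visuals being quantized. `DamageVariants` is a hypothetical class, and a plain blade count stands in for each prebuilt model; how ReactorCraft actually chooses which blades to skip is not shown in this thread:

```java
final class DamageVariants {
    // Blades drawn for each visual state; index 0 = undamaged.
    // In a real renderer each entry would be a prebuilt model, not just a count.
    private final int[] bladesPerState;

    DamageVariants(int bladeCount, int states) {
        bladesPerState = new int[states];
        for (int s = 0; s < states; s++) {
            bladesPerState[s] = bladeCount - (bladeCount * s) / states;
        }
    }

    /** Maps a continuous damage fraction in [0, 1] to one stored variant. */
    int bladesFor(double damage) {
        double clamped = Math.max(0.0, Math.min(1.0, damage));
        int index = Math.min(bladesPerState.length - 1,
                             (int) (clamped * bladesPerState.length));
        return bladesPerState[index];
    }

    /** Memory cost in stored models: one per stage per visual state. */
    static int modelCount(int stages, int states) {
        return stages * states;
    }

    public static void main(String[] args) {
        DamageVariants v = new DamageVariants(16, 4);
        System.out.println(v.bladesFor(0.3) + " blades drawn at 30% damage");
        System.out.println(modelCount(7, 4) + " stored models for 7 stages x 4 states");
    }
}
```

The number of states is the granularity knob: raising it makes the visuals track the internal value more closely, at the cost of more stored models.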

1Matthias commented 7 years ago

I do have to say as an average user who has damaged a LOT of turbines, I've never noticed that there were an infinite number of visuals...one or two, maybe. I was more concerned with "Damnit, it's broken again." ;P

OvermindDL1 commented 7 years ago

I do have to say as an average user who has damaged a LOT of turbines, I've never noticed that there were an infinite number of visuals...one or two, maybe. I was more concerned with "Damnit, it's broken again." ;P

I would love to reach that point, but the rendering in RotaryCraft turns my computer (6 native cores, 3.6 GHz, 16 gigs of RAM, and an AMD Radeon 5770; a bit old, but not low end at all) into a slide-show in very short order. If you are curious I can record it with OGL benchmarking if I get time this weekend (my cores are practically idle while the game waits on the huge number of OpenGL commands to go through). ;-)

raelik commented 7 years ago

This whole discussion is moot if @ReikaKalseki is working on a 1.10+ version of his mods, since those would use the newer rendering system. I didn't feel like broaching that subject earlier though, I'm sure he hears it enough.

OvermindDL1 commented 7 years ago

Ogod... o.O

I decided to load my old world while in an OpenGL tracer.

Averaged about 1800-2200 API calls per frame to hit 60fps in my main base, good so far.

Hopped on over to the empty dimension that just holds RotaryCraft stuff (it was going to hold more later as well, but it got too laggy; I moved it out of my base hoping it would be fast enough to play with, and to keep my wife from yelling at me about how laggy my main base area was when she visited from her place), and yuuuuuup, averaging between 47000 and 52000 API calls per frame. And I only had the basics of machines making steam and making torque to power grinders to make seed oil and some other stuff.

Well, I managed to get about 2.1 gigs of GL calls in the span of a few seconds (that is a new record for me). I can play back this trace to get the precise information that my video card had at the time (literally replaying the commands to the GPU). The massive bulk of calls are apparently displaylist render calls of a vertex array of 23 points, 12 faces, so a box in other words, and surrounding these are state flushing calls of unholy-hell-of-amounts of glPushMatrix, glTranslatef (multiple times, wtf...), glRotatef, in various combinations in different calls. 80 textures were bound in memory at that point (entirely normal, that is the spritesheets and various full-image textures), no shaders were loaded at all (FFP was in use), all 4 dynamic texture sheets were disabled (MC doesn't use them), stencil and various other buffers were empty (again, MC doesn't use them), shading model is GL_FLAT (MC does its own 'fake' shading), and so forth. Everything seemed normal in the GPU and it was running as fast as it could; it just had a sheer number of calls that was orders of magnitude higher than what should ever be sent in a single frame...

On a side note, wtf is MC doing with the depth buffer... MC's original programmers are irritating... (not a speed issue, just a capability issue, blehg)...

I have to say, this is impressive Reika: I've never in my life seen any frame have so many API calls at a single point, and this is trivial stuff compared to what @raelik must be experiencing (I rather wish I could try out his world...). The 40megs of calls is utterly tiny compared to the call count, most of it is just state changes like push matrices and such.

The image loaded in the active texture unit on the GPU during the great bulk of this huge number of displaylist calls is: (image) Which definitely seems like a RotaryCraft texture (considering there is not much else to render in a void world but vanilla blocks as the base building and the GUI).

So yes, I would highly recommend reducing the count of render calls, substantially. ^.^;

This whole discussion is moot if @ReikaKalseki is working on a 1.10+ version of his mods, since those would use the newer rendering system. I didn't feel like broaching that subject earlier though, I'm sure he hears it enough.

Ugh, 1.10 made so many missteps... I'd not recommend it, you'd practically have to rewrite the mods; at the very least all of the current rendering code is useless, just to start.

All things considered, if the mods were to be 'updated' in such a way, I'd say use a different/better engine, like Terasology or so. I love screwing around in that one. I'd probably even release something if I had someone do the pretty bits (textures/models/etc...) for me as I don't have a single artist bone in my body. ^.^;

ReikaKalseki commented 7 years ago

That texture is the grinder texture, though rotated 180:

http://i.imgur.com/8Wxepyj.png

Also, I am not at all familiar with GPUs or OpenGL backend; I send it instructions and it does them.

raelik commented 7 years ago

Ugh, 1.10 made so many missteps... I'd not recommend it, you'd practically have to rewrite the mods; at the very least all of the current rendering code is useless, just to start.

All things considered, if the mods were to be 'updated' in such a way, I'd say use a different/better engine, like Terasology or so. I love screwing around in that one. I'd probably even release something if I had someone do the pretty bits (textures/models/etc...) for me as I don't have a single artist bone in my body. ^.^;

@OvermindDL1 Agreed, I only brought it up on the off-chance that @ReikaKalseki had already been working on it. I highly doubted it, and I don't blame him, so we can just forget I mentioned it :wink:

I think Terasology IS cool, but a bit out-of-scope for what we're talking about. Given that these models were mostly generated with Techne, and Reika obviously isn't going to want to rewrite his models, is there a way we could build a ModelRenderer replacement that could work as a "quasi" drop-in replacement? The idea being that internally it works on a vertex-based polygon model, but that it still accepts ModelRenderer addBox() calls to build up the model using boxes. To bridge the gap between the "old" method of doing GL transforms during the render stage and the "new" method of baking the transforms into the model by giving it precise vertices, could this new renderer have glTranslated/glRotatef compatible functions for applying those transforms to the "boxes" in the model? This would allow it to remain MOSTLY compatible with the existing ModelRenderer based code. The main changes that would need to be made would be moving the transforms that are currently being done in rendering to be done during initialization, and to refactor those glTranslated/glRotatef calls into the new functions.

The point of all of the above is to be able to give Reika a PR for these changes that he can actually digest and build on if he wants to apply it to other models. Tackling the big offender, the ReactorCraft turbines, would be the best approach, as it is probably the most complex model with the most GL calls.
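A rough sketch of what such a quasi drop-in builder could look like. `BoxModelBuilder` and its methods are hypothetical names for illustration, not an existing Forge or DragonAPI API. `translate` and `rotate` follow the same post-multiply order as the OpenGL calls they stand in for, and `addBox` bakes whatever transform is current into the stored vertices:

```java
import java.util.ArrayList;
import java.util.List;

final class BoxModelBuilder {
    private double[] m = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}; // current transform, row-major 4x4
    private final List<double[]> vertices = new ArrayList<>();   // baked {x, y, z}

    /** Stand-in for a GL translate call: current = current * T. */
    BoxModelBuilder translate(double x, double y, double z) {
        m = multiply(m, new double[] {1,0,0,x, 0,1,0,y, 0,0,1,z, 0,0,0,1});
        return this;
    }

    /** Stand-in for glRotatef: rotation by degrees about the axis (ax, ay, az). */
    BoxModelBuilder rotate(double degrees, double ax, double ay, double az) {
        double len = Math.sqrt(ax * ax + ay * ay + az * az);
        double x = ax / len, y = ay / len, z = az / len;
        double r = Math.toRadians(degrees), c = Math.cos(r), s = Math.sin(r), t = 1 - c;
        m = multiply(m, new double[] {
            t*x*x + c,   t*x*y - s*z, t*x*z + s*y, 0,
            t*x*y + s*z, t*y*y + c,   t*y*z - s*x, 0,
            t*x*z - s*y, t*y*z + s*x, t*z*z + c,   0,
            0,           0,           0,           1});
        return this;
    }

    /** Origin plus three dimensions, as with addBox; corners are baked immediately. */
    BoxModelBuilder addBox(double ox, double oy, double oz, double w, double h, double d) {
        for (int i = 0; i < 8; i++) {
            double x = ox + ((i & 4) != 0 ? w : 0);
            double y = oy + ((i & 2) != 0 ? h : 0);
            double z = oz + ((i & 1) != 0 ? d : 0);
            vertices.add(new double[] {
                m[0] * x + m[1] * y + m[2]  * z + m[3],
                m[4] * x + m[5] * y + m[6]  * z + m[7],
                m[8] * x + m[9] * y + m[10] * z + m[11]});
        }
        return this;
    }

    /** All baked vertices, ready to be uploaded once instead of transformed per frame. */
    List<double[]> bake() {
        return vertices;
    }

    static double[] multiply(double[] a, double[] b) {
        double[] out = new double[16];
        for (int r = 0; r < 4; r++)
            for (int c = 0; c < 4; c++)
                for (int k = 0; k < 4; k++)
                    out[r * 4 + c] += a[r * 4 + k] * b[k * 4 + c];
        return out;
    }

    public static void main(String[] args) {
        List<double[]> v = new BoxModelBuilder()
            .translate(10, 0, 0).rotate(90, 0, 1, 0)
            .addBox(0, 0, 0, 1, 1, 1)
            .bake();
        System.out.println(v.size() + " baked vertices");
    }
}
```

With this shape, existing render-time transform calls could be moved into initialization nearly line for line, which is the refactor described above. Push/pop of the matrix stack is left out of this sketch.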

ReikaKalseki / Reika_Mods_Issues

FPS drop when looking at High Pressure Turbines and/or Turbine Generators #1512