Implement geometry shader

MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density

http://www.mrtrix.org

Mozilla Public License 2.0

287 stars 178 forks source link

Implement geometry shader #91

Closed Lestropie closed 9 years ago

Lestropie commented 10 years ago

Needed for streamline thickness > 1 on OpenGL core profile, and will simplify a number of currently clunky parts of the code (streamline tangent-based colouring & lighting, and similarly for the fixel/vector plotting tool). Should also allow more fancy rendering of tracks such as GPU-based upsampling and streamtube rendering.

jdtournier commented 9 years ago

Just thought I'd revisit this issue, and jot down my thoughts on this before hippocampal atrophy sets in. It really would be good to have a look into this, there are a number of operations that this would open up - and it would simplify the code, and potentially speed up the rendering too (the multiple vertex array approach taken currently works OK, but probably wasn't designed for the purpose of computing tangents...).

In terms of code, it would involve minor additions to the shader classes to allow an optional geometry shader to be linked into the program. And then the major hurdle would be to figure out how to turn a line segment into a triangle strip...

I don't think it would be worth trying to go beyond 2 triangles per line segment (the minimum), since that already allows us to do everything we need (including streamtube rendering - or at least a decent approximation to it, more on that later). The trick is to generate vertices that will mesh nicely between segments even when the segments are turning sharply, such that the width of the line remains constant from the viewpoint. There's a bit of geometry to figure out, but nothing too difficult.

Back to streamtube rendering: what we can do is generate normals for each vertex that point away from the middle of the line, perpendicular to the line of sight. These will be interpolated once they get to the fragment shader, and all we'll need to do is update by adding a component normal to the tangent and the in-plane direction (i.e. across the segment), scaled by sqrt(1-x_x-y_y) - that direction would probably be computed in the geometry or vertex shader. That should basically give near-perfect streamtube-like rendering, with the only approximation being that the position of a strongly angled line (with respect to the viewpoint) will look 'flat' - it'll be sitting on a flat line segment, even if the shading makes it look cylindrical. I don't think that's going to be a problem given that we'll rarely be displaying these at large magnifications, but if it proves too ugly, we can always generate more vertices from the geometry shader to refine that aspect.

There is also the option of doing some form of cubic interpolation on the streamlines using the geometry shader, but honestly that's probably fully unnecessary... At least at this stage.

We should also bear in mind that this a geometry shader will also help with the vector plot tool, although the operations required in this case are much simpler. This might actually be the easiest place to start and get a handle on how geometry shaders work in the first place...

rtabbara commented 9 years ago

Thanks Donald for the added info. This is something I would definitely be interested in tackling.

Are people averse to having these new shaders reside in separate files rather than provided as strings within the executable? Aside from being cleaner, there's a variety of shader IDE/tools out there that can help with testing/debugging. Given that the geometry shader will be performing a lot more computation, I definitely see a need for being able to debug the shader in isolation.

While some of the existing shaders are injected dynamically with uniforms/functions based on some current state (e.g. see Fixel), we could achieve the same thing by using GLSL subroutines (OpenGL 4>). So we could still have dynamic behaviour within our shaders even if they were in a separate file, and avoid the current process of specifying a shader via a sequence of string concatenations.

The only downside is that our geometry shader would not be bundled within the executable. But given that most users are accessing mrtrix via the source, I wouldn't think this is a huge downside?

jdtournier commented 9 years ago

Good to hear, that would be a great help.

I'm not necessarily averse to the shaders being in separate files, but I do have a couple of concerns with that. The reason I went down the road of dynamically-generated shaders was mostly about performance, but I'm not sure that's a particularly good reason to carry on with it, I have a feeling the impact of a few if statements within the shader are probably not going to be felt, raw data throughput is likely to be the overwhelming bottleneck in most cases.

Another reason was as you say to keep the shaders bundled within the executable. As soon as they reside outside the application, they need to be located at runtime, and depending on how that's done this can be another complication for installation & management. For instance, I absolutely loathe Qt's decision to hard-code the location of their plugin & platform folders into the library, forcing a decision about where it will eventually be installed to be made at configure time, and making it really hard to distribute relocatable binaries. This bearing in mind that in many cases I expect users will install MRtrix in their home account (particularly when installing on a HPC cluster for instance). If you can think of a way of making the location of the shaders discoverable by the application at runtime without any particular configuration, then by all means. For instance, the library search path for MRtrix is now inserted into the executables as a relative path, so the whole installation can be moved elsewhere without affecting its operation - maybe this can be done also for these files. But as an added complication, whatever strategy is used would have to work on all supported platforms, including Windows...

Bear in mind though that switching to that approach would be a pretty major undertaking, there's a lot of code that would need to be refactored. For instance, all the colour-mapping code is currently handled by dynamic recompilation of the shader - you'd need to have each shader include the code for all of the different colour-maps in there, which may or may not be a problem - not sure. And while you could use the GLSL subroutines to handle that, I'm not keen on requiring users to run OpenGL4 - more on that below. So while I can see that it might make things a little easier for debugging initially, I'm not sure it's the best strategy long-term. Happy to discuss obviously...

As to using OpenGL4 features in MRtrix, I'm pretty firmly against the idea. That would seriously restrict the number of systems that would be capable of running MRView. OpenGL3.3 is already a pretty big jump for a lot of people still running ancient 'stable' Debian or CentOS distros... It's also the highest version supported on many laptops - which are obviously not going to be adequate to run the more intense stuff, but that shouldn't stop them from being able to use MRView for more lightweight work. For one, most macbooks only support OpenGL3.3 (as of Mavericks). On top of that, Mesa has only just pushed out OpenGL3.3 support across most of its open-source drivers (as of version 10), and it'll be a while yet before its OpenGL 4 support catches up. Basically, whenever I've thought about using OpenGL4 features, I've always ended deciding that it wouldn't be a good idea for adoption. So if we can get what we want done using OpenGL3.3, I vote we stick with it - and reconsider only if and when we genuinely need a feature that can't be done without moving to OpenGL4...

Lestropie commented 9 years ago

I'd wanted to tackle this myself at some point, but I'll be bogged down with fellowship stuff for at least a couple of weeks, so feel free to go for it.

I'd recommend starting with the fixel tool also; it uses a similar hack as is used for streamline rendering, but it makes less sense in that context. Having individual vectors of voxel centres / fixel directions / amplitudes / associated values would make the code a lot more sensible - and open up the possibility of having more complex geometry for each streamline e.g. cylinders, cones, pseudo-FODs, etc.. We can then start experimenting with how to deal with drawing streamlines.

Agree we shouldn't be too concerned about GPU streamline upscaling right now, since this can already be done before visualisation in tckedit.

Donald's the better authority on OpenGL so I'll stay out of that one. But given I just had an issue myself trying to get mrview running on a laptop due to OpenGL 3.3 requirements, it might be a bit premature to be relying on 4.

rtabbara commented 9 years ago

@Lestropie Sorry Rob, didn't mean to steal your thunder :) I'm currently working on a few smaller mrtrix issues at the moment so it might be a little while yet before I get started on the geometry shader stuff. If you're free by then, I'm happy for you to take the reins. Otherwise, I can just periodically bother you for help.

@jdtournier Fair enough. Honestly, I was selfishly focusing more on the initial write-up/debugging phase, but as GL4 seems a little premature to adopt, moving to separate shader files would be pretty pointless for now.

jdtournier commented 9 years ago

Just thought I'd add: it seems drawing lines with triangle strips will actually result in potentially much better performance, as suggested by this StackOver post. Modern gaming cards are heavily optimised for this specific tasks, and not so good at drawing lines, it seems - or rather, the drivers are probably crippled for line drawing on gaming cards, so that workstation-grade cards (i.e. Quadro's) appear vastly superior for CAD applications, which make heavy use of wireframe rendering...

rtabbara commented 9 years ago

A first pass of this can be found in the geometry_shader branch. Please check it out and let me know what you think. There's a few issues that I came across

Geometry shaders don't actually excel at amplifying geometry as explained here. Nonetheless, I rejigged a few things in the shader pipeline so hopefully the performance should be comparable (hopefully better) to the original implementation. I've only tested on a Quadro so far, but as Donald mentioned that's not the best test setup, so I'll look to test it out on a different graphics card.
A primitive form of streamtubing can be toggled within the tractography tool. Basically, all it does is apply a linear gradient across the normal of a line segment. Even though it's simple, I found that, particularly when rendering a largish number of streamlines, that it produced quite a nice effect.
One problem is that the streamtube shading is dependent on the order of the vertex data. So for instance, if we have two horizontal lines that are drawn right-to-left/left-to-right then the shading will be inverted. But it's definitely possible to fix this up along with adding more sophisticated lighting effects (e.g. calculate surface normals based on cylindrical shape of tube). However, before diving into that I wanted to ask whether something like that would be considered overkill? The issue is that when rendering a large number of streamlines, will these nuances be something that a user can pick up on or will they have a negligible effect?

jdtournier commented 9 years ago

Nice work! Just had a go, it looks pretty good. We should definitely fix up the lighting, but for now the main thing to focus on is just getting it to work fast...

On that front, I am a bit worried about that first issue you mentioned. If geometry shaders do impact performance that badly, then maybe we should think of something else... It's fine on my system, but then I've got a very decent GeForce 780 GTX. Even then the framerate was definitely lower with the geometry shader...

So having thought about this a bit, I think there might be a way to do most of what we want to achieve without the geometry shader, although the approach is even more Heath Robinson than before...

The general aim here is to somehow turn out stream of vertices into a pair of vertices so we can render it as a GL_TRIANGLE_STRIP rather than a GL_LINE_STRIP. Ideally, we only want to store the vertices for the streamline, nothing more - otherwise RAM usage would double. What we were doing before to avoid storing the streamline tangent, was basically the following:

load the vertex data into a Vertex Buffer Object
create a Vertex Array Object
set as the first attribute the buffer object as-is
set as the second attribute the buffer object with its start position shifted one vertex back
set as the third attribute the buffer object with its start position shifted one vertex forward
render using GL_LINE_STRIP

In the vertex shader, we could then use the second and third attribute (corresponding to the previous and next vertex respectively) to compute the tangent...

I think we can use this same approach with one crucial difference: the stride of the Vertex Buffer Object is half what it would normally be (i.e. 6 bytes rather than the 12 otherwise required for 3 floats). If we render that as a GL_TRIANGLE_STRIP, every other point will be garbage, but the other points would be those of the original streamline. That means we can use the original Buffer Object essentially as-is (no additional RAM usage), and use some fancy vertex shader magic to sort out the mess...

In the vertex shader, we would still have access to the vertex before and after, so using the built-in gl_VertexID variable, we can check whether we are processing an odd or an even vertex and select the right vertex by reading it from the appropriate attribute (current vertex if even, previous vertex if odd). We can then perform the same operations as are currently done in the geometry shader to shift the vertex to one side of the streamline or the other, again depending on whether its index is even or odd.

To actually do this (and for colour-by-direction, and lighting), we need to compute the tangent. For this, we actually need the vertices two indices back and forward (since the adjacent ones will be garbage). So we need additional vertex attributes set for these too, and one additional one shifted 3 vertices back so we have access to the correct vertex when processing odd vertices. That's a total of 6 vertex attributes, all providing shifted views of the same buffer object...

I think that's feasible, and would avoid any need for the geometry shader and its limitations. What do you reckon? Does this make sense to you...?

rtabbara commented 9 years ago

I roughly understand this idea, but it's not clear to me how you would get a line width while still only using the original vertex buffer. Even when this stride adjustment, you still ultimately need to output a minimum of 2 vertices per incoming vertex to produce a quad and that's something the vertex shader can't do. In other words, if I have N vertices then we can only hope to ever get N-2 triangles using GL_TRIANGLE_STRIP, whereas we require a minimum of 2(N-1) triangles.

The only way I could see this happening is that you would supply a corresponding index buffer that doubles up on the vertex indices. i.e. 0,0,1,1,2,2... and then within the VS you could do something similar as you described where you're ping ponging from the top to bottom edge.

But even if this was sorted, the main issue I came across was the issue of nicely handling sharp corners while still maintaining the same line thickness across the entire streamline. Playing around, I found that solely using 4 vertices per line segment produces very jarring, pointy corners at sharp angles that, in particular, become quite noticeable from afar. In the end, I found that I needed to resort to 6 vertices (two quads) to produce consistently smooth lines. Basically, there's an additional gluing quad between adjacent segments.

This becomes problematic when attempting a pure VS approach. Whereas with the geom. shader there's a map between a line and the 6 outputted vertices, with the VS, there's no longer a clear mapping between input and output vertices.

With that in mind I took another stab at salvaging the geom. shader. While a GS isn't good for amplifying vertices, one benefit it does have over a VS is the ability to choose to output no primitives giving us GPU-based culling. Already, I had incorporated this feature for when a user chooses to crop a slab, so that we are not unnecessarily generating quads for vertices that aren't part of the slab.

But I also realised I could extend this approach to perform frustum culling, so that additionally lines that aren't visible in the viewport will also not be promoted to quads. So in particular this means that there should be a performance boost when zooming in to a particular region.

Another speed up attempt was to reduce the overall number of triangles generated by dynamically choosing to output 4 or 6 vertices for when we have near parallel or near orthogonal line segments resp. I was banking on the fact that hopefully there a relatively fewer sharp turning streamline segments, but I'm not sure whether in general that's what's observed in practice.

Overall, after the latest changes I did notice a speed boost, but haven't really profiled it yet to quantify the improvement.

At this point, I'm not sure whether I'm just scraping the bottom of the barrel..but at least we have streamtubes!

jdtournier commented 9 years ago

@rtabbara - I wouldn't exactly call this scraping the bottom of the barrel - it works fine, and that's the main thing. Fast enough on my system for smooth rendering of 200K streamtubes...

That said, I wouldn't worry too much about trying to speed things up with culling. It's actually relatively rare to zoom in - or rather, it's sufficiently common to display the tracks without crop to slab and full field of view that we need to focus on that use case. If it's not fast enough for that, we need to rethink the whole approach.

Coming back to this idea of using the vertex shader, I appreciate it's not the simplest idea to get your head around... Basically, the idea is to trick the system into thinking it's got twice as many vertices by halving the stride passed when you attach the attribute. So instead of saying "here is a buffer of 3 (4-byte) floats with stride 12 bytes", you say "here is a buffer of 3 (4-byte) floats with stride 6 bytes", and then draw twice as many vertices in your glDraw call. It makes no sense at face value, every other vec3 read this way will clearly be misaligned and hence total nonsense. But crucially there is now no need for the vertex shader to emit any more vertices, we are already calling glDraw with the full number we'll need for a complete GL_TRIANGLE_STRIP.

The problem is how to deal with the fact that the data will be unusable half the time the vertex shader is invoked. The idea is you can figure out in the vertex shader that this vertex is not to be used by looking at the gl_VertexID variable - if it's odd, it'll be useless. So we can attach a second attribute, also with stride 6 bytes, but with its start offset shifted 6 bytes backwards compared to the first. So when the vertex shader processes a vertex with odd gl_VertexID, we read the position from that attribute, and when it's even we read it from the first attribute. In this way, the vertex shader gets to process the same vertex twice in succession, and we offset one of these to one side of the line, and the other to the other side of the line (again, using the gl_VertexID to decide which way to shift the vertex). Effectively, once you've figured out which attribute your actual vertex is to be fetched from, you can use the same code that you've already got in your geometry shader.

The next trick to make this work is to extend this to fetch the adjacent voxels so that you can compute the tangent and everything else that depends on it. Basically, we use the same idea, we just need more of the same attributes, shifted by different amounts so that we can always get hold of the vertex behind and the one in front. We'd need another 4 attributes for this to handle the odd and even cases.

So far so good, I'm pretty confident this can be done - in fact, that's how the current code works (without the half-stride trick, mind you). What's more problematic is this issue of sharp corners that you mentioned. I can totally see this being a problem, and was also thinking we might need to soften those corners with additional quads, as you've done. This can probably also be done using the vertex shader alone, although it would require a little additional tweaking. So to get around this, we'd need to process 4 vertices per streamline vertex, rather than 2.

Basically, to fool the vertex shader into processing 4 times as many vertices as there are in reality, we can use the exact same approach again, but this time with a stride of 3 bytes. However, we would now need 12 attributes to be able to access the 3 vertices we're interested in, having figured out which attributes to use based on gl_VertexID mod 4. This is entirely possible - messy, but not that bad. The problem is that OpenGL requires implementations to provide a minimum of 8 attributes (GL_MAX_VERTEX_ATTRIBS), so we'd be relying on implementations going beyond the minimum spec. I don't expect it would be unusual for implementations to provide more than the minimum - they routinely do. But we would definitely run the risk that this would not work on some OpenGL3.3 compliant hardware, which would be a less than ideal outcome. I don't think we should go down that road...

Anyway, hopefully that'll clarify what I was on about. If not, maybe you can have a chat with @draffelt...? We set this up together initially, he might be able to make sense of my ramblings...

In the meantime, I'll see if I can think of some way to make all this somehow hang together without a geometry shader. Until then, I think we stick with the geometry shader implementation, and get the lighting, etc, right. Most of the work for this should be done in the fragment shader anyway, so it can be reused regardless of how we eventually decide to do this.

jdtournier commented 9 years ago

Actually, now that I think about it, we could use indexed rendering, as you suggest, with indices set to [ 0 0 0 0 1 1 1 1 2 2 2 2 ... ], and using the gl_VertexID to decide which vertex (of the four needed to specify the two quads in the triangle strip) we are dealing with. We can also use the current shifted attribute trick to access the previous/next vertices to get the tangent. And we can use a single array of indices for all tracks, just by making sure it's long enough for all the tracks loaded, and using a different count for each (as we currently have to do in the glMultiDrawArrays() call anyway). I reckon this could work... What do you reckon?

rtabbara commented 9 years ago

@jdtournier Think I've fixed up the lighting now so please check it out.

I've tentatively combined streamtubing and lighting as the one option (lighting) within the tool bar. The motivation for this was that streamtubing requires lighting to give the stream lines depth. In particular, the surface normals used in the lighting calculation are rotated along the tangent vector to give the appearance of being cylindrical.

Not sure whether there's a desire to have flat looking streamlines and lighting. In practice, I observed that the original lighting implementation tended to produce hard edges when using a moderate amount of specular intensity.

jdtournier commented 9 years ago

@rtabbara Looks good! Couldn't resist having a bit of a play:

streamtubes

Otherwise, just a few comments:

I've tentatively combined streamtubing and lighting as the one option (lighting) within the tool bar.

Perfectly happy with that, sounds like the most logical approach.

In practice, I observed that the original lighting implementation tended to produce hard edges when using a moderate amount of specular intensity.

Yes, it was a bit a of kludge...

On a different note:

it's not sufficient to discard primitives in the geometry shader alone, you'll still need to do some culling in the fragment shader. Otherwise you get this jerky behaviour as you move the slab, as line segments suddenly appear/disappear in one go. This is especially problematic for large step sizes / small slab widths. In fact, in extreme cases you end up discarding a lot of line segments that should be displayed, since their vertices may end up either side of the slab. You can see this problem very clearly when displaying tracks generated using tckgen with iFOD2 (the default), which by default uses a relatively large step size.
There are a few issues with the lighting, the normals seem to pointing away from the viewer. If you open the dialog box for the lighting setting you'll see that the tracks are better lit with the azimuth angle shifted all the way, which corresponds to the light being behind the object. Also, there's an issue with the diffuse lighting, it somehow seems to darken the tracks, rather than add to them. When displaying with full ambient, no diffuse or specular, you get essentially the same as no lighting. If you now push up the diffuse coefficient, the tracks get darker. This is not what should happen, they should add linearly, and eventually saturate. I have a feeling this relates to the surface normals, since things seem to behave better when the light is shifted to the back. Maybe you're not clamping the light intensity values, and adding negative values as a result?

Basically it just needs a bit of polish, the basic infrastructure clearly works well. Could I ask you to send me a diagram or two explaining how the calculations are done, in terms of generating the mesh from the streamline vertices, and also in terms of the lighting (mainly surface normals, I guess)? It would really help me get my head around the code... Just a scan of a hand-drawn diagram would do, if you have the time.

rtabbara commented 9 years ago

@jdtournier Thanks for pointing out the lighting issue - think it's sorted out now. I've also moved the cropping to slab logic back to the fragment shader so you should get smoother cut-offs as you toggle the width.

I thought the next step would be to try and get performance up by trying to remove the geom. shader and instead use indexed rendering. After much head scratching/tearing, I realised that this isn't work. The problem is that when using an index buffer, gl_VertexID corresponds to the index value not the sequence order. So if in my index buffer we have 0, 0, 0, 0, 0, 0,...then gl_VertexID will be 0 in each case and not 0, 1 , 2, 3, 4, 5. Not only that, but I believe when indexing the vertex shader will only be invoked once per vertex and results cached on the GPU which kills the possibility of having different behaviours within the VS for the same vertex.

On the plus side, I think I've come up with a solution for handling streamline transparency. I'll describe more of that in #177.

jdtournier commented 9 years ago

The problem is that when using an index buffer, gl_VertexID corresponds to the index value not the sequence order.

You're right, this isn't going to work...

I believe when indexing the vertex shader will only be invoked once per vertex and results cached on the GPU which kills the possibility of having different behaviours within the VS for the same vertex.

Yes, you'd probably end up having to compute the same thing several times, which isn't ideal...

My guess is you're right, it'll probably be simpler to stick with the geometry shader...

In the meantime, I've just given the update a shot, it looks good - although there's still an issue with the specular highlights, they don't behave as they should. I'll investigate...

jdtournier commented 9 years ago

OK, just had a shot at sorting out the specular lighting. I ended up recoding up the surface normal calculation, I couldn't quite figure out how your version worked... I think it works as expected now, see what you think...

jdtournier commented 9 years ago

@rtabbara : I reckon this is close to ready to merge. I think all we need now is to polish a few things:

it would be good to set the streamline width in millimeters rather than pixels. At the moment, I notice it's relative to the window size, and doesn't scale up when zooming in. Maybe we could set the width to a sane default initially, say half the step size of the first track that gets loaded in. We could replace the linewidth slider with an AdjustButton and give the actual width in millimeters. Do you think that would work?
I also want to set the default lighting parameters to something that works a bit better for the streamlines. I'll give that a shot when I have a minute.

Guess we're almost there, hey? We can always improve on this later if we come up with something better...

rtabbara commented 9 years ago

@jdtournier

it would be good to set the streamline width in millimeters rather than pixels.

Ok, had a fair crack at getting this to work but ran into some overdraw issues. The issue is that the distance between successive points is fairly narrow, and so you quickly run into the problem of the thickness width being greater than the line length as seen in my crude picture below,

path4038-6

where the dashed lines represent the segment thickness. Given the constraint that we want the line thickness to be consistent across the streamline, I'm not really sure how we could accommodate for this.

The benefit of using the streamline width in terms of pixels instead of mm is that when zoomed out, these overdraw issues by and large are not apparent, allowing you to render fairly thick lines from afar. However, as we zoom in, as the distance between two successive points in screen space increases but the line thickness remains fixed, eventually thickness < line length meaning that streamlines will be rendered nicely.

It could be I've missed something in terms of creating the mesh, but I didn't want to bang my head on this for too long.

I also want to set the default lighting parameters to something that works a bit better for the streamlines

I played around a bit with this and think I came up with something that looks ok, but please feel free to have a tinker.

Just going to polish up the geom. shader code, and if I get your ok hopefully we can merge soon.

jdtournier commented 9 years ago

Yes, I did notice that if the streamline width is too large, it looks pretty ugly when zoomed in. I was thinking that the default streamline width would be set to something sensible for the tractogram loaded, maybe half its step size for example. I guess the issue then is that it might look too thin from a distance - with iFOD1 defaults, that would be a 0.1mm thickness... No so bad for iFOD2, would be 0.5mm or so.

So if that looks silly, I guess we could stick to the current setup. But I'd rather streamline width was relative to world coordinates if at all possible. Could we implement a crude subsampling approach when we detect the streamline width exceeds the step size? All it would take is changing the strides and offsets passed to the VertexAttrib calls, I think...? Might also speed up rendering in that case, which would be a bonus... And we could then set the default streamline width relative to the size of the image displayed rather than the tractogram, so that the width would look OK regardless of the step size used in the tractogram. Worth considering...?

rtabbara commented 9 years ago

I guess the issue then is that it might look too thin from a distance

Yep, exactly the problem I encountered.

Could we implement a crude subsampling approach when we detect the streamline width exceeds the step size?

The problem I see with subsampling is that you're going to get some pretty jumpy behaviour as you alter the line thickness. If you imagine 3 points forming a sharp cornered segment, then as the line thickness increases, we potentially will have a situation where we skip the middle vertex, causing a dramatic change in the direction and shape of the rendered quads.

Worse still, by eliminating points you're removing the guarantee that the distance between any two consecutive remaining points will be equal to the step size (e.g. think of 3 points forming a narrow spike) and so you can still wind up with the same issues.

FWIW, I agree that having a line thickness relative to world coordinates would be preferable, so please don't think I'm trying to purposely sabotage the idea.

jdtournier commented 9 years ago

OK, you're probably right, let's leave things as-is for now.

please don't think I'm trying to purposely sabotage the idea.

Hang on a minute... ?!?

Kidding aside, it would probably be a good idea to run this on a less-than-optimal setup - just to make sure this change doesn't absolutely kill performance on non-stellar hardware... Would you be able to compile and run this branch and compare to master on a system with a run-of-the-mill graphics card?

draffelt commented 9 years ago

@jdtournier what would you consider run of the mill? Rami currently is using a stock Quadro card.

jdtournier commented 9 years ago

I guess that would probably qualify as run of the mill... What's the exact model again...?

jdtournier commented 9 years ago

By the way, I just checked it out on my laptop (Nvidia GeForce G 105M), performance is definitely a lot worse with the geometry shader, but it's still acceptable with 100K iFOD2 streamlines. So I think it's good enough, let's merge.

Lestropie commented 9 years ago

Bit concerned that there's no capacity at all to render without the geometry shader; I'd expected it to revert to a standard line draw if the thickness was set to the minimum. May be highly restrictive in cases where tckediting your data is not an option as you need to be able to see the whole lot...

jdtournier commented 9 years ago

It would take a lot of effort to replicate the whole shader pipeline for this specific case. Note that it won't ever prevent you from displaying - the amount of RAM required is strictly identical, and geometry shaders are core since OpenGL 3.2. It'll be slower than the standard line draw, but still usable.

rtabbara commented 9 years ago

Ok, I've merged this. Thanks everyone (esp. @jdtournier) for all the help/suggestions and enduring with the long discussions. Hopefully the streamtube-enthusiasts at ISMRM will be drooling...or not.

jdtournier commented 9 years ago

Good work! Thanks for all your efforts on this.

I was just thinking: rendering 100,000 streamlines at ~100 vertices per streamline, and 4 triangles per vertex, comes out at ~40,000,000 triangles per frame. I'm not sure what the theoretical maximum is on my system, but I reckon we can't be that far off... It's not unlikely that this is actually the best we can do for streamline/tube rendering (at least for non-unit streamline widths).

Anyway, let's close this off for now, and get on with our lives...

rtabbara commented 9 years ago

@jdtournier Wasn't too happy with the rendering performance, so in 117faee3fa, I've tried to speed things up a bit. We now have:

Discarding primitives in the geom. shader when outside the cropped slab. This only discards when both end-points of the line are outside the region. Moreover, the fragment shader will still also perform further cropping so tracks will smoothly grow/shrink as we alter the slab width.
Discarding primitives when outside the viewport. Even though zooming in may not be a common occurrence, testing for this is a relatively low cost vs. the potential benefit of avoiding to create additional primitives in the geom. shader which is the cause of the bottleneck.
Downscaling of tracks. Within a FOV range (currently set to > 50), we skip odd numbered vertices when creating our primitives. I know I was sceptical about this, but I found that zooming in/out did not create any significant jarring when transitioning from a downscaled to the original vertex set, particular when rendering a large number of tracks. Also, note for track files with a track count below a threshold (currently set to 5000) no downscaling is performed regardless of the FOV.

Hopefully, a significant speed improvement should be observed, particularly when cropping.

jdtournier commented 9 years ago

@rtabbara, yes, I reckon a bit of sub-sampling is a good idea. That said, I'd rather the criterion for deciding when to sub-sample was relative to the data, rather than a hard-coded value. Otherwise we'll cause all sorts of problems when people import their mouse data...

So here's a few options that I think would be better:

sub-sample when streamline width > step size (or some multiple of it). Won't trigger sub-sampling when zoomed out necessarily, but will reduce 'jaggies' when using a large streamline width - which tends to happen more when zoomed out anyway.
sub-sample when the step size is small compared to the FoV - or better yet, to the screen pixel size. Should be relatively trivial to detect, and should tick all the boxes.

Personally, I reckon we can use both... There's no point rendering all vertices of a streamline whose step size is smaller than its width, or when the screen resolution doesn't allow you to distinguish between them anyway - and this is irrespective of performance.

Otherwise, I'd like to revisit the discussion we had a while back about how the streamline width is set. Currently, it scales with the window size, but not the FOV. This means the streamlines appear to get 'thinner' relative to the anatomy as I zoom in, when the volume that we've assigned to them using the streamtube rendering gives them a 'presence' that suggests they should scale accordingly. This is also a problem when trying to render thin streamlines on a very large screen (if I spread MRView over both my monitors, the minimum streamline width ends up much larger than a pixel - before you ask, I did that recently to take a high-res screenshot).

Both of these issues would be better handled by ensuring the line thickness is relative to the FOV only - you could even give it a physical value in millimetres, and use an AdjustButton to modify it. I remember your objection at the time was that the render looks bad when the streamline width is very small, which is true, but only for lit streamtubes with non-zero specular lighting - it looks just fine otherwise. Given that this is a corner case (albeit probably a fairly common one), I really don't see that we can justify not scaling properly on that basis. The proper way to handle the residual imperfections in the lit version would be using multi-sampling / anti-aliasing techniques (unfortunately they'd be pretty difficult to implement correctly for the volume render). Besides, with the current approach we can display streamlines with a tiny width anyway, simply by displaying them within a small window...

Finally, I've got a few ideas for how to render these streamlines with fewer vertices, in a way that might even allow bypassing the geometry shader - but I need to nut it out in my head first. Maybe we can revisit this with a face to face Skype call or something when the dust settles after the ISMRM...?