brainstorm-tools / bst-duneuro

Brainstorm-Duneuro integration
GNU General Public License v3.0

Duneuro Optimization 1 : input files for duneuro #4

Open tmedani opened 4 years ago

tmedani commented 4 years ago

Hello,

This is on the to-do list and we can discuss it here; low priority but important.

Right now, the input files for duneuro are ASCII files. From Brainstorm, in the isotropic case we use a .msh file for the geometry and a .cond file for the conductivity; in the anisotropic case, we use the Cauchy files (.knw and .geo).

Writing/reading these files takes ~10 to 30 min for meshes with millions of nodes and elements, which is around 5% to 20% of the total computation time.

This works, but we could improve it by using binary input files.

This will require changes to the duneuro C++ code (@juangpc ;) nice challenge)

Before that, we need to discuss which format to use as a standard between us and the duneuro team.

In SimBio, the Vista file format is used: https://www.mrt.uni-jena.de/simbio/index.php?title=File_Formats

Best, Takfarinas

martcous commented 4 years ago

Thank you Takfa, I am a bit of an outsider on this project: is there a technical specification for the ASCII duneuro input data files and the Vista format? The link you provided seems to only describe the Simbio .par parameter file.

If there are open formats used in other software, such as Vista in SimBio, I agree it would be best to adopt one; perhaps there are even open-source I/O functions we could reuse, either in Matlab or C++.

tmedani commented 4 years ago

Hi @martcous,

Thanks for your feedback.

Indeed, there is not much online documentation about the file formats we are using, or even about the Vista format.

I found some of these files, and a few PDF files that explain some parts, here: https://sourceforge.net/projects/neurofem-simbio/files/

I will check with the German group if there are any other links for documentation.

Furthermore, I have looked into the Vista files, and there are already some C++ and MEX functions: https://github.com/fieldtrip/fieldtrip/tree/master/external/simbio

But I think we would need to add them to the I/O functions of duneuro, somewhere here:

https://gitlab.dune-project.org/duneuro/duneuro/-/tree/master/duneuro/io

juangpc commented 4 years ago

Hi everybody. A few comments.

Sorry. Wordy Monday.

ftadel commented 4 years ago

In Matlab, reading/writing floating-point numbers in text files is indeed terribly slow compared with binary files (because of the conversion, a lack of optimization, and also simply because the files are much bigger). I've been facing this problem many times. Indeed, using binary files instead would make it a lot faster, at least for the Matlab part. Adding more I/O file formats is usually not too complicated, with no risk for the rest of the API.

@juangpc How feasible would it be to write a C-MEX caller for duneuro instead of going through a system call? (a MEX file that would receive the mesh as a Matlab matrix and return the leadfield as a Matlab matrix, instead of saving it all to the hard drive)

In the case where: 1) this is something you can do in a limited amount of time, 2) the recompilation of the MEX file is super well documented, and 3) the total expected time gain is significant with respect to the processing time (it makes a significantly different experience from the user's point of view) => then it could also be a good idea (but more invasive, possibly more unstable, with larger coding requirements and more complicated distribution/maintenance :)

juangpc commented 4 years ago

Great!

In Matlab, reading/writing floating-point numbers in text files is indeed terribly slow compared with binary files (because of the conversion, a lack of optimization, and also simply because the files are much bigger). I've been facing this problem many times. Indeed, using binary files instead would make it a lot faster, at least for the Matlab part. Adding more I/O file formats is usually not too complicated, with no risk for the rest of the API.

Aha. I didn't know that. Thank you.

How feasible would it be to write a C-MEX caller for duneuro instead of going through a system call?

For the duneuro part, Linux and Darwin will work pretty much out of the box. Windows... 😟 complicated.

  1. Yes. I can do this. Well, I don't want to underestimate this: anywhere between 1 day and a week. For the next biweekly?
  2. I will do my best.
  3. Let me start here. I can do an initial assessment.

tmedani commented 3 years ago

Hello all,

Nice to read all these relevant comments and learn these new things here :).

If I can add some comments regarding the previous steps of this project: Duneuro offers a Matlab (and Python) interface with MEX files, with all the data passed through memory (Matlab matrix input and output), without using any input or output files.

http://duneuro.org/ (see the Documentation section)

However, this works only on Linux... @juangpc and I spent a lot of time trying to generate the MEX files for Windows... in the end, nothing :( ... with more persistence, we were able to generate the standalone binaries :)

Furthermore, in order to generate these MEX files, which depend on the OS and the Matlab version, users must compile the Duneuro code on their own computer... which is not easy for most Brainstorm users. Also, @juangpc did a super job summarizing the tedious compilation steps in only a few lines here

The ideal case would be to have cross-version and cross-platform MEX files for the users, without asking them to compile the duneuro core code... as is the case for SimBio in FieldTrip.

I don't know how much work this needs or how long it will take.

Regarding the current version, with the binaries, the main change we need is adding new binary input files for duneuro. Of course, we need to define the file format, then write these files from Matlab, and add C++ code to duneuro in order to read them.

@juangpc has successfully added the code that writes the duneuro output as binary files, and it was slightly faster than the text files.

Changing only the input files will not affect much of the current Brainstorm code.

ftadel commented 3 years ago

I'd vote against optimizations that 1) require the users to compile something on their computers, or 2) don't work on Windows. It's probably better to have something slower but easier to deploy.

juangpc commented 3 years ago

Hi all, I've been able to work on this and finish an initial approach. Here is a benchmark for random data being saved with the original function currently in bst, out_fem_mesh, and an equivalent C implementation, out_fem_mesh_mex. I've gone up to 25 million vertices, which is probably higher than what we can ever expect.

All the times reported are for a MacBook Pro (which has a particularly high-bandwidth SSD), so I think it would be interesting to see the results on other machines. Especially @tmedani, I would like to see how it can take between 10 and 30 min on your laptop and only a few minutes on this Apple. And also, how the numbers compare with the MEX implementation.

With these results in mind: the implementation is ANSI C, completely cross-platform, and based on Matlab's MEX API. The code has about all the verification checks I've been able to think of so that it doesn't crash. It will compile for all 3 major operating systems and for both 32- and 64-bit word lengths. Thus, if this solution were adopted, it wouldn't require users to compile anything.

I would like to add that if you still think it is not a good idea, that's no problem. I did the implementation because I wanted to check the difference, that's all.

Thanks.

Screen Shot 2020-07-15 at 2 31 13 AM
ftadel commented 3 years ago

I'm a bit confused, because I thought that you wanted to use MEX files to avoid writing the .msh file to the hard drive and pass it directly in memory to duneuro. Nor does this benchmark address the main optimization proposed by @tmedani in this issue, which was to use binary instead of text for the other files. But thanks anyway, this is good to know.

Would you ever use more than 1 million vertices? If we consider that a high resolution mesh has 1e6 vertices, then it is 16s vs 3s. Indeed much faster. But what's the ratio with the actual duneuro forward model computation time? If it runs for >30min after that, do we really care about these extra 16s?

Given this benchmark, I'd say the main thing to consider is the ease of distribution and maintenance (difficulty of setting up a correct compilation environment, number of parameters/libraries the compilation depends on, necessity to use specific versions of Matlab, how easy it would be for someone completely outside the project to take over this work with only your documentation, etc.)

You are now the most experienced with all these parameters, so you should probably make the decision. As long as it works easily out of the box for all the main OSes, it won't be a nightmare to compile in 5 years, and the size of the downloaded package is not much bigger, I'm OK with both solutions.

juangpc commented 3 years ago

I'm a bit confused, because I thought that you wanted to use MEX files to avoid writing the .msh file to the hard drive and pass it directly in memory to duneuro. Nor does this benchmark address the main optimization proposed by @tmedani in this issue, which was to use binary instead of text for the other files. But thanks anyway, this is good to know.

So, maybe I'm the one confused. But I'll explain the way I see it. Right now the DUNEuro-based tool is built on DUNE, which only opens mesh files as text .msh files. So the entry point to the forward model solver is always a text file. Given this, we can either: (i) save the mesh file as text with Matlab and open it with duneuro, as it works right now; (ii) save it as binary within Matlab and modify the duneuro tool so that it opens the binary file, saves it back as text, and then DUNE reads it; or (iii) save it as text with a MEX and open it with duneuro, as it works right now.

This benchmark studies (or tries to 😅) this third option. Option (ii) is also OK because you keep Matlab code on the Matlab side and C++ code in the duneuro tool (which you're already compiling). But... I just can't get over the idea of reading a file and writing it back, just to open it again.

Would you ever use more than 1 million vertices? If we consider that a high resolution mesh has 1e6 vertices, then it is 16s vs 3s. Indeed much faster. But what's the ratio with the actual duneuro forward model computation time? If it runs for >30min after that, do we really care about these extra 16s?

I think we go up to ~10M vertices. On my laptop that's up to ~5 min, but I can see it being ~20 min on a non-SSD hard drive over the SATA bus.

Given this benchmark, I'd say the main thing to consider is the ease of distribution and maintenance (difficulty of setting up a correct compilation environment, number of parameters/libraries the compilation depends on, necessity to use specific versions of Matlab, how easy it would be for someone completely outside the project to take over this work with only your documentation, etc.)

Completely understand, I just want to help. Well, this is working now. If you want it, or feel like it could help, let me know.

ftadel commented 3 years ago

Given this benchmark, I'd say the main thing to consider is the ease of distribution and maintenance (difficulty of setting up a correct compilation environment, number of parameters/libraries the compilation depends on, necessity to use specific versions of Matlab, how easy it would be for someone completely outside the project to take over this work with only your documentation, etc.)

Completely understand, I just want to help. Well, this is working now. If you want it, or feel like it could help, let me know.

If you think it can help, go ahead with further integration. You have full control over this piece of software :-)

tmedani commented 3 years ago

Hi all, I've been able to work on this and finish an initial approach. Here is a benchmark for random data being saved with the original function currently in bst, out_fem_mesh, and an equivalent C implementation, out_fem_mesh_mex. I've gone up to 25 million vertices, which is probably higher than what we can ever expect.

All the times reported are for a MacBook Pro (which has a particularly high-bandwidth SSD), so I think it would be interesting to see the results on other machines. Especially @tmedani, I would like to see how it can take between 10 and 30 min on your laptop and only a few minutes on this Apple. And also, how the numbers compare with the MEX implementation.

With these results in mind: the implementation is ANSI C, completely cross-platform, and based on Matlab's MEX API. The code has about all the verification checks I've been able to think of so that it doesn't crash. It will compile for all 3 major operating systems and for both 32- and 64-bit word lengths. Thus, if this solution were adopted, it wouldn't require users to compile anything.

I would like to add that if you still think it is not a good idea, that's no problem. I did the implementation because I wanted to check the difference, that's all.

Thanks.

Screen Shot 2020-07-15 at 2 31 13 AM

Hi

@juangpc

here are the results on my computer, starting from 1.000000e+05 vertices:


out_fem_msh
iter: 001    | numVertices: 1.000000e+05    | size: 025MB    | time[s]: 3.401768e+00 3 seconds.
iter: 002    | numVertices: 2.500000e+05    | size: 066MB    | time[s]: 8.658063e+00 9 seconds.
iter: 003    | numVertices: 5.000000e+05    | size: 135MB    | time[s]: 1.747681e+01 17 seconds.
iter: 004    | numVertices: 7.500000e+05    | size: 204MB    | time[s]: 2.709224e+01 27 seconds.
iter: 005    | numVertices: 1.000000e+06    | size: 273MB    | time[s]: 3.564153e+01 36 seconds.
iter: 006    | numVertices: 2.500000e+06    | size: 724MB    | time[s]: 9.721959e+01 1 minute, 37 seconds.
iter: 007    | numVertices: 5.000000e+06    | size: 001GB    | time[s]: 1.845747e+02 3 minutes, 5 seconds.
iter: 008    | numVertices: 7.500000e+06    | size: 002GB    | time[s]: 2.768892e+02 4 minutes, 37 seconds.
iter: 009    | numVertices: 1.000000e+07    | size: 003GB    | time[s]: 3.716635e+02 6 minutes, 12 seconds.
iter: 010    | numVertices: 2.500000e+07    | size: 008GB    | time[s]: 9.981218e+02 16 minutes, 38 seconds.
-----------------------------------------------------------------------

out_fem_msh_mex
iter: 001    | numVertices: 1.000000e+05    | size: 025MB    | time[s]: 1.031195e+00 1 second.
iter: 002    | numVertices: 2.500000e+05    | size: 066MB    | time[s]: 2.663538e+00 3 seconds.
iter: 003    | numVertices: 5.000000e+05    | size: 136MB    | time[s]: 4.664536e+00 5 seconds.
iter: 004    | numVertices: 7.500000e+05    | size: 206MB    | time[s]: 6.492341e+00 6 seconds.
iter: 005    | numVertices: 1.000000e+06    | size: 276MB    | time[s]: 8.593094e+00 9 seconds.
iter: 006    | numVertices: 2.500000e+06    | size: 731MB    | time[s]: 2.446434e+01 24 seconds.
iter: 007    | numVertices: 5.000000e+06    | size: 001GB    | time[s]: 5.102805e+01 51 seconds.
iter: 008    | numVertices: 7.500000e+06    | size: 002GB    | time[s]: 6.706848e+01 1 minute, 7 seconds.
iter: 009    | numVertices: 1.000000e+07    | size: 003GB    | time[s]: 1.095695e+02 1 minute, 50 seconds.
iter: 010    | numVertices: 2.500000e+07    | size: 008GB    | time[s]: 2.929621e+02 4 minutes, 53 seconds.

tmedani commented 3 years ago

Hi all,

I'm always learning new things here :)

It is nice to know that a MEX implementation can be much faster than the current Matlab code, and since the MEX can work on all platforms, maybe it's worth it.

A few comments regarding the discussion:

Writing/reading these files takes ~10 to 30 min for meshes with millions of nodes and elements, which is around 5% to 20% of the total computation time.

From Juan's tests, it seems I overestimated the computation time; these numbers came from my previous discussions with the Duneuro team, and also from the basic testing scenarios I ran. However, we also need to check all the I/O files (mesh, conductivity tensors, dipoles, and sensors), and of course it depends on the computer's performance.

As discussed with Juan, in order to use these MEX files, we could also include the C/C++ source and then check internally whether the user's computer has a C/C++ compiler, in order to generate the MEX files locally; otherwise, the standard Matlab code can be used.

Regarding the C++ code, it seems that the I/O code is part of DUNEuro and not DUNE:

https://gitlab.dune-project.org/duneuro/duneuro/-/tree/master/duneuro/io

Therefore, we could define our own mesh format (maybe also for tensors, dipoles, and sensors), write the data in binary from Matlab, and then read it from duneuro, which then passes it to the FEM code, without any text file.

I'm not sure how much time we would save relative to the effort needed for all this.