Make dag construction faster

poquirion commented 7 years ago

psom_pipeline_init takes forever!

poquirion commented 7 years ago

I am doing that right now!

poquirion commented 7 years ago

However, I still haven't checked where psom could be optimized...

pbellec commented 7 years ago

Not sure what you are referring to. I posted on this thread by mistake earlier. I moved my comment to #289

pbellec commented 7 years ago

Re psom_pipeline_init: you may want to start by profiling a script that takes some time to run (maybe not a huge pipeline, just large enough to be likely to hit the same bottlenecks as a huge one). See https://www.gnu.org/software/octave/doc/v4.0.1/Profiling.html
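
A minimal sketch of the profiling workflow suggested above, using Octave's built-in profiler. The loop is only a stand-in workload (an assumption for illustration); in practice it would be replaced by a call to a mid-sized pipeline script.

```octave
% Reset any previous profiling data, then start collecting.
profile clear;
profile on;

% Stand-in workload (hypothetical); replace with the pipeline script to profile.
x = 0;
for k = 1:2000
  x = x + sum(sort(rand(1, 100)));
end

% Stop collecting and retrieve the results.
profile off;
info = profile('info');

% Print the 5 functions with the largest total time.
profshow(info, 5);
```

The functions at the top of the profshow table are the first candidates for optimization.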

poquirion commented 7 years ago

On 1570 subjects:

Oops!

error: out of memory or dimension too large for Octave's index type
error: called from
    psom_is_dag at line 67 column 15
    psom_pipeline_init at line 369 column 27
    psom_run_pipeline at line 460 column 17
    niak_pipeline_fmri_preprocess at line 751 column 5
    go_gsp at line 91 column 10

poquirion commented 7 years ago

Each subject has a T1 and a BOLD, and there are 1570 subjects.
Old psom pipeline preparation: 861 s
New psom pipeline preparation

poquirion commented 7 years ago

For the error, we would need to recompile Octave with the --enable-64 option! Or rewrite the code... It crashes at the following line:

mask_term = max(adj,[],1) == 0; % find terminal nodes

I guess max is creating some huge array.
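
A rough size check may help explain the error message. It rests on two assumptions that are not confirmed in this thread: that the Octave build uses 32-bit indices (no --enable-64), and that the crashing run had the 57499 jobs reported further down.

```octave
% Job count reported later in the thread (assumed to match the crashing run).
n_jobs = 57499;

% Number of linear indices needed to address an n x n matrix.
n_elements = n_jobs ^ 2;

% Largest index available with 32-bit indexing (assumed build configuration).
max_index = double(intmax('int32'));

printf('%d x %d = %.0f elements, index limit = %.0f\n', ...
       n_jobs, n_jobs, n_elements, max_index);

% True when the matrix cannot be addressed with linear indices.
exceeds = n_elements > max_index

% Largest square matrix that still fits under the limit.
largest_safe_n = floor(sqrt(max_index))
```

Under these assumptions the adjacency matrix has about 3.3e9 elements against a limit of about 2.1e9, so any operation that relies on linear indices over the full matrix would fail, while graphs of up to 46340 jobs would not hit the limit.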

pbellec commented 7 years ago

The question is: why did it ever work?

pbellec commented 7 years ago

Latest commit on issue111 branch should fix the issue in psom_is_dag. Found a way to do the same thing without introducing linear indices on the large sparse matrix, using find.

Also accelerated psom_files2cell. I was able to build a dependency graph for the fMRI preprocessing of the full HCP sample (that is 100k jobs) in 80 sec. Most of the time (68 s) is spent reorganizing the input/output file names. Still a bit long, but substantially faster than before. The construction of the graph itself is 11 sec. I was able to run psom_is_dag on the resulting graph (100k x 100k matrix with 300k dependency relations). So everything seems in order.
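
The find-based idea mentioned above can be sketched as follows. This is an illustration on a toy graph, not the code from the issue111 branch: the peeling loop is one possible way to test for cycles, and all variable names are hypothetical. It works on the row/column subscripts of the nonzero entries only, so no linear index over the full matrix is needed.

```octave
% Toy dependency graph with 4 jobs and edges 1->2, 1->3, 2->4, 3->4.
n = 4;
adj = sparse([1 1 2 3], [2 3 4 4], 1, n, n);

% Reference: the formulation quoted earlier in the thread.
mask_ref = full(max(adj, [], 1) == 0);

% Same mask from the subscripts of the nonzero entries.
% Equivalent as long as adj has no negative entries.
[row, col] = find(adj);
mask_term = true(1, n);
mask_term(col) = false;

% Acyclicity test: repeatedly peel off nodes with no remaining entry in
% their column. If nodes are left but none can be peeled, there is a cycle.
alive = true(1, n);
while any(alive)
  has_entry = false(1, n);
  has_entry(col) = true;
  term = alive & ~has_entry;
  if ~any(term)
    break;
  end
  alive(term) = false;
  keep = ~term(row);          % drop edges that start at a removed node
  row = row(keep);
  col = col(keep);
end
is_dag = ~any(alive)
```

Each pass touches only the remaining edges, so the cost scales with the number of dependency relations rather than with the square of the number of jobs.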

pbellec commented 7 years ago

For reference, with the old implementation on the HCP preprocess pipeline, the reorganization of file names took 2 min (x2 speed up) and the construction of the graph took 652 sec (x65 speed up). Altogether we move from 12 min (unacceptable) to 1.5 min (awkward). Not sure we can get much better than that, but this pipeline features over 6000 fMRI datasets. It represents an extreme use case, at least for now. I am curious about the GSP benchmarks (@poquirion you did not post the perfs).

poquirion commented 7 years ago

I did not post them because the new branch crashed on them! I am running the test right now

pbellec commented 7 years ago

@poquirion I have investigated the problem and found a bug. The new psom_files2cell did not deal well with edge cases (extra filesep in file names, empty file names, "special" file names 'gb_niak_omitted' or 'gb_psom_omitted'). The results should now be identical between the old and new versions.
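
To make the edge cases above concrete, here is a hypothetical sketch of how a flat list of file names could be cleaned. It is not the actual psom_files2cell: the choice to drop empty and special names, the POSIX '/' separator, and every variable name are assumptions for illustration.

```octave
% Example input covering the edge cases listed above (made-up paths).
names = {'/data//sub1/t1.mnc', '', 'gb_niak_omitted', ...
         '/data/sub1/bold.mnc', 'gb_psom_omitted'};

% Special markers that do not refer to real files.
special = {'gb_niak_omitted', 'gb_psom_omitted'};

% Drop empty names first, then the special markers (assumed behavior).
names = names(~cellfun(@isempty, names));
names = names(~ismember(names, special));

% Collapse repeated separators, assuming '/' as the file separator.
names = cellfun(@(s) regexprep(s, '/+', '/'), names, 'UniformOutput', false);

disp(names);
```

Comparing the cleaned list produced by an old and a new implementation with isequal is a simple way to confirm the two agree on such inputs.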

poquirion commented 7 years ago

I now have the numbers for the new version. Each subject has a T1 and a BOLD, and there are 1570 subjects.
Old psom pipeline preparation: 861 s
New psom pipeline preparation: 333 s

Which is much better. Now most of the time (~90%) is spent after the following log:

Setting up the to-do list ...
   I found 57499 job(s) to do.

pbellec commented 7 years ago

which means I need to optimize the rest... will have a look.

poquirion commented 7 years ago

Actually, a significant amount of time is spent after the Setting up the to-do list ... part. I will have a better profile soon.

poquirion commented 7 years ago

New problem! It does build the DAG faster, but it also crashes the daemon:

/> cat deamon.log 
error: load: reading matrix data for 'graph_deps'
error: load: trouble reading binary file '/home/poquirion/test/result/logs/PIPE.mat'

pbellec commented 7 years ago

psom_run_script has a broken test.

pbellec commented 7 years ago

OK so I finally have a lead on this. Looks like Octave is breaking when it saves a sparse logical array. I have to confirm, but the dependency graph is definitely incorrect in the logs, while it's correct when generated in line.

pbellec commented 7 years ago

@poquirion I have just pushed a few changes which should resolve the problem. There is a bug in Octave when saving sparse boolean arrays, at least up to 4.0.2. Would need to test with the latest version (4.2.0). Let me know if it works for you. Note that I also incorporated a function to build the pipeline from the tutorial (psom_test_pipe_tutorial), and tests in psom_run_pipeline as well as psom_build_dependencies. I still need to speed up psom_pipeline_init, but we are getting closer to closing this cursed issue.
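
One possible workaround for the save problem described above is to avoid writing a sparse logical array at all. The sketch below is an assumption, not necessarily what the pushed changes do: it converts the graph to sparse double before saving and back to logical after loading. The temporary file stands in for PIPE.mat, and graph_deps_saved is a hypothetical variable name.

```octave
% Toy dependency graph: edges 1->2 and 2->3, stored as sparse logical.
graph_deps = logical(sparse([1 2], [2 3], 1, 3, 3));

% Assumed workaround: store the graph as sparse double instead.
graph_deps_saved = double(graph_deps);
file_pipe = [tempname() '.mat'];   % hypothetical stand-in for PIPE.mat
save('-binary', file_pipe, 'graph_deps_saved');

% Restore the logical type after loading.
data = load(file_pipe);
graph_deps_loaded = logical(data.graph_deps_saved);
delete(file_pipe);

% The round trip should preserve the graph exactly.
same = isequal(graph_deps, graph_deps_loaded)
```

A round-trip check like this one can also serve as a regression test when trying a newer Octave version.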

pbellec commented 7 years ago

dag construction is now faster, closing the issue. We'll have a separate issue for psom_pipeline_init.

SIMEXP / psom

Make dag construction faster #111