dPys / PyNets

A Reproducible Workflow for Structural and Functional Connectome Ensemble Learning
https://pynets.readthedocs.io/en/latest/

Guidance re: timing & atlas complexity? #12

Closed jlhanson5 closed 7 years ago

jlhanson5 commented 7 years ago

Hello,

I was just starting to test-drive the package, and it looks promising. I wondered if you had any information about typical processing time, especially how it scales with atlas complexity and resting-state scan length? I have a few sizable datasets (n's ranging from 500 to 1000) that I was hoping to apply PyNets to, and wondered about a (rough) range for how long processing takes to complete per subject?

Right now, I have a pilot subject where the package has completed calculations for global and local efficiency using the Power atlas. It's been running for ~2 hours, but I can't tell if it's hung. In your experience, how does processing time correlate with the number of atlas ROIs? Power has 264, but what if I scaled down to the AAL or another atlas set? Any information is greatly appreciated; this all looks quite promising (thanks for the development and support).

Cheers, Jamie.

dPys commented 7 years ago

Hi Jamie,

Hard to say, since we haven't gotten around to testing runtimes with different variations of run parameters and input file sizes yet ;-)

I can say that, in general, higher atlas complexity (i.e. I'm guessing you simply mean more ROIs?) will lead to longer runtimes, as will larger node radii (are you using the default node size?) and longer resting-state scans/larger input files. By that logic, using the AAL or another atlas set should theoretically reduce runtime. Similarly, specifying a precision matrix (i.e. the sparse inverse covariance model) is going to take longer than a covariance or correlation matrix estimation...
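To make the model-choice point concrete, here is a toy timing comparison (not PyNets code; scikit-learn's `GraphicalLassoCV` standing in for the sparse inverse covariance step, on synthetic stand-in data):

```python
# Toy illustration (not PyNets internals): a plain correlation matrix is a
# single vectorized computation, while sparse inverse covariance ("precision")
# estimation requires an iterative, cross-validated fit -- hence the longer
# runtimes the maintainer describes.
import time
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
ts = rng.standard_normal((178, 30))  # 178 frames x 30 ROIs (stand-in data)

t0 = time.perf_counter()
corr = np.corrcoef(ts, rowvar=False)           # correlation: essentially one matmul
t_corr = time.perf_counter() - t0

t0 = time.perf_counter()
prec = GraphicalLassoCV().fit(ts).precision_   # precision: iterative CV fit
t_prec = time.perf_counter() - t0

print(corr.shape, prec.shape)                  # both (30, 30)
print(t_prec > t_corr)                         # the CV fit is far slower
```

The gap grows quickly with ROI count, which is one reason a 264-node Power-atlas run costs more than an AAL run under the same model.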

All that being said, the pipeline should take anywhere from 1 to 20 minutes to run per subject for rsfMRI.

Now, when you say that the pipeline has been running for ~2 hours: do you get any error messages? Did it ever finish? If it hangs, where does it hang when running on your data?

Happy to help.

derek;

jlhanson5 commented 7 years ago

Dear Derek,

Thanks for following up! Those responses all make sense. In my initial test-driving, processing took a bit longer (~2.5-3 hours), but I was running it on our server (where other processes were also running). During this period there weren't any error messages; the script would just work through the different metrics, with big gaps between updates. Global efficiency would be printed in the terminal, and then 20 minutes later local efficiency would be printed on screen.

I was test-driving things with:

- Resting-state scans with 178 frames
- Resampled to MNI 2x2x2 space
- Power atlas (I didn't specify any threshold or radius, so I presume -thr '0.95' -ns '3'?)

We might have some memory limits on our processes (but I don't think that's the case). Any other thoughts on why completion times differ so much from your estimate? Thanks again!

Cheers, Jamie.

dPys commented 7 years ago

Interesting. Would it be possible to repeat the run on a local machine that you know for certain has enough allocated memory to help rule out whether this is a software vs. hardware issue?

The file that you described should not lead to excessively long runtimes, so my first guess would be that this is a hardware issue, but only testing will say for sure. PyNets was built using example files that were 182 frames or larger, so the file size should not be an issue in your case...
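One quick way to sanity-check the memory side (a sketch, nothing PyNets-specific) is to estimate the in-RAM footprint of the series from its header dimensions; using the 91x109x91x178 grid reported by 3dinfo later in this thread:

```python
# Back-of-envelope memory estimate for a 4D series (dims from a 3dinfo dump):
# numpy/nilearn typically promote to float64 (8 bytes per voxel) on load.
import numpy as np

shape = (91, 109, 91, 178)                 # x, y, z, frames
bytes_needed = int(np.prod(shape)) * 8     # float64
print(f"~{bytes_needed / 1e9:.1f} GB in RAM")  # → ~1.3 GB
```

~1.3 GB for a single image is modest on most workstations, but several concurrent jobs on a shared server could plausibly hit process memory limits.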

jlhanson5 commented 7 years ago

Hello Derek,

I'm just running it locally and running into similar time frames. Right now, I'm 40 minutes in and PyNets has output global_efficiency and local_efficiency. Thoughts?

Thanks much! Jamie.

***edit: I'm about 1.5 hours in and headed home for the night; thus far, PyNets has calculated global_efficiency, local_efficiency, smallworldness, degree_assortativity_coefficient, average_clustering, average_shortest_path_length, and degree_pearson_correlation_coefficient.

Here's more info about the file referenced above:

lrdc-175:REST_opt jamielh$ 3dinfo 20140313_18263_REST_d_mcf_s_n_ICA_MNI_GSR.nii.gz
++ 3dinfo: AFNI version=AFNI_16.2.16 (Sep 4 2016) [64-bit]

Dataset File:    20140313_18263_REST_d_mcf_s_n_ICA_MNI_GSR.nii.gz
Identifier Code: XYZ_546QDRcB1xlWyoVX3n0MgQ  Creation Date: Thu Jul 27 17:10:12 2017
Template Space:  MNI
Dataset Type:    Echo Planar (-epan)
Byte Order:      LSB_FIRST {assumed} [this CPU native = LSB_FIRST]
Storage Mode:    NIFTI
Storage Space:   642,671,848 (643 million [mega]) bytes
Geometry String: "MATRIX(2,0,0,-90,0,-2,0,126,0,0,2,-72):91,109,91"
Data Axes Tilt:  Plumb
Data Axes Orientation:
  first  (x) = Right-to-Left
  second (y) = Posterior-to-Anterior
  third  (z) = Inferior-to-Superior   [-orient RPI]
R-to-L extent:   -90.000 [R] -to-  90.000 [L] -step- 2.000 mm [ 91 voxels]
A-to-P extent:   -90.000 [A] -to- 126.000 [P] -step- 2.000 mm [109 voxels]
I-to-S extent:   -72.000 [I] -to- 108.000 [S] -step- 2.000 mm [ 91 voxels]
Number of time steps = 178  Time step = 2.00000s  Origin = 0.00000s
-- At sub-brick #0 '?' datum type is float: -208.049 to 229.697
-- At sub-brick #1 '?' datum type is float: -198.989 to 240.005
-- At sub-brick #2 '?' datum type is float: -153.887 to 207.944
For info on all 178 sub-bricks, use '3dinfo -verb'

dPys commented 7 years ago

Hi Jamie,

So the hang up after local efficiency in particular is due to the calculation of small-worldness (which takes a bit longer).
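For intuition on why that step dominates (a sketch using networkx's public API; PyNets' internal implementation may differ): the common small-world coefficient sigma compares clustering and path length against an ensemble of randomized reference graphs, so its cost is roughly one graph metric multiplied by the number of random graphs:

```python
# Why small-worldness is slow: sigma = (C/C_rand) / (L/L_rand) requires
# clustering and shortest paths to be recomputed on many rewired reference
# graphs, not just on G itself.
import networkx as nx

G = nx.connected_watts_strogatz_graph(n=60, k=6, p=0.1, seed=1)

C = nx.average_clustering(G)               # quick one-off metrics on G itself
L = nx.average_shortest_path_length(G)

# sigma() internally builds `nrand` randomized graphs (each via `niter` rounds
# of edge swaps) and repeats both metrics on every one -- this multiplication
# is what stalls on dense 264-node (Power atlas) graphs.
sigma = nx.sigma(G, niter=5, nrand=2, seed=1)
print(C, L, sigma)                         # sigma is typically > 1 for small-world graphs
```

Scaling the node count from 60 up to a few hundred, with networkx's default `niter=100, nrand=10`, easily turns this into tens of minutes.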

Beyond that, it is unclear why it is taking so long. If it occurs both locally and on your server, then that indicates it's most likely something about the image that you're feeding to the workflow. From the header info that you sent, there doesn't seem to be anything out of the ordinary (e.g. 2-sec TR). The file is fairly large (0.5 GB), but that shouldn't lead to the runtimes that you're experiencing...

Two things to try:

  1. Re-run the workflow using the 'sps' model instead of 'corr' and see how that impacts runtime
  2. Try the workflow on a different image from your dataset and/or from a different dataset to compare

If neither of those two steps lead to any insights, would you mind sending me the image (or a similar minimal example) to take a look myself?

derek;

jlhanson5 commented 7 years ago

Hello Derek,

Thanks for the assist. I tried it with sps and on a different subject, and processing time is similar to 'corr'. I uploaded an example sub to our university 'box' site, link here: https://pitt.box.com/s/hpsfsyvtxwyhzjaydd54bar8ncok9qia (I'll email you the password).

Any thoughts? I'm still finalizing my preprocessing pipeline (and can comment on the operations completed on that file), but I don't think that would impact PyNets much(?).

Thanks much, Jamie.

dPys commented 7 years ago

Hi Jamie,

I noticed that the image that you sent does not look like a typical preprocessed fMRI image. How are you preprocessing it?

-Derek

jlhanson5 commented 7 years ago

They are despiked in AFNI, mcflirted in FSL, smoothed in FSL, run through ICA-AROMA, normalized, and then bandpassed in AFNI with WM, CSF, and GSR regressed out.

That looked like current practices based on that NeuroImage paper comparing different processing strategies. Thoughts on all that?

jlhanson5 commented 7 years ago

Specifically this NI paper: https://www.ncbi.nlm.nih.gov/pubmed/28302591

dPys commented 7 years ago

Hi Jamie,

Based on your description of your preprocessing stream, the image should look different than it does. The file you sent me appears to be a probability map/unthresholded statistical image (perhaps from later in your pipeline?), not a normalized functional image. See a screenshot of an example image that I know works (called filtered_func_data_clean_standard.nii.gz) alongside yours (bottom).

[screenshots: filtered_func_data_clean_standard.nii.gz (top) vs. the uploaded image (bottom)]

derek;

jlhanson5 commented 7 years ago

Hmmm, there must be an issue with how I was regressing out the global signal (GSR) and bandpassing the data at the same time (using 3dBandpass or 3dTproject in AFNI). I tried to re-run things, but now I ran into an error (perhaps related to the script looking for nuisance confounds?)... I attached the pklz file (zipped) and pasted the error output below. Any thoughts? Thanks much!

crash-20170811-170221-jamielh-imp_est-446fa6c0-f111-4f1d-838a-f33ca166c9b6.pklz.zip

jamielh@pfc:~/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp$ python ~/Volumes/Hanson/Training_Resources/PyNets/pynets.py -i '/home/jamielh/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp/20140318_18278_REST_d_mcf_s_n_ICA_MNI.nii.gz' -ID '002' -a 'coords_dosenbach_2010' -model 'corr'


INPUT FILE: /home/jamielh/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp/20140318_18278_REST_d_mcf_s_n_ICA_MNI.nii.gz

SUBJECT ID: 002

ATLAS: coords_dosenbach_2010

USING WHOLE-BRAIN CONNECTOME...

170811-17:01:25,704 workflow INFO: ['check', 'execution', 'logging']
170811-17:01:25,797 workflow INFO: Running serially.
170811-17:01:25,799 workflow INFO: Executing node imp_est in dir: /tmp/tmpkxcZ7l/PyNets_WORKFLOW/imp_est

Dosenbach 2010 atlas comes with ['rois', 'labels', 'description', 'networks']

Stacked atlas coordinates in array of shape (160, 3).


[Memory] Calling nilearn.input_data.base_masker.filter_and_extract...
filter_and_extract('/home/jamielh/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp/20140318_18278_REST_d_mcf_s_n_ICA_MNI.nii.gz',
    <nilearn.input_data.nifti_spheres_masker._ExtractionFunctor object at 0x7ffa3cef2d90>,
    { 'allow_overlap': False, 'detrend': False, 'high_pass': None, 'low_pass': None,
      'mask_img': None, 'radius': 3.0, 'seeds': array([[ 18, ..., -33], ..., [-55, ..., 23]]),
      'smoothing_fwhm': None, 'standardize': True, 't_r': None },
    confounds=None, memory_level=5, verbose=2, memory=Memory(cachedir='nilearn_cache/joblib'))
[NiftiSpheresMasker.transform_single_imgs] Loading data from /home/jamielh/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp/20140318_18278_REST_d_mcf_s_n_ICA_MNI.nii.gz
[NiftiSpheresMasker.transform_single_imgs] Extracting region signals


[Memory] Calling nilearn.input_data.nifti_spheres_masker.nifti_spheres_masker_extractor...
nifti_spheres_masker_extractor(<nibabel.nifti1.Nifti1Image object at 0x7ffa3cb909d0>)
__nifti_spheres_masker_extractor - 55.5s, 0.9min
[NiftiSpheresMasker.transform_single_imgs] Cleaning extracted signals


[Memory] Calling nilearn.signal.clean...
clean(array([[ 41439.878906, ..., 34513.789062], ..., [ 41440.527344, ..., 34279.625 ]]),
    standardize=True, sessions=None, detrend=False, confounds=None, low_pass=None, t_r=None, high_pass=None)
____clean - 0.0s, 0.0min
__filter_and_extract - 55.6s, 0.9min

Time series has 178 samples

170811-17:02:21,460 workflow ERROR: ['Node imp_est failed to run on host pfc.']
170811-17:02:21,461 workflow INFO: Saving crash info to /home/jamielh/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp/crash-20170811-170221-jamielh-imp_est-446fa6c0-f111-4f1d-838a-f33ca166c9b6.pklz
170811-17:02:21,461 workflow INFO:
Traceback (most recent call last):
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/plugins/linear.py", line 39, in run
    node.run(updatehash=updatehash)
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/engine/nodes.py", line 394, in run
    self._run_interface()
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/engine/nodes.py", line 504, in _run_interface
    self._result = self._run_command(execute)
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/engine/nodes.py", line 630, in _run_command
    result = self._interface.run()
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/interfaces/base.py", line 1043, in run
    runtime = self._run_wrapper(runtime)
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/interfaces/base.py", line 1000, in _run_wrapper
    runtime = self._run_interface(runtime)
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/interfaces/utility.py", line 499, in _run_interface
    out = function_handle(**args)
  File "<string>", line 334, in mat_funcs
  File "<string>", line 110, in get_conn_matrix
  File "/home/jamielh/.local/lib/python2.7/site-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)
TypeError: fit() got an unexpected keyword argument 'confounds'
Interface Function failed to run.

170811-17:02:21,484 workflow INFO:


170811-17:02:21,484 workflow ERROR: could not run node: PyNets_WORKFLOW.imp_est
170811-17:02:21,484 workflow INFO: crashfile: /home/jamielh/Volumes/Hanson/Duke_PAC/proc/140318_18278/fmri/temp/crash-20170811-170221-jamielh-imp_est-446fa6c0-f111-4f1d-838a-f33ca166c9b6.pklz
170811-17:02:21,484 workflow INFO:


Traceback (most recent call last):
  File "/home/jamielh/Volumes/Hanson/Training_Resources/PyNets/pynets.py", line 1057, in <module>
    wf.run()
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/engine/workflows.py", line 597, in run
    runner.run(execgraph, updatehash=updatehash, config=self.config)
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/plugins/linear.py", line 57, in run
    report_nodes_not_run(notrun)
  File "/home/jamielh/.local/lib/python2.7/site-packages/nipype/pipeline/plugins/base.py", line 95, in report_nodes_not_run
    raise RuntimeError(('Workflow did not execute cleanly. '
RuntimeError: Workflow did not execute cleanly. Check log for details

dPys commented 7 years ago

Missing a tab in the get_conn_matrix function! Thanks for catching it. Pull the latest edits and try again.
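For anyone hitting the same traceback: the failure mode is a generic Python one. An indentation slip moves a call out of its intended branch, so a nilearn-style `confounds=` keyword ends up passed to a scikit-learn-style `fit()` that rejects it. A minimal, hypothetical reconstruction (class names are illustrative stand-ins, not PyNets' real code):

```python
# Illustrative stand-ins for the two estimator styles involved in the crash:
class NilearnStyleMasker:
    def fit_transform(self, X, confounds=None):   # accepts a confounds kwarg
        return X

class SklearnStyleEstimator:
    def fit_transform(self, X, **fit_params):     # sklearn-style fit() rejects it
        if fit_params:
            bad = next(iter(fit_params))
            raise TypeError(f"fit() got an unexpected keyword argument {bad!r}")
        return X

def get_conn_matrix(estimator, X):
    # Correctly indented: only the masker branch passes confounds. With one
    # missing indent, the confounds call would run for every estimator and
    # raise the TypeError seen in the crash log.
    if isinstance(estimator, NilearnStyleMasker):
        return estimator.fit_transform(X, confounds="motion_params.tsv")
    return estimator.fit_transform(X)

print(get_conn_matrix(SklearnStyleEstimator(), [[1.0]]))  # → [[1.0]]
```

The fix in the repo was exactly this kind of re-indent, which is why pulling the latest edits resolves it.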

Also, glad to hear you may have caught the preprocessing issue. I had a feeling something was not right...

jlhanson5 commented 7 years ago

Ah, glad we caught that. I'm new to Python, so that error would have taken me a while to troubleshoot.

RE: preprocessing, what preprocessing steps is PyNets expecting? I have a time-series without bandpass filtering (below)... and this looks like what you referenced previously.

[screenshot: time series without bandpass filtering]

BUT similar to the file I sent previously, bandpassing (with different settings, and using FSL and AFNI's different routines) makes the data look like:

[screenshot: the same data after bandpassing]

I'm trying to figure out what might be wrong. A colleague at a different institution shared their scripts, and I don't think any of my syntax calls are off, etc. I don't think it's ICA-AROMA either, but I wasn't sure if I'm putting overly processed data into PyNets? I tried with and without bandpassing, but things still took ~2 hours.

Any thoughts are greatly appreciated! And apologies for the trouble.

dPys commented 7 years ago

Hmm, band-pass filtering should not have an impact, and if that's the only difference between images 1 and 2, then that is indeed probably not the issue.

I'm interested, though, in the area around the brain here. What is the intensity outside of the brain? Is it masked/skull-stripped? Double-check that the intensity of outside-of-brain voxels is 0, and if not, try masking it with fslmaths and run PyNets again. I ask because the pipeline does finish on your image eventually, correct? To me, that's an indicator that it's extracting time series from extra-brain voxels where it's not supposed to.
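One way to run that check yourself (a sketch; with nibabel you'd load the real file via `nib.load(path).get_fdata()`, and here a synthetic 4D array stands in):

```python
# Verify that every voxel outside the brain is exactly zero across all frames.
import numpy as np

data = np.zeros((10, 10, 10, 5))                           # x, y, z, time
data[3:7, 3:7, 3:7, :] = 1.0 + np.random.rand(4, 4, 4, 5)  # fake "brain" cube

brain_mask = np.any(data != 0, axis=-1)               # voxels ever nonzero
background_clean = bool(np.all(data[~brain_mask] == 0))
print(int(brain_mask.sum()), background_clean)        # → 64 True
```

If the background turns out nonzero, applying a binary mask on the command line with FSL would look something like `fslmaths img.nii.gz -mas brain_mask.nii.gz img_masked.nii.gz` (the `-mas` flag applies a mask image).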

jlhanson5 commented 7 years ago

(Pardon the delay in responding.) The data was masked in earlier preprocessing steps, and I don't see any non-zero values outside the brain... so I'm still not sure why there's a lag working through the pipeline. Other thoughts?

[screenshot: masked resting-state image with zero intensity outside the brain]

dPys commented 7 years ago

Aha, it was smallworldness that was causing the lag with certain kinds of images. I'm looking into why. In the meantime, please reinstall PyNets; it has been completely repackaged with lots of new features and support for Python 3!

If you have Python 3 installed, you can run:

cd pynets
pip3 install -e .

OR for Python 2.7:

cd pynets
pip install -e .

Thanks so much for reporting!

dPys commented 7 years ago

Any luck?

jlhanson5 commented 7 years ago

Still working on it. I've been trying to troubleshoot our pre-processing (and make sure it isn't an issue in there). Feel free to close the thread and I can always open another one.

dPys commented 7 years ago

Sure thing! And don't be shy about it. Also, we've made some major, major improvements to PyNets over the past two weeks, so check out the -h help options on the command line to learn more. Documentation coming soon, so stay tuned :)