espenhgn / ViSAPy

Python package for generating benchmark data for evaluating spike sorting methods
GNU General Public License v2.0

Not Necessarily an Issue but Questions for Runtime and Size of SpikeTimesEx.db #6

Closed. DominicTanzillo closed this issue 4 years ago.

DominicTanzillo commented 4 years ago

I've updated it all to work in Python 3, and while running example_in_vivo_tetrode.py, the database-creation step has taken over 13 hours and has generated a file of over 400 GB. Should SpikeTimesEx.db be so large?

I want to make sure I'm not missing a function that ends the simulation, one that may have changed as NEST, NEURON, and the associated dependencies have been updated over the years.

What is the expected runtime? I didn't see one reported in the paper.

espenhgn commented 4 years ago

Hi @DominicTanzillo, thanks for your interest in this tool. The numbers you're seeing definitely sound excessive. I haven't done any work on ViSAPy lately; the last changes were in Feb. 2019 on the dev branch, I see. The database of spike times from the separate NEST simulation should be no more than a few MB unless your simulation duration is very long.
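If you want to see where that volume is coming from, a quick check along these lines (illustrative only; it doesn't assume any particular schema) prints the file size and per-table row counts:

```python
import os
import sqlite3

# Rough sanity check of the spike-times database: report file size
# and row counts per table without assuming the schema.
print('file size: %.1f MB' % (os.path.getsize('SpikeTimesEx.db') / 1e6))
con = sqlite3.connect('SpikeTimesEx.db')
tables = con.execute("SELECT name FROM sqlite_master WHERE type='table'")
for (name,) in tables.fetchall():
    n_rows = con.execute('SELECT COUNT(*) FROM %s' % name).fetchone()[0]
    print('table %s: %d rows' % (name, n_rows))
con.close()
```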

As the dev branch already contains fixes for Python 3 (I believe I used 3.6 at the time), I would compare against that branch using a NEST release from around that time (>= 2.16) and LFPy >= v2.0.1 (the latest pip release of LFPy should still be fine). I realize this can be a bit of a hurdle to set up, though.
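For what it's worth, a quick way to confirm an environment matches those suggestions is a sketch like this (it just relies on the standard version attributes of those packages):

```python
import sys

import nest  # NEST >= 2.16 suggested above
import LFPy  # LFPy >= v2.0.1 suggested above

# Print interpreter and package versions to confirm the setup;
# expect a 3.6.x interpreter here.
print(sys.version)
print(nest.version())
print(LFPy.__version__)
```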

I just tried running the tetrode example with 'tstop': 1000 using Python 3.7 and noticed that the .db file just grew incrementally in size to >500 MB before I killed the processes. I vaguely remember having to change something about reading .gdf files into sqlite databases for Python 3.7 in the similar code in hybridLFPy (https://github.com/INM-6/hybridLFPy/pull/9/commits), but I'm not 100% sure what's up. Downgrading to Python 3.6 seems to have fixed the problem running locally in a conda test environment (see attached screenshot).

Conda environment details: conda_env_macos.txt
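For anyone digging into the .gdf reading issue above, here's a rough sketch of the kind of step involved (the function and table names are hypothetical, not ViSAPy's actual code; the explicit casts are there because sqlite3 on Python 3 won't bind numpy integer scalars directly):

```python
import sqlite3

import numpy as np

def gdf_to_sqlite(gdf_file, db_file, table='spikes'):
    """Hypothetical helper: load a NEST .gdf file (columns: neuron id,
    spike time in ms) and store its contents in an sqlite database."""
    # reshape(-1, 2) keeps empty and single-spike files well-behaved
    data = np.loadtxt(gdf_file).reshape(-1, 2)
    con = sqlite3.connect(db_file)
    con.execute('CREATE TABLE IF NOT EXISTS %s (neuron INT, time REAL)'
                % table)
    # cast numpy scalars to native Python types; sqlite3 raises
    # InterfaceError on numpy integers under Python 3
    con.executemany('INSERT INTO %s VALUES (?, ?)' % table,
                    ((int(gid), float(t)) for gid, t in data))
    con.commit()
    con.close()
```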

Unless you're really committed to getting ViSAPy up and running again (contributions are obviously welcome!), I can point you to @alejoe91's MEArec module at https://github.com/alejoe91/MEArec, which can also produce quite realistic ground-truth datasets, but much faster. Running long-duration multicompartment simulations as ViSAPy does is generally quite slow, though quite doable on HPC resources with MPI and a lot of available memory.
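For completeness, MEArec's documented workflow is roughly the two-step sketch below (generate templates, then recordings); exact parameter names may differ between versions, so check its README/docs:

```python
import MEArec as mr

# Step 1: generate extracellular templates from the cell models
# shipped with MEArec (default settings; customize via params=...).
cell_folder = mr.get_default_cell_models_folder()
tempgen = mr.gen_templates(cell_models_folder=cell_folder)
mr.save_template_generator(tempgen, filename='templates.h5')

# Step 2: generate a ground-truth recording from those templates.
recgen = mr.gen_recordings(templates='templates.h5')
mr.save_recording_generator(recgen, filename='recordings.h5')
```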

DominicTanzillo commented 4 years ago

Wow, thank you for the immediate response, killing the process now!

DominicTanzillo commented 4 years ago

@espenhgn Thank you so much for your help. I've started using your dev branch, and I'm running into a second issue at the final hurdle, which I'll start a new thread about.