catalystneuro / spike-sorting-hackathon

https://catalystneuro.github.io/spike-sorting-hackathon/

visualizing-raw-data: discussion #15

Open magland opened 2 years ago

magland commented 2 years ago

@jsiegle A couple questions about what you have in mind for the raw data visualization.

Is there a good example dataset (not too long duration) that we could use during development?

Do we want to be able to scale the amplitude during viewing? If so, we won't be able to prepare the pre-processed pyramid data as images.

jsiegle commented 2 years ago

Yes, I will prepare a few example datasets for this, for both Neuropixels 1.0 and 2.0 data.

I don't think amplitude scaling is necessary, but we should discuss. I find a color map that ranges between about ±50 µV works well for the AP band/high-passed data.

magland commented 2 years ago

Great! I did some playing around with deck.gl and react (react is what we use for figurl/sortingview). See: https://deck.gl/docs/get-started/using-with-react

I was able to get a basic view working, with tile images pulled down from URLs. However, it's not clear to me how to adapt that example to use an OrthographicView (https://deck.gl/docs/api-reference/core/orthographic-view) instead of a MapView (https://deck.gl/docs/api-reference/core/map-view), which is the default. Obviously we are not going to want to use longitude/latitude.

I guess I need to do some more reading. But as far as I can see, we'll just need to pre-compute a bunch of tiles (.png files) at the various scales. If viewing locally, you'd store those on disk, I suppose. Otherwise, we can use kachery-cloud to easily store the tiles in the cloud, and get a shareable link to the deck.gl figure with figurl.
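To get a rough feel for how many tiles "the various scales" would mean, here is a minimal sketch. It assumes a Deep Zoom style pyramid in which each level halves the previous one (rounding up) until the image is 1x1; the function name and the example dimensions are hypothetical, and the exact rounding used by a given tiling tool may differ.

```python
import math

def pyramid_tile_counts(width, height, tile_size=512):
    """Return (level_width, level_height, n_tiles) per level, full resolution first.

    Assumes each level halves the previous one (rounding up) until the
    image is 1x1, as in a Deep Zoom style pyramid.
    """
    levels = []
    w, h = width, height
    while True:
        n_tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((w, h, n_tiles))
        if w == 1 and h == 1:
            break
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return levels

# Hypothetical example: 60 s at 30 kHz (1,800,000 samples) by 384 channels
levels = pyramid_tile_counts(1_800_000, 384)
total_tiles = sum(n for _, _, n in levels)
print(len(levels), "levels,", total_tiles, "tiles")
```

Because each level has roughly a quarter of the pixels of the one above it (or half, for a very wide image like this one), the full-resolution level dominates the tile count.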

Tagging @jsoules

jsiegle commented 2 years ago

For JavaScript/React, I was able to view image tiles by modifying the code for this example, which uses a TileLayer with a Cartesian coordinate system and an OrthographicView.

The TileLayer class is not available in pydeck, but I was able to get it to work using a custom layer someone developed. In that case it does render everything based on latitude/longitude coordinates, so we'd have to try to modify the extension to display in Cartesian space.

By the way, this is how I'm converting raw data to image tiles (using the pyvips library):

import matplotlib.pyplot as plt
import numpy as np
import pyvips

# `data` (in uV) and `fname` are assumed to be defined already
RdGy = plt.get_cmap('RdGy')

# rescale -50 to +50 uV to 0-1 and clip outliers to that range
scaled_data = np.clip((data + 50) / 100, 0, 1)

# colorize, drop the alpha channel, convert to uint8, and flip vertically
a = np.flip((RdGy(scaled_data.T)[:, :, :3] * 255).astype(np.uint8), axis=0)

image = pyvips.Image.new_from_array(a)

# write the image pyramid as 512x512 tiles in Deep Zoom layout
image.dzsave(fname,
     basename='neuropixels',
     overlap=0,
     tile_size=512,
     layout=pyvips.enums.ForeignDzLayout.DZ)

magland commented 2 years ago

@jsiegle thanks, that's very helpful.

Do you expect all the data to fit in memory so we can use dzsave in that way? Or are we going to need to write a custom saver that can handle much larger datasets?

Also, if the duration is really long, the image would be highly non-square. Do you envision viewing the data in multiple rows?

Related to the above, are you hoping to view entire datasets at a glance?

jsiegle commented 2 years ago

I don't think it's necessary to be able to inspect the full dataset -- nobody has time to browse through that much data. A few minutes should be plenty.

Raw Neuropixels data is about 1.4 GB/minute, so a several-minute chunk should easily fit into memory. In my tests so far, this shrinks to about 600 MB/minute after saving the image pyramid in JPEG format.
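As a quick sanity check on the 1.4 GB/minute figure, here is the arithmetic, assuming the usual Neuropixels 1.0 AP-band parameters (384 channels, 30 kHz, int16 samples). Those parameters are an assumption, not something measured here.

```python
# Assumed Neuropixels 1.0 AP-band parameters
n_channels = 384
sampling_rate_hz = 30_000
bytes_per_sample = 2  # int16

raw_bytes_per_minute = n_channels * sampling_rate_hz * bytes_per_sample * 60
print(f"raw: {raw_bytes_per_minute / 1e9:.2f} GB/minute")  # prints "raw: 1.38 GB/minute"

# The uncompressed RGB uint8 image uses 3 bytes per sample instead of 2
rgb_bytes_per_minute = n_channels * sampling_rate_hz * 3 * 60
print(f"rgb: {rgb_bytes_per_minute / 1e9:.2f} GB/minute")  # prints "rgb: 2.07 GB/minute"
```

So the colorized array is about 1.5x the raw size before JPEG compression, which is worth keeping in mind when picking the chunk length.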

magland commented 2 years ago

Okay cool.

Here's my initial crack at creating a tiled image figURL view from an arbitrary image: https://github.com/scratchrealm/figurl-tiled-image

For demo purposes I am creating a Mandelbrot set image, which ends up being a 10-20 MiB image pyramid, I believe. https://www.figurl.org/f?v=gs://figurl/tiled-image-1&d=ipfs://bafkreihcn72fhpebdujz5dj7bkmsrn3cydrl73y6gnwawtk5by4jmnsv4e&label=Mandelbrot%20tiled%20image

The following file contains the code you provided for creating the image pyramid: https://github.com/scratchrealm/figurl-tiled-image/blob/main/figurl_tiled_image/TiledImage.py

And here's the react/typescript project: https://github.com/scratchrealm/figurl-tiled-image/tree/main/gui

To integrate this with SpikeInterface, I guess we just need a code snippet that takes a recording extractor and outputs an image array [N1 x N2 x 3] uint8. Then that could be fed into TiledImage to create the shareable url.
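A minimal sketch of what that snippet might look like, using only numpy. The function name `traces_to_rgb` is hypothetical, and the three-stop colormap is a simple stand-in for matplotlib's RdGy rather than the real thing. It assumes the traces arrive as a (samples x channels) array in µV; with SpikeInterface that would presumably come from something like `recording.get_traces(start_frame=..., end_frame=...)`, with the scaling to µV handled separately.

```python
import numpy as np

def traces_to_rgb(traces_uv, limit_uv=50.0):
    """Map traces (samples x channels, in uV) to an RGB uint8 image.

    Returns an array of shape (channels, samples, 3), with the first
    channel in the bottom row.
    """
    traces_uv = np.asarray(traces_uv, dtype=np.float32)
    # Rescale [-limit, +limit] to [0, 1] and clip outliers
    scaled = np.clip((traces_uv + limit_uv) / (2 * limit_uv), 0.0, 1.0)
    # Stand-in diverging colormap (NOT matplotlib's RdGy): red -> white -> dark grey
    stops = np.array([0.0, 0.5, 1.0])
    colors = np.array([[200, 0, 0], [255, 255, 255], [40, 40, 40]], dtype=np.float32)
    rgb = np.stack([np.interp(scaled, stops, colors[:, c]) for c in range(3)], axis=-1)
    # Transpose to (channels, samples, 3) and flip so channel 0 ends up at the bottom
    return np.flip(rgb.transpose(1, 0, 2), axis=0).astype(np.uint8)

# Usage with synthetic data standing in for a real recording
rng = np.random.default_rng(0)
fake_traces = rng.normal(0, 20, size=(3000, 384))  # 0.1 s at 30 kHz, 384 channels
img = traces_to_rgb(fake_traces)
print(img.shape, img.dtype)  # prints "(384, 3000, 3) uint8"
```

The resulting array has the [N1 x N2 x 3] uint8 layout described above, so it should be usable as the input to the tiling step.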

jsiegle commented 2 years ago

Wow, this is awesome!

When I tried to get the URL for a new TiledImage it gave the following error:

Exception: Error in initiateIpfsUpload (500) Internal Server Error: Error: db is not defined

Do I have to change something in the config file related to this?

magland commented 2 years ago

Oops, I accidentally broke something on the server... now it should work. Please retry the same operation.

jsiegle commented 2 years ago

It worked!

https://www.figurl.org/f?v=gs://figurl/tiled-image-1&d=ipfs://bafkreid3gmolclm5pjyd27hlbhnxlxefoh3yxi4cylwsph2po25wcqfm4e&label=Neuropixels%20Example

magland commented 2 years ago

Great!

Might be nice to have a multiple-row option to make use of vertical space.

But that makes me think that we'd want to pass the image into TiledImage rather than the numpy array. To have more flexibility.
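One way the multiple-row idea could work is to rearrange the image before tiling: cut the long time axis into equal segments and stack them vertically. This is only a sketch under that assumption; `wrap_into_rows` is a hypothetical helper, not part of figurl-tiled-image.

```python
import numpy as np

def wrap_into_rows(image, n_rows):
    """Split a wide (height, width, 3) image into n_rows segments stacked vertically.

    If the width is not divisible by n_rows, the last segment is padded
    with zeros (black) on the right.
    """
    height, width, n_colors = image.shape
    seg_width = -(-width // n_rows)  # ceiling division
    padded = np.zeros((height, seg_width * n_rows, n_colors), dtype=image.dtype)
    padded[:, :width] = image
    # Cut into n_rows segments along the time axis, then stack them top to bottom
    segments = np.split(padded, n_rows, axis=1)
    return np.concatenate(segments, axis=0)

# Usage: a 384 x 10000 image becomes 4 rows of 384 x 2500
wide = np.zeros((384, 10_000, 3), dtype=np.uint8)
wrapped = wrap_into_rows(wide, 4)
print(wrapped.shape)  # prints "(1536, 2500, 3)"
```

Doing this on the array (or image) before it is handed to the tiler keeps the viewer itself unchanged, at the cost of losing a continuous time axis.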

oliche commented 2 years ago

I think I will have a few Neuropixels 1.0 datasets as well!

FYI, there is this project that we developed for QC at IBL: https://github.com/oliche/viewephys. We also stream data from an HTTP server and S3 using this tool. Not suggesting we build on this, but there are some features we thought interesting:

Happy to discuss this further!

magland commented 2 years ago

@oliche that looks cool. Do you think we should start a new hackathon project for getting viewephys to work with spikeinterface extractors?

I think it's valuable to have both PyQt5-based programs that run locally and browser-based views that are shareable.

magland commented 2 years ago

@jsiegle, I made some updates to figurl-tiled-image. You can now pass either a numpy array or a pyvips image into TiledImage. There's a new high-res earth example in the README. The upload step is now multi-threaded, and therefore much faster.

https://github.com/scratchrealm/figurl-tiled-image

alejoe91 commented 2 years ago

@magland @jsiegle @oliche @samuelgarcia I think one project could be to make a new backend + novel widgets for spikeinterface in general and improve lazy visualizations! Also, for the TiledImage widget, the input data would come from an SI object, no? Let's discuss later today :)

oliche commented 2 years ago

@magland yes, we can use SpikeInterface extractors.

To me it sounds almost trivial, as it only uses numpy arrays for raw data and spike trains, so it's generic. I think it's worth doing as it's a low time investment.