epfl-lts2 / pygsp

Graph Signal Processing in Python
https://pygsp.rtfd.io
BSD 3-Clause "New" or "Revised" License
488 stars 93 forks source link

Question on importing from segmented numpy array or point cloud cloud #91

Closed Sh4zKh4n closed 3 years ago

Sh4zKh4n commented 3 years ago

Hi,

I've read the docs but also.looked around and can't seem to work out how to do. So would appreciate any thoughts. I have a large numpy array that is really an array of voxels. The array is dense so there are alot of nodes. I can use another library to reduce the footprint to nonzero values to point clouds x,y,z,feature (segment label/number). I would really like to convert either of these into a network to use in pygsp to understand wave mechanics. Each node would be connected to it's neighbour but I would really like to understand how to encode the edge with the distance between nodes (perhaps collapsing similar nodes and multiply the edge to represent the distance) but have the nodes labelled as the segmentation label. Any thoughts how to proceed? Would appreciate any help.

Sh4zKh4n commented 3 years ago

Is this a really stupid question that is actually really simple? I am finding it hard really how to take a ndim numpy array, then transform this to a graph network where the nodes are labeled with the segment colour, to use as a hash key. The edges would then be connected to neghbours and so forth. Distances of edges could either be weights on the edges or another label on the nodes, to be used for computing the lengths between the neghbourhoods? I can see that one could convert this to a full mesh grid but then you have every node connected to itself but I would rather discount or remove certain labels such as 0 (empty space) and have no edge within a certain range.

nperraud commented 3 years ago

I am not sure I understand. Could you maybe reformulate the question and focus on the specific challenge you are facing? Maybe write some code that would show us what is the challenge?

Sh4zKh4n commented 3 years ago

Sorry, this was badly explained.

Ok, I have a dense numpy array of ndim 3. It is a volumetric dataset of xray tomography, it has been segmented already. If we treat each of the entries, which is a voxel, in the numpy array as a point in a graph. The graph would be dense and every voxel would be connected to each of it's own neighbourse based on contact/neighbourhood. Most of the volume is empty space i.e. assigned a pixel value 0 in a 8 bit (0-255) label. If this represents a volume of something i.e. 0 is nothing but anything labelled >0 is something. This dense array can be converted from a np.array with shape (1000,1000,1000) to a point cloud with four columns (Nx4) of information. x - pos, y pos, z-position and a label (which corresponds to the material based on absorption characteristics).

I see there is a bit of info on converting a point cloud to a graph for a (nx3) array (x-pos, y-pos, z-pos) but I wondered, how i could incorporate the label (the fourth column). So instead of producing a graph of positions with nodes as only positions. This would be a graph with positions and the label? Does that make sense or have I badly explained that? Then using the script in pyGSP a relationship of distance for edges could be calculated. My interest isn't about distance through topology, but distance through volume and connectivity.

My specific challenge is that I have a numpy array that is 10,000 x 5,000 x 10,000 in shape. Most of this information is redundant and empty space. The segmented volume array can be thought of as a point cloud and omitting the 0 label as empty space would get a much smaller footprint. Personally then I would like to run paths from a specified node to another and cycles. That would give me a path distance but each node would then be logged for the label. So then I could reconstruct a path of say a way without having to track the the wave characteristics at each node. Only to the end node, which is what I am interested in.

nperraud commented 3 years ago

I think I understand the format of your data.

Creating a graph from using position and label is simple as the position of the graph points can be n dimensional vector. So you can simply build an NNGraph using your Nx4 array. Be careful with the normalization of your labels with respect of the position values.

I still do not understand what you want to do.

Sh4zKh4n commented 3 years ago

@nperraud, here's the thing I cant seem to get my head around how you iterate through data to assign nodes with positional data (and/or feature label) with radius based k nearest neighbours.

What I wanted to do was to create a model of wave propagation based on neighbour connectivity. I wanted to try and create a wave propagation model similar to this paper https://ieeexplore.ieee.org/document/7442096 . So xyz positions of labeled data are within the nodes and the edge either is just a link or contains the distance metric determined by the xyz positions. Does that make sense?

Networkx has some new examples of geospatial methods in the dev docs but it is based on geospatial xy data and not on generic xyz positions https://networkx.org/documentation/latest/auto_examples/geospatial/plot_polygons.html#sphx-glr-auto-examples-geospatial-plot-polygons-py . As an example. a wave as a pde so I thought pygsp might be a good place to try.

nperraud commented 3 years ago

The pygsp is based on numpy and sometime scipy. So all iterations are done via these packages.

As you imagine, I cannot take the time to read the paper. The goal of this issues is to answer specific code-related question.

Sh4zKh4n commented 3 years ago

@nperraud completely understand, I just thought since you asked what I was doing. Those would be good examples but the basics are within the comments. Understand you'd be reading too many papers if every poster gave you a paper! Can you then point me in the direction of the source code in pygsp related to creating a network from a numpy array and from a point cloud when no adjacency or edge list exist? Would really appreciate it because I'm finding it difficult on how to encode a graph from a numpy array that is a 3d lattice array with intensity values within the array. Positional is either the meshgrid value or a point cloud.

Would really appreciate it.

nperraud commented 3 years ago

I am still not sure I understand the problem. In the pygsp, a graph basically consists of an adjacency matrix. Additionally, you can add coordinates to the node in the form of a numpy array. The graph class take the adjacency matrix as an argument, so you can define any graph you desire. Now, if you want a graph based on the distance of some coordinate, you can use an NNGraph.

Sh4zKh4n commented 3 years ago

The issue for me was that I didnt really understand how to get a array into a graph. Given that we are in a pandemic, it's a bit hard to go and have a chat with someone to learn how to do this kind of work when your on a desk by yourself. There are lots of answers on the interent where the explanation isnt particularly helpful for someone who isnt sure.

So I think I have found a method. I am putting it up here because if someone else has the same issue they can try to use this as a basis (if it works). I have to test it a bit more. my array is rather large and dense. So I could convert it to a point cloud as such:


import gradslam as gs
import numpy as np

# create basic voxel grid
voxel_grid = np.zeros((10, 10, 10, 1), dtype=np.uint8)
voxel_grid[4:6, 4:6, 4:6] = 100

# convert voxel grid to xyz coordinates and features
nonzero_inds = np.nonzero(voxel_grid)[:-1]
points = torch.tensor(nonzero_inds, dtype=torch.float32).T
label = torch.tensor(voxel_grid[nonzero_inds])

z, x, y =point

pcl_label_tuples = list(zip(z,x,y,label))

point_cloud = pd.DataFrame(pcl_label_tuples, columns=['z','y','x','label'])

Or it think the sparse library does the equivalent by equating the 0 values as not interesting and listing as column in a dim_0, dim_1, dim_2, label for my type of dataset.

import numpy as np
import dask.array as da
import networkx as nx
import dask.dataframe as dd
import pandas as pd

big_array =da.random.random((15000, 2500, 10000), chunks=(1000, 500, 1000), dtype=np.uint8) # big array 
big_array[big_array<0.9] = 0 # fill most of the array with zeros

ba_s = sparse.COO(big_array)

bas_df = dd.io.from_dask_array(ba_s, columns=['z', 'y', 'x', 'label'])

bas_df.to_hdf('df_sparse_BA.hdf', '/data')    

bas_df = dd.read_hdf('df_sparse_BA.hdf') 
object_id =0

def point_graph(df, object_id=object_id):
    # create kdtree
    tree = spatial.cKDTree(df[['z', 'y', 'x']])

    # get neighbours within distance for every point, store in dataframe as edges
    edges = pd.DataFrame({'src':[], 'tgt':[]}, dtype=int)
    for source, target in enumerate(tree.query_ball_tree(tree, r=2)):
        target.remove(source)
        if target:
            edges = edges.append(pd.DataFrame({'src':[source] * len(target), 'tgt':target}), ignore_index=True)

    # create graph for points using edges from Balltree query
    G = nx.from_pandas_dataframe(edges, 'src', 'tgt')

    for i in sorted(G.nodes()):
        G.node[i]['label'] = nodes.label[i]
        G.node[i]['x'] = nodes.x[i]
        G.node[i]['y'] = nodes.y[i]
        G.node[i]['z'] = nodes.z[i]
   return G

NN_graph = point_graph(df)

The documents for the NN graph didnt really help me understand how to do it. I saw one answer on stackoverflow of anything remotely similar, so I have modified it for here. So all I am trying to do is try to build a graph, where the edges have no wieight but linkages to a nearest neighbour. The nodes have multiple weights, so this can be an undirected graph.

If you really want to know what I am doing, I would like to try to run a random walk on the graph where the random walk is a path for a travelling wave where for a point at time = 0 the amplitude is it's maximum, the point cloud and the label represent a material structure.

image

Lets assume f(r.t) is 1d vector between two points (p1 = G.node(0) and p2=G.node(3)) then depending on the speed, the scalar of the vector (p1+2)^2 = p1^2 + p2^2, directional information from whether dir = p2-p1 is positive or negative in any of the x, y, z directions for whether the travelling wave of a label I can determine the wave characteristics at point p2. If the point cloud is populated enough, instead of doing a power law solution to the amplitude, in a densely enough connect nn I should be able to split the amplitude by number of neighbours per node (so basically kirchoffs). Each label say is a diferrent material if the image is segmented, with a density, (assuming no material dispersion from frequency dependency). I can then try to work how paths from a node to all the others through a search like BFS or A*. Then repeat as a mirror the same path to the starting node. It would then allow me to work out the resultant wave from the time it took to travel back and the frequency characteristics. since I have broken the problem from a difficult 3d problem to a graph of 1d equations that are much simpler and having discretised them. I am not interested at the moment to determine the wave properties at all other points. It's computationally too expensive. This though in my system could help to show geometric wave dispersion. Which it may or may not. Its a way of not doing FEA (heavy computation and a little bit of a pain) or FD. I am just interested to know what the summation of the waves are after a number of tours within a given time.

Hopefully this doesn't annoy you and I didn't want to over complicate everything to explain all the details. I just seem to have put my question badly. One of the issues is that I do worry about the number of point and will have to coarsen the array to a certain amount for networkx to work. graph-tool looks like an option but not simple.

mdeff commented 3 years ago

As @nperraud suggested, you can build an NNGraph. Quick example:

import numpy as np
import pygsp as pg

# Number of non-zero voxels.
n = 100

# Some random positions.
xyz = np.random.uniform(size=(n, 3))

# Half with label 1, other half with label 2.
label = np.ones(n)
label[n//2:] = 2

# Build a nearest-neighbor graph with edge as distances.
graph = pg.graphs.NNGraph(xyz)

# Look at it.
graph.plot(label)

20210211_185022