Training of MAGIK - Githubissues

po60nani commented 9 months ago

Hi,

Thank you for your valuable network. I am currently trying to train MAGIK on my dataset, which has a structure similar to the ones in your tutorials. I encountered two issues during this process:

1. Extra columns in EdgeExtractor output: The EdgeExtractor function returns a DataFrame with extra columns containing NaN values. The tutorial data does not produce these columns, and I'm not sure why they appear with my data. As a workaround, I manually remove the extra columns before the function returns the data.

Input_df: [image: input_df]

Output_df: [image: output_df]

2. ValueError in SelfDuplicateEdgeAugmentation: When attempting to train with my dataset, I encountered the following error:
```
File "mtrand.pyx", line 909, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken.
```
I traced this error back to the SelfDuplicateEdgeAugmentation function, specifically to its inner function, where `offset = maxnofedges - nofedges` evaluates to 0. I'm unsure how to handle this situation and would appreciate guidance on resolving it.

Any assistance or clarification on these matters would be greatly appreciated.

Best regards,

JesusPinedaC commented 9 months ago

Hi @po60nani,

Thanks for your interest in MAGIK!

1. This error occurs because EdgeExtractor expects frames to start from zero. You can resolve it by running `nodesdf["frame"] -= nodesdf["frame"].min()`. We will clarify this in the documentation.
2. The problem is indeed related to what you mentioned. However, the cause is not `offset = 0` but `nofedges = 0`.

When offset is 0 and nofedges is greater than or equal to 0, np.random.choice returns an empty array, which does not affect the function's behavior. In that case, duplicated_edges is simply assigned the value of edges.

The case you describe instead arises when nofedges = 0 and offset > 0: `idx = np.random.choice(0, n, replace=True)` with any n > 0 reproduces the error. This indicates that some graphs in your batch do not have any edges.
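The two cases can be checked in isolation with plain NumPy. This is a minimal sketch that only exercises `np.random.choice`; it does not call SelfDuplicateEdgeAugmentation, and the sample count 3 is an arbitrary stand-in for a positive offset.

```python
import numpy as np

# Case 1: zero samples requested (offset == 0).
# This is allowed even when the population is empty (nofedges == 0).
empty = np.random.choice(0, 0, replace=True)
print(empty.shape)

# Case 2: a positive number of samples from an empty population
# (nofedges == 0, offset > 0) raises the ValueError from the traceback.
message = None
try:
    np.random.choice(0, 3, replace=True)
except ValueError as err:
    message = str(err)
    print(message)
```

If the second case is what happens during training, the graphs with no edges are the thing to track down, not the offset.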

To better help you resolve this issue, we need to confirm some details:

1. Can you confirm that the centroids are normalized to a range of roughly 0 to 1?
2. The radius used to generate edges must be large enough to connect nodes in time-subsequent frames.
3. Please note that EdgeExtractor was not designed to be used as a standalone function. Instead, it is intended to be used within GraphExtractor, which ensures the proper handling of the dataframe until the final graphs are generated. If you haven't already, I recommend using GraphExtractor as we do in the tutorials.
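The first two points can be checked on the nodes DataFrame before any edges are computed. The sketch below is a rough diagnostic on made-up data, not part of DeepTrack: `nearest_gap_between_frames` is a hypothetical helper, and comparing the nearest-neighbour distance with the radius is only an assumption about how the search radius is applied.

```python
import numpy as np
import pandas as pd

CENTROID_COLS = ["centroid-0", "centroid-1"]


def nearest_gap_between_frames(nodesdf):
    """Hypothetical helper: smallest centroid distance between the nodes of
    each frame and the nodes of the next frame present in the data."""
    frames = np.sort(nodesdf["frame"].unique())
    gaps = {}
    for t, t_next in zip(frames[:-1], frames[1:]):
        a = nodesdf.loc[nodesdf["frame"] == t, CENTROID_COLS].to_numpy()
        b = nodesdf.loc[nodesdf["frame"] == t_next, CENTROID_COLS].to_numpy()
        # pairwise Euclidean distances between the two frames
        dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        gaps[(int(t), int(t_next))] = float(dist.min())
    return gaps


# made-up nodes: the jump from frame 1 to frame 2 is larger than the radius
nodesdf = pd.DataFrame(
    {
        "centroid-0": [0.10, 0.15, 0.90],
        "centroid-1": [0.10, 0.10, 0.90],
        "frame": [0, 1, 2],
    }
)

values = nodesdf[CENTROID_COLS].to_numpy()
normalized = bool(((values >= 0) & (values <= 1)).all())

radius = 0.2
gaps = nearest_gap_between_frames(nodesdf)
too_far = [pair for pair, gap in gaps.items() if gap > radius]

print("centroids within [0, 1]:", normalized)
print("frame pairs that the radius cannot connect:", too_far)
```

Frame pairs listed in `too_far` are candidates for graphs without edges, which is the situation described in point 2 above.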

I suggest solving problem 1 first and then checking whether problem 2 persists!

po60nani commented 9 months ago

Thank you for providing additional insights into the issue. I appreciate your effort in investigating the problem. As suggested, I will focus on solving problem 1 and then reevaluate if problem 2 persists.

I have implemented the suggested solution by adjusting the frames using nodesdf["frame"] -= nodesdf["frame"].min(). However, the problem persists.
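For reference, this is what the adjustment does on a small made-up DataFrame, together with a few sanity checks on the result. The checks (integer dtype, no missing frames) are assumptions about what could still differ from the tutorial data, not documented requirements of EdgeExtractor.

```python
import pandas as pd

# made-up nodes whose frames start at 7 instead of 0
nodesdf = pd.DataFrame({"frame": [7, 7, 8, 9, 9], "label": [0, 1, 0, 0, 1]})

# shift the frames so that the first one is 0
nodesdf["frame"] -= nodesdf["frame"].min()

starts_at_zero = bool(nodesdf["frame"].min() == 0)
is_integer = pd.api.types.is_integer_dtype(nodesdf["frame"])

# frames between 0 and the last frame that contain no nodes at all
expected = set(range(int(nodesdf["frame"].max()) + 1))
missing_frames = sorted(expected - set(nodesdf["frame"]))

print(nodesdf["frame"].tolist())
print(starts_at_zero, is_integer, missing_frames)
```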

JesusPinedaC commented 9 months ago

Could you please try with the following toy example?

```python
import numpy as np
import pandas as pd

import deeptrack as dt

# frame offset: 7 reproduces the issue (as in your case), 0 is the correct case
frame_shift = 0

# randomly generated centroids
centroids = np.random.rand(80, 2)
frames = np.arange(0, 80) + frame_shift

nodesdf = pd.DataFrame()
nodesdf[["centroid-0", "centroid-1"]] = centroids
nodesdf["frame"] = frames
nodesdf["label"] = 0  # single particle
nodesdf["solution"] = 0
nodesdf["set"] = 0

# display the first 20 rows of the dataframe
nodesdf.head(20)

# search radius for the graph edges
radius = 0.7

# time window to associate nodes (in frames)
nofframes = 3

# compute edges
edges = dt.models.gnns.graphs.EdgeExtractor(
    nodesdf,
    parenthood=np.ones((1, 2)) * -1,
    radius=radius,
    nofframes=nofframes,
)
```

Here, frame_shift = 7 reproduces the issue:

nodesdf: [image: nodesdf_frame_7]

edges: [image: edges_frame_7]

Whereas frame_shift = 0 produces the correct output:

nodesdf: [image: nodesdf_frame_0]

edges: [image: edges_frame_0]

po60nani commented 9 months ago

I have thoroughly examined the provided toy example, and it accurately reproduces the expected results you shared. However, when applying the code to my dataset, I encountered an error. To facilitate the troubleshooting process, I have uploaded both the CSV file (df_PSFs.csv) and the code for your review.

Code:

```python
import logging

import numpy as np
import pandas as pd

import deeptrack as dt
from deeptrack.models.gnns.generators import GraphGenerator

logging.disable(logging.WARNING)

if __name__ == "__main__":

    path_csv = r'./df_PSFs.csv'
    nodesdf = pd.read_csv(path_csv)

    print(nodesdf.head(20))

    # normalize centroids between 0 and 1
    nodesdf.loc[:, nodesdf.columns.str.contains("centroid")] = (
        nodesdf.loc[:, nodesdf.columns.str.contains("centroid")]
        / np.array([1000.0, 1000.0])
    )

    nodesdf.loc[:, 'solution'] = 0.0
    nodesdf.loc[:, 'set'] = 0.0

    # shift frames so that they start from zero
    nodesdf["frame"] -= nodesdf["frame"].min()

    # display the first 20 rows of the dataframe
    nodesdf.head(20)

    # Search radius for the graph edges
    radius = 0.2

    # Time window to associate nodes (in frames)
    nofframes = 3

    # Compute edges
    edges = dt.models.gnns.graphs.EdgeExtractor(
        nodesdf,
        parenthood=np.ones((1, 2)) * -1,
        radius=radius,
        nofframes=nofframes
    )

    a = 1
```

Additional Information: df_PSFs.csv

The output of nodesdf is:

The output of edges is:

My pandas version is 1.5.3.

Upon execution, I expect the code to run successfully without encountering any errors. The provided toy example validates this expectation, but the issue arises with my dataset.
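Since the toy example works and the dataset does not, one way to narrow things down is to list how the dataset's DataFrame differs from the toy one. The sketch below is a hypothetical diagnostic, not a confirmed explanation: `compare_with_toy_layout` is a made-up helper, the column list is taken from the toy example above, and the sample data (including the `sigma` column) is invented for illustration.

```python
import pandas as pd

# columns and integer-typed fields used by the toy example
TOY_COLUMNS = ["centroid-0", "centroid-1", "frame", "label", "solution", "set"]
TOY_INTEGER_COLUMNS = ["frame", "label", "solution", "set"]


def compare_with_toy_layout(nodesdf):
    """Hypothetical diagnostic: report differences from the toy DataFrame."""
    return {
        "missing_columns": [c for c in TOY_COLUMNS if c not in nodesdf.columns],
        "extra_columns": [c for c in nodesdf.columns if c not in TOY_COLUMNS],
        "columns_with_nan": [c for c in nodesdf.columns if nodesdf[c].isna().any()],
        "non_integer_columns": [
            c
            for c in TOY_INTEGER_COLUMNS
            if c in nodesdf.columns
            and not pd.api.types.is_integer_dtype(nodesdf[c])
        ],
        "default_index": nodesdf.index.equals(pd.RangeIndex(len(nodesdf))),
    }


# invented data: float 'solution' and 'set' columns plus one extra column
nodesdf = pd.DataFrame(
    {
        "centroid-0": [0.1, 0.2],
        "centroid-1": [0.3, 0.4],
        "frame": [0, 1],
        "label": [0, 0],
        "solution": [0.0, 0.0],
        "set": [0.0, 0.0],
        "sigma": [1.5, 1.7],
    }
)

report = compare_with_toy_layout(nodesdf)
for key, value in report.items():
    print(f"{key}: {value}")
```

Any difference reported here is only a lead to test against the toy example, for instance by changing one property of the toy DataFrame at a time.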

DeepTrackAI / DeepTrack2

Training of MAGIK #202