DeepTrackAI / DeepTrack2

MIT License

Training of MAGIK #202

Open po60nani opened 9 months ago

po60nani commented 9 months ago

Hi,

Thank you for your valuable network. I am currently trying to train MAGIK on my dataset, which has a structure similar to the ones in your tutorials. I encountered two issues during this process:

  1. Extra Columns in EdgeExtractor Output: The EdgeExtractor function returns a DataFrame with extra columns containing NaN values. I noticed this discrepancy between my data and the provided tutorial data, and I'm not sure why these extra columns are present. As a workaround, I manually remove these extra columns before returning the data in the function.

Input_df: [image: input_df]

Output_df: [image: output_df]

  2. ValueError in SelfDuplicateEdgeAugmentation: When attempting to train with my dataset, I encountered the following error:
    File "mtrand.pyx", line 909, in numpy.random.mtrand.RandomState.choice
    ValueError: a must be greater than 0 unless no samples are taken.

    I traced this error back to the SelfDuplicateEdgeAugmentation function, specifically in the inner function where offset = maxnofedges - nofedges results in an offset of 0. I'm unsure how to handle this situation and would appreciate guidance on resolving this issue.

Any assistance or clarification on these matters would be greatly appreciated.

Best regards,

JesusPinedaC commented 9 months ago

Hi @po60nani,

Thanks for your interest in MAGIK!

  1. This issue occurs because EdgeExtractor expects frames to start from zero. You can resolve it by running nodesdf["frame"] -= nodesdf["frame"].min(). We will clarify this in the documentation.

  2. The problem is indeed related to what you mentioned. However, the cause is not offset = 0 but rather nofedges = 0.

In cases where offset is 0 and nofedges is greater than or equal to 0, np.random.choice returns an empty array, which does not affect the function's behavior. In such cases, duplicated_edges is simply assigned the value of edges.

The case you describe, in turn, arises when nofedges = 0 and offset > 0 (idx = np.random.choice(0, whatever > 0, replace=True) reproduces the error), indicating that some graphs in your batch do not have any edges.
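The cases described above can be reproduced with NumPy alone. The sketch below is not the library's code: `pick_duplicate_indices` is a hypothetical helper that only mirrors the `offset = maxnofedges - nofedges` arithmetic quoted in this thread, and it assumes a NumPy version that behaves as the traceback above indicates.

```python
import numpy as np


def pick_duplicate_indices(nofedges, maxnofedges):
    """Hypothetical helper: choose which edges to repeat so that a graph
    with `nofedges` edges is padded up to `maxnofedges` edges."""
    offset = maxnofedges - nofedges
    # Raises ValueError when nofedges == 0 and offset > 0.
    return np.random.choice(nofedges, offset, replace=True)


# offset == 0: nothing to sample, so the result is an empty index array.
print(pick_duplicate_indices(4, 4).size)

# nofedges > 0 and offset > 0: indices are drawn from range(nofedges).
idx = pick_duplicate_indices(3, 5)
print(idx.size, bool((idx < 3).all()))

# nofedges == 0 and offset > 0: the case that triggers the error in this thread.
try:
    pick_duplicate_indices(0, 5)
except ValueError as err:
    print("ValueError:", err)
```

If the third case is what happens during training, at least one graph in the batch has no edges, so counting the edges per graph before augmentation is one way to locate it.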

To better help you resolve this issue, we need to confirm some details:

I suggest solving problem 1 first and then checking whether problem 2 persists!

po60nani commented 9 months ago

Thank you for providing additional insights into the issue. I appreciate your effort in investigating the problem. As suggested, I will focus on solving problem 1 and then reevaluate if problem 2 persists.

I have implemented the suggested solution by adjusting the frames using nodesdf["frame"] -= nodesdf["frame"].min(). However, the problem persists.


JesusPinedaC commented 9 months ago

Could you please try with the following toy example?

import numpy as np
import pandas as pd

import deeptrack as dt

# like in your case
frame_shift = 0 # right case: 0

# randomly generated centroids
centroids = np.random.rand(80, 2)
frames = np.arange(0, 80) + frame_shift

nodesdf = pd.DataFrame()
nodesdf[["centroid-0", "centroid-1"]] = centroids
nodesdf["frame"] = frames
nodesdf["label"] = 0 # single particle
nodesdf["solution"] = 0
nodesdf["set"] = 0

# display the first 20 rows of the dataframe
nodesdf.head(20)

# Search radius for the graph edges
radius = 0.7

# time window to associate nodes (in frames)
nofframes = 3

# compute edges
edges = dt.models.gnns.graphs.EdgeExtractor(
    nodesdf, 
    parenthood=np.ones((1, 2)) * -1, 
    radius=radius, 
    nofframes=nofframes
    )

Here, frame_shift = 7 reproduces the issue:

[image: nodesdf for frame_shift = 7]

[image: edges for frame_shift = 7]

In contrast, frame_shift = 0 produces the correct output:

[image: nodesdf for frame_shift = 0]

[image: edges for frame_shift = 0]
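The frame shift that the toy example isolates can also be checked and undone without calling DeepTrack2. The following is a minimal sketch using only pandas; `shift_frames_to_zero` is a hypothetical helper name, and the column names are taken from the dataframes in this thread.

```python
import numpy as np
import pandas as pd


def shift_frames_to_zero(nodesdf):
    """Hypothetical helper: return a copy whose "frame" column starts at 0."""
    shifted = nodesdf.copy()
    shifted["frame"] -= shifted["frame"].min()
    return shifted


# Same layout as the toy example, with frames starting at 7.
nodesdf = pd.DataFrame(np.random.rand(80, 2), columns=["centroid-0", "centroid-1"])
nodesdf["frame"] = np.arange(0, 80) + 7

fixed = shift_frames_to_zero(nodesdf)

# The original dataframe is left untouched; only the copy is shifted.
print(nodesdf["frame"].min(), fixed["frame"].min())
```

Working on a copy keeps the original frame numbers available, which can be useful if results need to be mapped back to the source video.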

po60nani commented 9 months ago

I have thoroughly examined the provided toy example, and it accurately reproduces the expected results you shared. However, when applying the code to my dataset, I encountered an error. To facilitate the troubleshooting process, I have uploaded both the CSV file (df_PSFs.csv) and the code for your review.

Code:

import logging

import numpy as np
import pandas as pd

import deeptrack as dt
from deeptrack.models.gnns.generators import GraphGenerator

logging.disable(logging.WARNING)

if __name__ == "__main__":

    path_csv = r'./df_PSFs.csv'
    nodesdf = pd.read_csv(path_csv)

    print(nodesdf.head(20))

    # normalize centroids between 0 and 1
    nodesdf.loc[:, nodesdf.columns.str.contains("centroid")] = (
            nodesdf.loc[:, nodesdf.columns.str.contains("centroid")]
            / np.array([1000.0, 1000.0])
    )

    nodesdf.loc[:, 'solution'] = 0.0
    nodesdf.loc[:, 'set'] = 0.0

    nodesdf["frame"] -= nodesdf["frame"].min()

    # display the first 20 rows of the dataframe
    nodesdf.head(20)

    # Search radius for the graph edges
    radius = 0.2

    # Time window to associate nodes (in frames)
    nofframes=3

    # Compute edges
    edges = dt.models.gnns.graphs.EdgeExtractor(
        nodesdf, 
        parenthood=np.ones((1, 2)) * -1, 
        radius=radius, 
        nofframes=nofframes
    )

    a = 1

Additional Information: df_PSFs.csv


Upon execution, I expect the code to run successfully without encountering any errors. The provided toy example validates this expectation, but the issue arises with my dataset.
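One way to narrow down the difference between the toy data and a real dataset is to compare the two dataframes column by column before calling EdgeExtractor. The sketch below is only a diagnostic aid: `describe_nodesdf` is a hypothetical helper, the column list is copied from the toy example in this thread (whether EdgeExtractor needs all of them is an assumption), and nothing here is confirmed as the cause of the error.

```python
import pandas as pd

# Columns set up by the toy example earlier in this thread.
TOY_COLUMNS = ["centroid-0", "centroid-1", "frame", "label", "solution", "set"]


def describe_nodesdf(nodesdf):
    """Hypothetical helper: summarize how a dataframe differs from the toy layout."""
    has_frame = "frame" in nodesdf.columns
    return {
        "missing_columns": [c for c in TOY_COLUMNS if c not in nodesdf.columns],
        "extra_columns": [c for c in nodesdf.columns if c not in TOY_COLUMNS],
        "min_frame": nodesdf["frame"].min() if has_frame else None,
        "nan_count": int(nodesdf.isna().sum().sum()),
        "dtypes": nodesdf.dtypes.astype(str).to_dict(),
    }


# Small made-up dataframe: no "label" column, one stray index column,
# frames starting at 3, and one missing centroid value.
example = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 2],
        "centroid-0": [0.1, 0.2, None],
        "centroid-1": [0.3, 0.4, 0.5],
        "frame": [3, 4, 5],
        "solution": [0.0, 0.0, 0.0],
        "set": [0.0, 0.0, 0.0],
    }
)

for key, value in describe_nodesdf(example).items():
    print(key, value)
```

Running the same helper on the toy `nodesdf` and on the dataframe loaded from the CSV file gives two summaries that can be compared side by side.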