dstl / Stone-Soup

A software project to provide the target tracking community with a framework for the development and testing of tracking algorithms.
https://stonesoup.rtfd.io
MIT License

Pandas DataFrame Reader, additional example. #971

Open apiszcz opened 3 months ago

apiszcz commented 3 months ago

Is it possible to document an example for the Pandas DataFrame reader that reads files from a folder to which new data files are continually added? https://stonesoup.readthedocs.io/en/v1.2/auto_examples/Custom_Pandas_Dataloader.html#dataframe-detection-reader

In the DataFrame Detection Reader section, could the instantiation be changed from dataframe=truth_df to a dataframe generator, e.g. dataframe=dataframe_generator, where dataframe_generator reads CSV files and returns DataFrame objects?
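One possible shape for such a generator (a sketch only; the function name, folder path, and file pattern here are assumptions, not Stone-Soup API — the stock DataFrameReader expects a single DataFrame, so a reader would need adapting to consume this):

```python
import pathlib

import pandas as pd


def dataframe_generator(datapath, pattern="*.csv"):
    """Yield one DataFrame per matching file in datapath, in sorted order.

    Hypothetical helper: each CSV file is read into its own DataFrame
    and yielded to the caller one at a time.
    """
    for path in sorted(pathlib.Path(datapath).glob(pattern)):
        yield pd.read_csv(path)
```

A consuming reader could then loop `for df in dataframe_generator("data/"):` and run its existing per-DataFrame detection logic on each yielded frame.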

apiszcz commented 3 months ago

I made an endless loop inside detections_gen, testing it now. There is a new attribute, datapath, and a new section: a while True loop wrapping the for row in loop. In this case the dataframes are stored as .pkl files; csv etc. should also work, just slower.

import pathlib
import pickle
import time

import numpy as np
from filelock import FileLock

from stonesoup.base import Property
from stonesoup.buffered_generator import BufferedGenerator
from stonesoup.types.detection import Detection


class DataFrameDetectionReader2(DetectionReader, _DataFrameReader):
    """A custom detection reader for DataFrames containing detections.

    The DataFrame must have headers with the appropriate fields needed to
    generate the detection. Detections at the same time are yielded together,
    so each file is assumed to be in time order.

    Parameters
    ----------
    """
    datapath: str = Property(doc="Path to folder containing pickled DataFrames.")

    @BufferedGenerator.generator_method
    def detections_gen(self):
        while True:
            for ipath in sorted(str(p) for p in pathlib.Path(self.datapath).glob('*.pkl')):
                # Hold the file lock so a concurrent writer cannot hand us a
                # partially written pickle.
                data_lock = FileLock(ipath.replace('.pkl', '.lock'))
                with data_lock.acquire():
                    with open(ipath, 'rb') as ipkl:
                        self.dataframe = pickle.load(ipkl)

                detections = set()
                previous_time = None

                for row in self.dataframe.to_dict(orient="records"):
                    # Named current_time rather than time, so the time module
                    # remains available for time.sleep() below.
                    current_time = self._get_time(row)
                    if previous_time is not None and previous_time != current_time:
                        yield previous_time, detections
                        detections = set()
                    previous_time = current_time

                    detections.add(Detection(
                        np.array([[row[col_name]] for col_name in self.state_vector_fields],
                                 dtype=np.float64),
                        timestamp=current_time,
                        metadata=self._get_metadata(row)))

                # Yield detections remaining for the final timestamp
                yield previous_time, detections
            # Pause before polling the folder again for new files
            time.sleep(1)
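One caveat with the endless loop above: each pass of while True re-globs and re-reads every file in the folder, so detections from already-processed files are yielded again. A simple guard (a sketch only; poll_new_files and its parameters are hypothetical, not Stone-Soup API) is to remember which paths have been seen:

```python
import pathlib
import time


def poll_new_files(datapath, pattern="*.pkl", poll_interval=1, max_polls=None):
    """Yield each matching file path exactly once, polling for new arrivals.

    Hypothetical helper: a detections_gen could iterate over this instead of
    re-reading every file on every pass. max_polls limits the number of
    polling passes (None means poll forever).
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(pathlib.Path(datapath).glob(pattern)):
            if path not in seen:
                seen.add(path)
                yield path
        polls += 1
        if max_polls is None or polls < max_polls:
            # Wait before scanning the folder again
            time.sleep(poll_interval)
```

A reader's detections_gen could then do `for ipath in poll_new_files(self.datapath):` and keep its per-file locking and yielding logic unchanged.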
sdhiscocks commented 3 months ago

More examples are always welcome. Data readers can be challenging because data formats and structures vary widely, so bespoke code is often needed for a given use case.