dstl / Stone-Soup

A software project to provide the target tracking community with a framework for the development and testing of tracking algorithms.
https://stonesoup.rtfd.io
MIT License

Added an example within the documentation for custom readers supporting pandas DataFrames. #707

Closed: BenjaminFraser closed this 1 year ago

BenjaminFraser commented 1 year ago

Added a new example (Custom_Pandas_Dataloader.py) to the documentation in docs/examples, showing how to define custom readers that support pandas DataFrames.

This allows the wide range of data formats supported by pandas (e.g. JSON, XML, Parquet, HDF5, .txt, .zip) to be used with ground truth readers and detection readers, without the need to manually define a custom data ingestion process for each type.
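To make the benefit concrete, here is a minimal sketch (an illustration, not code from this PR) in which the same three detections are stored as CSV text and as JSON text. Once pandas has parsed either one, the resulting DataFrames are identical, so a single DataFrame-based reader could consume both. Binary formats such as Parquet or HDF5 follow the same route via `pd.read_parquet` or `pd.read_hdf`, but they need optional engines (pyarrow, PyTables), so they are left out here.

```python
from io import StringIO

import pandas as pd

# The same three detections, serialised in two different formats.
csv_text = """time,x,y
2022-10-22T12:00:00,1.0,2.0
2022-10-22T12:00:01,1.5,2.5
2022-10-22T12:00:02,2.0,3.0
"""
json_text = """[
  {"time": "2022-10-22T12:00:00", "x": 1.0, "y": 2.0},
  {"time": "2022-10-22T12:00:01", "x": 1.5, "y": 2.5},
  {"time": "2022-10-22T12:00:02", "x": 2.0, "y": 3.0}
]"""

# Each pandas reader produces an in-memory DataFrame with the same columns.
df_from_csv = pd.read_csv(StringIO(csv_text))
# convert_dates=False keeps "time" as text, matching read_csv's default.
df_from_json = pd.read_json(StringIO(json_text), orient="records", convert_dates=False)

# Either DataFrame could now be handed to the same DataFrame-based reader.
pd.testing.assert_frame_equal(df_from_csv, df_from_json)
```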

Given its similarity to the requirements of the custom reader documentation example (#354), I've linked this pull request to that issue; hopefully that is not a problem.

These classes do have the disadvantage of requiring the entire dataset to be held in memory. However, it seems that several Stone Soup users have shown interest in being able to use pandas DataFrames directly, which is understandable given the flexibility and processing functionality they provide.

The example in Custom_Pandas_Dataloader.py includes the definitions of the DataFrameGroundTruthReader and DataFrameDetectionReader classes. DataFrameGroundTruthReader inherits from the existing GroundTruthReader class and DataFrameDetectionReader from the existing DetectionReader class, each together with a custom-defined _DataFrameReader class.

These classes operate similarly to the existing CSVGroundTruthReader and CSVDetectionReader classes, except that they take as input a pandas DataFrame already read into memory, rather than a path to a .csv file. They also have modified generator methods for producing the timestamps and paths/detections.
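The row-by-row approach described above can be sketched without Stone Soup itself. Everything here is a hypothetical stand-in rather than the PR's `DataFrameDetectionReader`: the function name is made up, and plain tuples replace Stone Soup's `Detection` objects. It parses the time column, sorts the rows chronologically, and yields one `(time, detections)` pair per distinct timestamp.

```python
import pandas as pd


def detections_by_time(dataframe, state_vector_fields, time_field):
    """Yield (time, detections) pairs from an in-memory DataFrame.

    Hypothetical stand-in for a DataFrame-based detection reader: each
    detection is a plain tuple of state values, not a Stone Soup Detection.
    """
    # Parse the time column once, then sort rows chronologically.
    # mergesort is stable, so rows sharing a timestamp keep their order.
    parsed = dataframe.assign(**{time_field: pd.to_datetime(dataframe[time_field])})
    rows = parsed.sort_values(time_field, kind="mergesort")

    current_time = None
    detections = []
    for row in rows.to_dict("records"):
        time = row[time_field].to_pydatetime()
        # A new timestamp means the previous batch is complete.
        if current_time is not None and time != current_time:
            yield current_time, detections
            detections = []
        current_time = time
        detections.append(tuple(float(row[field]) for field in state_vector_fields))

    # Flush the final batch (skipped for an empty DataFrame).
    if current_time is not None:
        yield current_time, detections


# Example: two detections at 12:00:00 and one at 12:00:01.
df = pd.DataFrame({
    "time": ["2022-10-22 12:00:00", "2022-10-22 12:00:00", "2022-10-22 12:00:01"],
    "x": [1.0, 5.0, 1.5],
    "y": [2.0, 6.0, 2.5],
})
for time, detections in detections_by_time(df, ["x", "y"], "time"):
    print(time, detections)
```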

These have been useful for some work I've done using Stone Soup for UAV-based non-cooperative radar research, so hopefully they are also of value to other members of the community!

Update on progress and fixes for this PR, as of 22 Oct 22:

  1. Added pandas to the dev dependencies in setup.cfg.
  2. Made references to Stone Soup consistent (two words) throughout the documentation.
  3. Added a demonstration of the ground truth reader after initialisation, outputting its first iteration in the docs.
  4. Added support for time fields that are already in datetime format.
  5. Added pandas_reader.py within the reader directory, containing the three new classes: _DataFrameReader, DataFrameGroundTruthReader, and DataFrameDetectionReader. Tests are still to be developed for these (hence the failures on the current draft commits).
  6. Added tests in test_pandas_reader.py within stonesoup/reader/tests.
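Item 4 can be illustrated with a small helper. This is a sketch under assumptions, not the PR's implementation: the function is hypothetical, and its `time_field_format` and `timestamp` parameters are only modelled on similarly named options of the CSV readers. A column that pandas has already parsed as datetimes is passed through untouched; otherwise strings are parsed (with an explicit format if one is given) or numeric values are treated as UNIX epoch seconds.

```python
import pandas as pd


def to_datetime_column(series, time_field_format=None, timestamp=False):
    """Return ``series`` as datetimes (hypothetical helper, not the PR's code)."""
    # Already datetime-typed (e.g. parsed upstream): return it unchanged.
    if pd.api.types.is_datetime64_any_dtype(series):
        return series
    if timestamp:
        # Numeric UNIX epoch times, interpreted as seconds.
        return pd.to_datetime(series, unit="s")
    # Strings, parsed with an explicit strftime-style format if one is given.
    return pd.to_datetime(series, format=time_field_format)


already_parsed = pd.Series(pd.to_datetime(["2022-10-22 12:00:00"]))
as_strings = pd.Series(["22/10/2022 12:00:00"])
as_epochs = pd.Series([1666440000])

print(to_datetime_column(already_parsed) is already_parsed)
print(to_datetime_column(as_strings, time_field_format="%d/%m/%Y %H:%M:%S")[0])
print(to_datetime_column(as_epochs, timestamp=True)[0])
```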

One point to note about the tests: all classes defined in pandas_reader.py are currently fully covered, but Codecov flags the pandas import check (which raises an ImportError if pandas is not installed) as not covered.
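One common way to cover such a guard in tests is to hide the package by mapping its `sys.modules` entry to `None`, which makes any subsequent `import` of it raise `ImportError` even if it is installed. The sketch below is an assumption, not what this PR does: the guard is wrapped in a hypothetical `_import_pandas` function so the failing branch can be triggered directly. For a module-level guard like the one in pandas_reader.py, the module itself would also have to be re-imported (for example with `importlib.reload`) while the patch is active.

```python
import sys
from unittest import mock


def _import_pandas():
    """Import pandas, or raise a clearer error if it is unavailable (hypothetical)."""
    try:
        import pandas
    except ImportError as error:
        raise ImportError(
            "This reader requires the optional dependency pandas to be installed."
        ) from error
    return pandas


def test_missing_pandas_raises():
    # A None entry in sys.modules makes `import pandas` raise ImportError,
    # whether or not pandas is actually installed. patch.dict restores
    # sys.modules when the block exits.
    with mock.patch.dict(sys.modules, {"pandas": None}):
        try:
            _import_pandas()
        except ImportError as error:
            assert "optional dependency pandas" in str(error)
        else:
            raise AssertionError("expected ImportError")


test_missing_pandas_raises()
```

Under pytest, `pytest.raises(ImportError, match=...)` would be the more idiomatic way to write the assertion.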

To-do / enhancements:

  1. Take advantage of pandas grouping to make the code more efficient (as suggested by Steven below).
  2. Link the documentation example to the classes defined within pandas_reader.py, using something such as inspect.getsource.

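To-do item 1 might look roughly like this. It is only a sketch: the function name is hypothetical and plain tuples stand in for Stone Soup detections. `DataFrame.groupby` on the parsed time column hands back all rows for each timestamp at once, replacing the manual accumulate-and-flush loop that a row-by-row reader needs.

```python
import pandas as pd


def grouped_detections(dataframe, state_vector_fields, time_field):
    """Yield (time, detections) pairs via pandas grouping (hypothetical helper)."""
    times = pd.to_datetime(dataframe[time_field])
    # groupby(sort=True) visits each distinct timestamp once, in ascending
    # order, and keeps the original row order within each group.
    for time, group in dataframe.groupby(times, sort=True):
        values = group[state_vector_fields].to_numpy(dtype=float)
        yield time.to_pydatetime(), [tuple(map(float, row)) for row in values]


# Example: two detections at 12:00:00 and one at 12:00:01 (deliberately unsorted).
df = pd.DataFrame({
    "time": ["2022-10-22 12:00:01", "2022-10-22 12:00:00", "2022-10-22 12:00:00"],
    "x": [1.5, 1.0, 5.0],
    "y": [2.5, 2.0, 6.0],
})
for time, detections in grouped_detections(df, ["x", "y"], "time"):
    print(time, detections)
```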
sdhiscocks commented 1 year ago

Thanks for the contribution @BenjaminFraser.

I see the docs are failing to build because pandas is a missing dependency. If you could add pandas to the dev dependencies in setup.py, that should resolve it: https://github.com/dstl/Stone-Soup/blob/435883a67045f72161355e9a5cbb44bcacfa67b1/setup.py#L31-L35

It'd be good to have the readers in the main code base (probably with an optional dependency on pandas) so users can easily access them. It'd also be good to keep the example you've created, both to show how to use them and, in reference to #354, to show how to create custom readers. (Minor issue of if they are modified, we'll have to be sure to update in both places, unless in the example could do something with inspect.getsource)
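The `inspect.getsource` idea can be sketched as follows. In the documentation example, the argument would be the reader class itself (e.g. `DataFrameGroundTruthReader` imported from `stonesoup.reader.pandas_reader`); here the standard library's `json.dumps` stands in so the snippet runs anywhere. Note that `inspect.getsource` needs the object to come from a `.py` file and raises `OSError` when no source is available.

```python
import inspect
import json

# Retrieve the current source of an object defined in a .py file. In a
# documentation example, the same call on a reader class would display its
# real implementation instead of a hand-copied duplicate that can drift.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```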

sdhiscocks commented 1 year ago

(Minor issue of if they are modified, we'll have to be sure to update in both places, unless in the example could do something with inspect.getsource)

Or use the Sphinx literalinclude directive, which can also add syntax highlighting.

BenjaminFraser commented 1 year ago

That's no problem at all, and including the Readers within the main code base sounds like a good idea! The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough.

I'll take a look later when I have the chance and put together another PR for those points!

sdhiscocks commented 1 year ago

The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough.

We've done this before by simply raising an error if importing the dependency fails: https://github.com/dstl/Stone-Soup/blob/5276c1b6b541487203806c6fbc8a0547a9ece762/stonesoup/reader/opensky.py#L5-L10
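A generalised version of that pattern might look like the following. The helper is hypothetical (neither its name nor its message comes from Stone Soup or the linked opensky.py); it imports an optional dependency by name with `importlib.import_module` and re-raises a clearer `ImportError`, chained to the original, when the package is absent.

```python
import importlib


def import_optional(module_name, feature):
    """Import ``module_name``, or explain which optional dependency is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{feature} requires the optional dependency {module_name!r}; "
            f"install it with `pip install {module_name}`."
        ) from error
```

A reader module could then call, say, `pd = import_optional("pandas", "The DataFrame readers")` at import time, so users without pandas get an actionable message rather than a bare `ModuleNotFoundError`.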

codecov[bot] commented 1 year ago

Codecov Report

Base: 94.81% // Head: 94.84% // Increases project coverage by +0.02% :tada:

Coverage data is based on head (af4c77b) compared to base (f27eaeb). Patch coverage: 97.33% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
+ Coverage   94.81%   94.84%   +0.02%
==========================================
  Files         169      170       +1
  Lines        8221     8296      +75
  Branches     1216     1230      +14
==========================================
+ Hits         7795     7868      +73
- Misses        316      318       +2
  Partials      110      110
```

| Flag | Coverage Δ | |
|---|---|---|
| integration | `68.50% <0.00%> (-0.63%)` | :arrow_down: |
| unittests | `92.69% <97.33%> (+0.04%)` | :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| `stonesoup/reader/pandas_reader.py` | `97.33% <97.33%> (ø)` | |
