childmindresearch / wristpy

https://childmindresearch.github.io/wristpy/
GNU Lesser General Public License v2.1

Task: Reorganize readers #17

Closed ReinderVosDeWael closed 3 months ago

ReinderVosDeWael commented 4 months ago

Description

The readers need to be rewritten to account for the WatchData class changes and there should be a convenience reader function that selects the correct function based on file extension.

These should all be placed inside wristpy/io/readers.py.

Tasks

nx10 commented 3 months ago

Add a read_watch_data that takes the same input and output and selects the correct reader based on file extension, or raises an error for unknown file extensions.

This will work for now but not forever: many watches use .bin as the extension.

ReinderVosDeWael commented 3 months ago

Of course they do... @Asanto32 perhaps we want to omit the wrapper read function

nx10 commented 3 months ago

I can offer to add one to actfast that reads a few bytes and returns the type.
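A minimal sketch of what "reads a few bytes and returns the type" could look like. This is not actfast's API: the function name `sniff_watch_format`, the returned labels, and the byte signatures are assumptions for illustration and would need to be checked against real recordings.

```python
import pathlib
from typing import Union

# Leading bytes used to guess the format. These signatures are assumptions
# for illustration only; verify them against real files before relying on them.
_SIGNATURES = {
    b"PK\x03\x04": "actigraph_gt3x",      # .gt3x is a zip archive
    b"Device Identity": "geneactiv_bin",  # assumed start of the GENEActiv text header
    b"MD": "axivity_cwa",                 # assumed start of the Axivity .cwa header
}


def sniff_watch_format(file_name: Union[pathlib.Path, str]) -> str:
    """Guess the device format from the first bytes of a file (hypothetical helper)."""
    longest = max(len(signature) for signature in _SIGNATURES)
    with open(file_name, "rb") as handle:
        head = handle.read(longest)
    for signature, label in _SIGNATURES.items():
        if head.startswith(signature):
            return label
    raise ValueError(f"Unrecognized file signature: {head!r}")
```

Because the decision is based on content rather than on the suffix, two different devices that both write .bin would no longer collide, provided their headers differ.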

Asanto32 commented 3 months ago

Ahh, I already started with something simple as follows, but I can remove it.

def read_watch_data(file_name: pathlib.Path | str) -> WatchData:
    """Read watch data from a file.

    This function selects the correct loader based on the file extension.

    Args:
        file_name: The filename to read the watch data from.

    Returns:
        The raw sensor data.

    Raises:
        ValueError: If the file extension is not supported.
    """
    filename = pathlib.Path(file_name)
    if filename.suffix == ".gt3x":
        input_data = gt3x_loader(filename.as_posix())
    elif filename.suffix == ".bin":
        input_data = geneActiv_loader(filename.as_posix())
    else:
        raise ValueError(f"Unsupported file extension: {filename.suffix}")

    return input_data

Asanto32 commented 3 months ago

Also, this is what the reader for GGIR looks like. It seems the other watch with .bin is Movisens?

https://github.com/wadpac/GGIR/blob/master/R/g.readaccfile.R

Asanto32 commented 3 months ago

@ReinderVosDeWael @nx10 if I'm not mistaken, the actfast readers only work with str. So with the read_watch_data wrapper above, we can pass a pathlib.Path or a string directly. This goes against Task 1 and 2, but captures the idea by using the wrapper. Let me know your thoughts.

nx10 commented 3 months ago

Yeah, so far they only work with strings, but I will probably make them compatible with Arrow's filesystem abstraction; then you can stream directly from stuff like S3.

Either way just use

actfast.read_x_y(str(path))

for now.

nx10 commented 3 months ago

I'm also open to moving whatever wrapper you come up with to actfast eventually.

PyO3 currently can't generate stubs (that's why actfast has no in-editor autocomplete), so it would make sense to also ship a thin Python layer.

ReinderVosDeWael commented 3 months ago

Yeah I concur with Florian here; you can take in a pathlib.Path | str and convert it with str() where needed. This means that users of this function (including power end-users) can use all the goodies of pathlib without being bothered by the conversion themselves.
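A minimal sketch of that pattern, assuming a reader that only accepts `str`. Both `call_str_reader` and `fake_str_only_reader` are hypothetical names used for illustration; the stand-in reader is not an actfast function.

```python
import pathlib
from typing import Callable, Union

PathLike = Union[pathlib.Path, str]


def fake_str_only_reader(path: str) -> dict:
    """Hypothetical stand-in for a reader that rejects anything but str."""
    if not isinstance(path, str):
        raise TypeError("path must be a str")
    return {"path": path}


def call_str_reader(reader: Callable[[str], dict], file_name: PathLike) -> dict:
    """Accept a Path or a str and hand the reader a plain str."""
    # Going through pathlib.Path first normalizes both input types,
    # and str() performs the conversion the underlying reader needs.
    return reader(str(pathlib.Path(file_name)))
```

Callers can then keep working with `pathlib.Path` objects, for example `call_str_reader(fake_str_only_reader, pathlib.Path("data") / "subject.gt3x")`, while the conversion stays in one place.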

nx10 commented 3 months ago

> Also, this is what the reader for GGIR looks like. It seems the other watch with .bin is Movisens?
>
> https://github.com/wadpac/GGIR/blob/master/R/g.readaccfile.R

Just for reference from the GGIR docs:

2.2 Prepare folder structure

GGIR works with the following accelerometer brands and formats:

- GENEActiv .bin
- Axivity AX3 and AX6 .cwa
- ActiGraph .csv and .gt3x (.gt3x only the newer format generated with firmware versions above 2.5.0. Serial numbers that start with “NEO” or “MRA” and have firmware version of 2.5.0 or earlier use an older format of the .gt3x file). Note for Actigraph users: If you want to work with .csv exports via the commercial ActiLife software then note that you have the option to export data with timestamps. Please do not do this as this causes memory issues for GGIR. To cope with the absence of timestamps GGIR will calculate timestamps from the sample frequency, the start time and start date as presented in the file header.
- Movisens .bin files with data stored in folders. GGIR expects that each participant’s folder contains at least a file named acc.bin.
- Any other accelerometer brand that generates csv output, see documentation for functions read.myacc.csv and argument rmc.noise in the GGIR function documentation (pdf).

Note that functionality for the following file formats was part of GGIR but has been deprecated as it required a significant maintenance effort without a clear use case or community support: (1) .bin for the Genea monitor by Unilever Discover, an accelerometer that was used for some studies between 2007 and 2012) .bin, and (2) .wav files as can be exported by the Axivity Ltd OMGUI software. Please contact us if you think these data formats should be facilitated by GGIR again and if you are interested in supporting their ongoing maintenance.

All accelerometer data that needs to be analysed should be stored in one folder, or subfolders of that folder. Give the folder an appropriate name, preferable with a reference to the study or project it is related to rather than just ‘data’, because the name of this folder will be used later on as an identifier of the dataset.

( https://cran.r-project.org/web/packages/GGIR/vignettes/GGIR.html )

So it seems that GENEActiv, Movisens, and Genea all use the .bin file extension for their different formats (Axivity uses .cwa and .wav). Additionally, ActiGraph .gt3x files are of course also just zip archives that contain a .bin file.
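The zip-archive point can be checked with the standard library alone. The sketch below lists the members of an archive; `list_archive_members` is a hypothetical helper, and the member names one would expect in a real .gt3x file (for example a .bin payload next to a text info file) are an assumption, not something this thread confirms.

```python
import pathlib
import zipfile
from typing import List, Union


def list_archive_members(file_name: Union[pathlib.Path, str]) -> List[str]:
    """Return the member names of a zip-based recording (hypothetical helper)."""
    if not zipfile.is_zipfile(file_name):
        raise ValueError(f"Not a zip archive: {file_name}")
    with zipfile.ZipFile(file_name) as archive:
        return archive.namelist()
```

A suffix check alone therefore says little about the payload: the same kind of .bin data may sit inside a .gt3x container or stand on its own.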