LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Edited installs impact data provenance #1087

Open CBroz1 opened 2 months ago

CBroz1 commented 2 months ago

A user's edited fork of Spyglass has been used to process data in common_ephys.LFPBand here. This edit impacted the following files, and perhaps others

In the short term, the data should be edited to reflect the pipeline without this edit

In the long term, we need norms that dictate how one can/cannot edit spyglass to maintain confidence in data provenance

rly commented 1 month ago

You could check whether a user is running an editable install of spyglass:

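from importlib_metadata import Distribution  # third-party backport of importlib.metadata

def is_editable(name):
    origin = Distribution.from_name(name).origin  # None if the package has no direct_url.json (e.g., installed from PyPI)
    return getattr(getattr(origin, "dir_info", None), "editable", False)

is_editable("spyglass-neuro")  # True for an editable install
is_editable("numpy")  # False for a regular install from PyPI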

and has a dirty git environment:

git status --untracked-files=no --porcelain  # prints nothing if no tracked files have changed; otherwise one line per changed file

or is not on the master branch

git rev-parse --abbrev-ref HEAD  # returns just the name of the branch

(Or is not on the latest commit)

and prevent or warn about making changes to the database if so. (There might be easier ways but those are what I found.)
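
A minimal sketch of how the git checks above could be combined into a warning, assuming the editable-install flag has already been determined and that `master` is the expected branch. The function names `git_output`, `provenance_problems`, and `check_repo` are hypothetical and not part of Spyglass. The sketch reads the suggestion as "editable and (dirty or off-branch)", which is one possible interpretation.

```python
import subprocess


def git_output(repo_dir, *args):
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def provenance_problems(editable, status_output, branch, expected_branch="master"):
    """Return reasons (possibly none) to distrust an editable install.

    editable: result of the editable-install check.
    status_output: output of `git status --untracked-files=no --porcelain`.
    branch: output of `git rev-parse --abbrev-ref HEAD`.
    """
    problems = []
    if not editable:
        # Assumption: a regular (non-editable) install is treated as unmodified.
        return problems
    if status_output.strip():
        problems.append("uncommitted changes to tracked files")
    if branch != expected_branch:
        problems.append(f"on branch {branch!r}, expected {expected_branch!r}")
    return problems


def check_repo(repo_dir, editable, expected_branch="master"):
    """Collect the git state of repo_dir and evaluate it."""
    status = git_output(repo_dir, "status", "--untracked-files=no", "--porcelain")
    branch = git_output(repo_dir, "rev-parse", "--abbrev-ref", "HEAD")
    return provenance_problems(editable, status, branch, expected_branch)
```

A caller could raise an error or emit `warnings.warn(...)` before any database insert when the returned list is non-empty. Whether to block or only warn is a policy choice this sketch leaves open.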

But as @edeno mentions in https://github.com/LorenFrankLab/spyglass/issues/439, this would not check whether a user has modified dependent packages (e.g., spikeinterface), which would also affect the integrity and provenance of the added data. Still, it might catch the most likely, accidental changes...
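
As a partial step toward covering dependencies, the same editable-install test could be applied to every installed distribution. This is only a sketch using the standard-library `importlib.metadata` and the `direct_url.json` file that installers write for direct and editable installs. The helper names are hypothetical. It would not detect files edited in place inside `site-packages`.

```python
import json
from importlib.metadata import distributions


def is_editable_direct_url(text):
    """Return True if direct_url.json content marks an editable install."""
    if not text:
        return False  # no direct_url.json, e.g. installed from an index
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    dir_info = data.get("dir_info")
    return isinstance(dir_info, dict) and bool(dir_info.get("editable", False))


def editable_distributions():
    """Sorted names of installed distributions that are editable installs."""
    names = []
    for dist in distributions():
        if is_editable_direct_url(dist.read_text("direct_url.json")):
            names.append(str(dist.metadata["Name"]))
    return sorted(names)
```

For example, `editable_distributions()` would include `spikeinterface` if it had been installed with `pip install -e`, which could then be checked for a dirty working tree in the same way as Spyglass itself.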