Casting Fex DataFrame does not correctly assign default Fex metadata

YounZGrace commented 4 months ago

When we run detections, the Detector class automatically fills out all of the relevant Fex meta data. However, when we cast at pandas dataframe as a Fex object, this needs to be manually passed as a keyword. Unfortunately, as a consequence a lot of the internal Fex functions don't work correctly such as compute_identities().

 def __init__(self, *args, **kwargs):
        ### Columns ###
        self.au_columns = kwargs.pop("au_columns", None)
        self.emotion_columns = kwargs.pop("emotion_columns", None)
        self.facebox_columns = kwargs.pop("facebox_columns", None)
        self.landmark_columns = kwargs.pop("landmark_columns", None)
        self.facepose_columns = kwargs.pop("facepose_columns", None)
        self.identity_columns = kwargs.pop("identity_columns", None)
        self.gaze_columns = kwargs.pop("gaze_columns", None)
        self.time_columns = kwargs.pop("time_columns", None)
        self.design_columns = kwargs.pop("design_columns", None)

        ### Meta data ###
        self.filename = kwargs.pop("filename", None)
        self.sampling_freq = kwargs.pop("sampling_freq", None)
        self.detector = kwargs.pop("detector", None)
        self.face_model = kwargs.pop("face_model", None)
        self.landmark_model = kwargs.pop("landmark_model", None)
        self.au_model = kwargs.pop("au_model", None)
        self.emotion_model = kwargs.pop("emotion_model", None)
        self.facepose_model = kwargs.pop("facepose_model", None)
        self.identity_model = kwargs.pop("identity_model", None)
        self.features = kwargs.pop("features", None)
        self.sessions = kwargs.pop("sessions", None)

        self.verbose = kwargs.pop("verbose", False)

@ejolly I think we should be trying to automatically figure this out if the columns are not empty or don't exist.

Grace & Luke

ljchang commented 4 months ago

@ejolly , I started looking into this and I think this will require a discussion before I start writing code.

Basically, in the first version of py-feat (before we added the detection module), the idea was to have a standard to import all of the different software packages. This resulted in separate ways to load data files from openface, facet, etc., which is why we force users to pass in different column names to initialize a fex object.

Now that we've added the detector module and have started to move away from our initial idea of importing data from different packages, I think we could be a bit more strict in initializing a Fex object, which would make this a lot easier and also more similar to our other data representations in nltools (Brain_Data, Adjacency, etc).

I'm leaning towards the latter option, but wanted your input before I proceed.

ejolly commented 4 months ago

@ljchang Can you give me a little more context for what situation you're trying to caste a data-frame to a Fex object (as opposed to just using the Fex created by a Detector)?

I'm inclined to agree with your take about being stricter given the changes to py-feat. But at the same time this strikes me as a "code-smell" i.e. an anti-pattern that we shouldn't be letting users get into in the first place. Can we make a change to avoid giving people the intuition, that converting a data-frame to a Fex object is good practice?

ljchang commented 4 months ago

Well the main use case, which is very common is loading a file of previously dectected images/movies. Currently, it does not automatically import all of the Fex properties by default.

There is an undocumented read_feat() method on a Fex object, which is clunky, plus it doesn't work correctly.

ejolly commented 4 months ago

Gotcha. One thought is that we do make Fex creation stricter, and also bring back a stand-alone load_fex function that is basically a "wrapper" around pd.read_csv. It can handle the responsibility of trying to intuit the correct column meta-data from the file return a correctly constructed Fex for you.

Were you thinking something like that? Or more object-oriented, where you first initialize an empty Fex instance and then call its load method?

ljchang commented 4 months ago

yeah, that's exactly what i was thinking, to make it more analogous to pd.read_csv. Another idea is to pass a file name to Fex constructor like we do with Brain_Data to automatically load it, which could just call the new load_fex method.

ljchang commented 4 months ago

@ejolly, one more thing that we should think about before finishing this refactor is how should we store Fex metadata following detections when we write out a file? Obviously, we can't store this in a straightforward way using CSV, unless we add an additional header and parse it when we load the data. Alternatively, we could use HDF5, like we do for nltools, or even JSON.

I definitely think we should figure this out sooner than later as it will also be necessary with our pyfeat-live plans. I've been looking at other packages like geopandas for inspiration, but they write to a more standard GIS format. What did we do for Design_Matrix in nltools?

ljchang commented 4 months ago

ok, I decided to go with writing out a csv file where there is json as an additional header containing all of the metadata. The new read_fex function loads this file format as a fex with all of the metadata populated. I think it is pretty clean and will work for most use cases.

ejolly commented 4 months ago

Ok that seems reasonable, I'll check out your PR this week. We never really saved any additional meta-data with design-matrices and on construction the user has to manually set the metadata, i.e.

design_mat = Design_matrix(pd.read_csv('data.csv'), sampling_freq=TR, convolved=['list_of_col_names'], polys=['list_of_col_names'])

This is/was fine because 99% of the time when a user is loading a previously generated design matrix, they're loading the csv, setting it to Brain_Data.X and calling .regress(). The meta-data is mostly useful for manipulating multiple design matrices together, which produces the csv the user is loading in the first place and thus isn't really needed anymore. We haven't had a reason yet to support the other 1% of use cases.

cosanlab / py-feat

Casting Fex DataFrame does not correctly assign default Fex metadata #212