Frequency-Tagging / freq-tag

A Python package facilitating the analysis of frequency-tagging data

convert mne epochs to psd #4

Closed · kalenkovich closed 3 years ago

kalenkovich commented 3 years ago

I copied the frequency-tagging tutorial from MNE and moved the power-calculating code to freqtag.spectrum.Spectrum.psd_from_mne_epochs. I then rewrote that code so that it no longer uses mne but computes the spectrum with numpy's FFT instead.
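
Roughly, the approach looks like this - a minimal sketch with illustrative names, not the exact code in the PR:

```python
import numpy as np

def psd_from_epochs_array(data, sfreq):
    """Compute a PSD from epoched data of shape (n_epochs, n_channels, n_times).

    `data` could come from mne.Epochs.get_data(); `sfreq` is the sampling rate
    in Hz. No windowing or normalization conventions are applied here.
    """
    n_times = data.shape[-1]
    # Real-valued FFT along the time axis keeps only the non-negative frequencies.
    amplitudes = np.fft.rfft(data, axis=-1)
    freqs = np.fft.rfftfreq(n_times, d=1.0 / sfreq)
    psd = np.abs(amplitudes) ** 2
    return freqs, psd
```

With an mne.Epochs object called epochs, this would be used as freqs, psd = psd_from_epochs_array(epochs.get_data(), epochs.info['sfreq']).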

pep8speaks commented 3 years ago

Hello @kalenkovich! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-07-20 15:05:33 UTC
dominikwelke commented 3 years ago

Hi @kalenkovich, finally came to look into the code :)

Some points/questions (each quoted in the reply below):

For now, I think these are some initial things to discuss :) Happy to hear your thoughts and opinions on these, and open to discussion!

kalenkovich commented 3 years ago

Thank you for taking a look, @dominikwelke!

  • the Spectrum class can hold only one kind (psd, snr, ...) of spectrum (surprise!). I suppose we will then need another higher-order class to combine several instances of Spectrum based on the same data (e.g. a psd and the snr derived from it), and maybe even the raw time series the psd is based on (i.e. an object of a time-series class or similar). What is your opinion on that, did you already sketch a class structure for this?

I think we can deal with it once the need arises. Right now, I don't see the necessity for such a class. I do, however, see problems that can arise from different spectra getting out of sync. We can probably safeguard against this, but I would rather get into it when and if we actually need it. At this point, I would leave it at that.

As a future step, I think it would be great to have a tutorial with a full-fledged analysis where we would see whether combining spectra under a single object would benefit clarity or muddy it. Now that we are doing our own thing, we have more freedom than with the mne-python tutorial.

  • maybe we want the kind variable to indicate whether it's epoched data or continuous/single-epoch data? An alternative would be to have two types of spectrum classes, but I think this would be overkill

I don't think this is necessary at all. Continuous data = single-epoch data. Again, I don't mind adding it later, but only once we see that it helps the user/coder.

  • relatedly, there would be a similar method/function called psd_from_mne_raw or the like

That makes sense. I think this function should evolve into something more general like psd_from_mne, or even psd_from, or even just from, where the input type could be inferred. I am not sure how often one uses frequency tagging on raw data, but I would love to look at at least one particular example: the vases-phases experiment.
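
To make that concrete, a dispatch function could look roughly like this; psd_from_mne_epochs and psd_from_mne_raw stand in for the hypothetical builders discussed above and are not existing functions:

```python
import mne

def psd_from(obj):
    """Build a spectrum from whatever MNE object is passed in (rough sketch)."""
    if isinstance(obj, mne.BaseEpochs):
        return psd_from_mne_epochs(obj)   # hypothetical epochs-based builder
    if isinstance(obj, mne.io.BaseRaw):
        return psd_from_mne_raw(obj)      # hypothetical raw-based builder
    raise TypeError(f"Don't know how to compute a PSD from {type(obj)}")
```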

  • I am not sure I am really convinced yet that psd_from_mne... should be a class method. It seems to me that this could be a bit confusing, given that you can also create the object by providing the data as arrays via the __init__ method. Calling the class method ...from_mne... would then simply override existing data. While this doesn't need to be a general problem (if we let it throw a warning), I think conceptually class methods should rather be tools that modify or work with data already attached to the class object (as provided via the __init__ method). A class method can also add data, of course, but I think this shouldn't be crucial data that has to be there for the class to make sense - e.g. a higher-order class potentially holding multiple spectra, as mentioned above, might well have an add_spectrum method that can attach unlimited additional spectra, but the first piece of spectrum/data of whatever kind should, in my opinion, be provided via __init__ (or some standalone function that returns an object of this class). Coming back to psd_from_mne_epochs: an alternative in line with this would be to have the now-class-method as a standalone function (e.g. at the freq-tag.spectrum.psd_from_whatever level) that returns an object of the class freq-tag.spectrum.Spectrum

I think we have a different understanding of what a class function is. In Python, such a function is not bound to an instance, and it can't change the inner data of an object even if you call it from an instance rather than from the class. That is why its first argument is traditionally referred to as cls: it refers to the class itself and not to an instance, the way self does in instance method definitions. It does exactly what you propose for freq-tag.spectrum.psd_from_whatever to do: it returns a new Spectrum object.
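
In code, the pattern looks roughly like this (a minimal sketch; the attribute names are illustrative, not the actual package code):

```python
import numpy as np

class Spectrum:
    def __init__(self, data, frequencies, kind='psd'):
        self.data = data
        self.frequencies = frequencies
        self.kind = kind

    @classmethod
    def psd_from_mne_epochs(cls, epochs):
        # `cls` is the class itself (or a subclass), not an instance:
        # nothing existing is modified, a brand-new object is returned.
        data = epochs.get_data()          # (n_epochs, n_channels, n_times)
        sfreq = epochs.info['sfreq']
        frequencies = np.fft.rfftfreq(data.shape[-1], d=1.0 / sfreq)
        psd = np.abs(np.fft.rfft(data, axis=-1)) ** 2
        return cls(data=psd, frequencies=frequencies, kind='psd')
```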

This is very similar to how, for example, pandas.DataFrame.from_dict() works. You can create a dataframe using df = DataFrame(...), implicitly calling pandas.DataFrame.__init__(self, ...), or you can call the class method from_dict with df = pandas.DataFrame.from_dict(...) (the arguments would differ, of course).
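
For reference, the two construction routes side by side:

```python
import pandas as pd

# Direct construction, which goes through DataFrame.__init__:
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# The class-method constructor builds and returns a new DataFrame
# (and takes extra arguments such as orient):
df2 = pd.DataFrame.from_dict({'a': [1, 2], 'b': [3, 4]}, orient='columns')
```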

We could, of course, move this function up to the module level but I am pretty sure the current way is the clearest one.

  • how would we want to provide the transformation functions like snr_spectrum(...) from the tutorial in our module? These could be class methods of the Spectrum class, but on the other hand they really depend on the kind variable (i.e. computing snr only makes sense if kind == 'psd'). Maybe these should rather be classmethods of the higher-order class? Or individual functions again? Or both (an individual function + a class-method wrapper calling it)?

I would use an instance method called something like calculate_spectrum. For now, I would have it simply return an np.ndarray object. But, of course, we will soon get into a situation where it would make more sense for it to return an object of class Spectrum (or maybe something else).
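
As a sketch of what such an instance method could look like, here is one way to compute SNR as the ratio of each bin's power to the mean power of its neighbors; the method name, parameters, and neighbor scheme are assumptions for illustration, not the package's API:

```python
import numpy as np

class Spectrum:
    def __init__(self, data, frequencies):
        self.data = np.asarray(data)   # PSD values, shape (..., n_freqs)
        self.frequencies = frequencies

    def calculate_snr(self, n_neighbors=3, n_skip=1):
        """Return the SNR per frequency bin as a plain np.ndarray."""
        psd = self.data
        n_freqs = psd.shape[-1]
        snr = np.full_like(psd, np.nan, dtype=float)
        for i in range(n_freqs):
            # n_neighbors bins on each side of bin i, skipping the n_skip
            # bins immediately adjacent to it.
            lo, hi = i - n_skip - n_neighbors, i + n_skip + n_neighbors
            neighbors = [j for j in range(lo, hi + 1)
                         if 0 <= j < n_freqs and abs(j - i) > n_skip]
            if neighbors:
                snr[..., i] = psd[..., i] / psd[..., neighbors].mean(axis=-1)
        return snr
```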

I certainly don't think this should be a class method. I think we are probably using different terminologies.

As for the module-level method, I think it only makes logical sense if we reuse such a function in different classes. For now, I think its best placement is as an instance method.

Let's get back to this in a separate issue.

dominikwelke commented 3 years ago

I think we can deal with it once the need arises. Right now, I don't see the necessity for such a class. I do, however, see problems that can arise from different spectra getting out of sync. We can probably safeguard against this, but I would rather get into it when and if we actually need it. At this point, I would leave it at that.

ok

As a future step, I think it would be great to have a tutorial with a full-fledged analysis where we would see whether combining spectra under a single object would benefit clarity or muddy it

yes, sounds good

I don't think this is necessary at all. Continuous data = single-epoch data. Again, I don't mind adding it later, but only once we see that it helps the user/coder.

You got me wrong. Sure, continuous = single epoch = 2D data, but multi-epoch data is 3D, and some methods have to know about this. Of course this can all be handled implicitly by the method checking the dimensionality of the data, but maybe it's easier to know about it explicitly, because a third dimension might also come from somewhere else (e.g. multi-subject continuous data). But sure, we can also take care of it once such a problem arises.
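
A minimal sketch of the implicit variant, i.e. normalizing the input based on its dimensionality (names are illustrative):

```python
import numpy as np

def as_epoched(data):
    """Return data with shape (n_epochs, n_channels, n_times)."""
    data = np.asarray(data)
    if data.ndim == 2:    # continuous / single-epoch data
        return data[np.newaxis, ...]
    if data.ndim == 3:    # already epoched
        return data
    raise ValueError(f"Expected 2-D or 3-D data, got {data.ndim}-D")
```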

I think we have a different understanding of what a class function is.

Apparently :) I refer to any method attributed to a class that can access and modify 'self' as a class method (as opposed to static methods, which don't access self). But I guess you're more right here, I should read up on this.

This is very similar to how, for example, pandas.DataFrame.from_dict() works. You can create a dataframe using df = DataFrame(...), implicitly calling pandas.DataFrame.__init__(self, ...), or you can call the class method from_dict with df = pandas.DataFrame.from_dict(...) (the arguments would differ, of course).

Yes, and I personally find this API suboptimal and quite confusing. Note that for csv data pandas does it differently: one should call pandas.read_csv(); pandas.DataFrame.from_csv() was deprecated. This way the (less experienced) end user doesn't need to know how the class is actually called internally, which they probably don't care about :) Likewise, imagine working with numpy and instantiating array objects as numpy.ndarray.from_list().

Not sure there is a right or wrong here, it's just our design decision. In terms of the API for the end user it might really make no difference whether a module-level function like freq-tag.spectrum.psd_from_whatever() would be a wrapper that calls the correct class method based on the input datatype, or whether the code actually sits in this function.

I would use an instance method called something like calculate_spectrum. For now, I would have it simply return an np.ndarray object. But, of course, we will soon get into a situation where it would make more sense for it to return an object of class Spectrum (or maybe something else).

Yes, sounds good. Sorry again for the mixed-up terminology :)

dominikwelke commented 3 years ago

@kalenkovich sorry for the late reply. Manuscript deadlines...

kalenkovich commented 3 years ago

I don't think this is necessary at all. Continuous data = single-epoch data. Again, I don't mind adding it later, but only once we see that it helps the user/coder.

You got me wrong. Sure, continuous = single epoch = 2D data, but multi-epoch data is 3D, and some methods have to know about this. Of course this can all be handled implicitly by the method checking the dimensionality of the data, but maybe it's easier to know about it explicitly, because a third dimension might also come from somewhere else (e.g. multi-subject continuous data). But sure, we can also take care of it once such a problem arises.

I think multi-subject continuous data is unlikely to fit into one array because that would necessitate having the same number of samples for each subject. On the other hand, I am all for having explicit information about the axes/dimensions. I would, however, prefer to keep this information as instance attributes instead of in the type. So, something like spectrum.dimnames = ['epochs', 'sensor', 'frequency']. I think we should probably use xarray for that. This will allow us to do stuff like spectrum.data.mean(dim='epochs'). What do you think?
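
A minimal sketch of how that could look with xarray (the shapes and values below are made up for illustration):

```python
import numpy as np
import xarray as xr

# Pretend PSD: 20 epochs x 64 sensors x 513 frequency bins.
psd = np.random.rand(20, 64, 513)
freqs = np.linspace(0, 50, 513)

data = xr.DataArray(
    psd,
    dims=['epochs', 'sensor', 'frequency'],
    coords={'frequency': freqs},
)

# Operations can then refer to dimensions by name:
mean_over_epochs = data.mean(dim='epochs')
```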

Of course, none of the above means that we won't need a special class for spectra derived from continuous data - we might. But for now, the only difference would be the class name so I think this can wait.

I think we have a different understanding of what a class function is.

Apparently :) I refer to any method attributed to a class that can access and modify 'self' as a class method (as opposed to static methods, which don't access self). But I guess you're more right here, I should read up on this.

Oh, yeah! This is probably different in different programming languages. In Python, class methods are almost the same as static methods, except that they can refer to the class itself without using its name. This way, if A.do is a class method that creates an instance of A and B is a subclass of A, then B.do will create an instance of B. And what you described is called an instance method.
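
A quick illustration of that subclass behavior:

```python
class A:
    @classmethod
    def do(cls):
        # `cls` is whichever class the method was called on.
        return cls()

class B(A):
    pass

print(type(A.do()))  # <class '__main__.A'>
print(type(B.do()))  # <class '__main__.B'> - the subclass is picked up automatically
```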

This is very similar to how, for example, pandas.DataFrame.from_dict() works. You can create a dataframe using df = DataFrame(...), implicitly calling pandas.DataFrame.__init__(self, ...), or you can call the class method from_dict with df = pandas.DataFrame.from_dict(...) (the arguments would differ, of course).

Yes, and I personally find this API suboptimal and quite confusing. Note that for csv data pandas does it differently: one should call pandas.read_csv(); pandas.DataFrame.from_csv() was deprecated. This way the (less experienced) end user doesn't need to know how the class is actually called internally, which they probably don't care about :) Likewise, imagine working with numpy and instantiating array objects as numpy.ndarray.from_list().

Not sure there is a right or wrong here, it's just our design decision. In terms of the API for the end user it might really make no difference whether a module-level function like freq-tag.spectrum.psd_from_whatever() would be a wrapper that calls the correct class method based on the input datatype, or whether the code actually sits in this function.

Oh, alright! Honestly, to me, numpy.ndarray.from_list() makes perfect sense. It even reads well: "give me an array from a list". And once you have multiple classes, you need to call these functions something like ndarray_from_list anyway. pd.read_csv can return both a data frame and a series, and pd.DataFrame.from_csv has a few defaults that are different from read_csv. I think that was the confusing part, not the use of a class method, which is why from_dict is not being deprecated.

That being said, I see your point that it might be confusing to have to import a class explicitly. I've seen people working with Python who don't quite know what a class is. I mean, they do in the abstract, but having a class object like Spectrum in the code would confuse them. I'll move the function to the module level.
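
From the user's side the difference is just this (the import paths are assumptions about how the module ends up being laid out):

```python
# `epochs` is an mne.Epochs object in both cases.

# Before: the factory is a class method, so the user has to import the class.
from freqtag.spectrum import Spectrum
psd = Spectrum.psd_from_mne_epochs(epochs)

# After: a module-level function; the Spectrum class stays an implementation detail.
from freqtag.spectrum import psd_from_mne_epochs
psd = psd_from_mne_epochs(epochs)
```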

kalenkovich commented 3 years ago

I'll move the function to the module level.

Done!

dominikwelke commented 3 years ago

lgtm and finishes without errors..

On the other hand, I am all for having explicit information about the axes/dimensions. I would, however, prefer to keep this information as instance attributes instead of in the type. So, something like spectrum.dimnames = ['epochs', 'sensor', 'frequency'].

sounds good. should we do it here or in another PR?

I think we should probably use xarray for that. This will allow us to do stuff like spectrum.data.mean(dim='epochs'). What do you think?

I haven't used xarray before. Sounds convenient, but it comes with additional dependencies: xarray itself, pandas (and setuptools), plus further optional ones if we want to use some of its functionality. Also, it only works with Python 3.7+.

Not sure if any of these would be a major problem, but if it's just a minimal advantage over plain ndarrays we might pass. Is there more than the labels that we could benefit from down the line?

kalenkovich commented 3 years ago

lgtm and finishes without errors..

On the other hand, I am all for having explicit information about the axes/dimensions. I would, however, prefer to keep this information as instance attributes instead of in the type. So, something like spectrum.dimnames = ['epochs', 'sensor', 'frequency'].

sounds good. should we do it here or in another PR?

Let's do it separately; I'll create an issue so that we don't forget about it.

I think we should probably use xarray for that. This will allow us to do stuff like spectrum.data.mean(dim='epochs'). What do you think?

I haven't used xarray before. Sounds convenient, but it comes with additional dependencies: xarray itself, pandas (and setuptools), plus further optional ones if we want to use some of its functionality. Also, it only works with Python 3.7+.

Not sure if any of these would be a major problem, but if it's just a minimal advantage over plain ndarrays we might pass. Is there more than the labels that we could benefit from down the line?

I think you are right, there is not much benefit at this point. setuptools is almost always installed anyway, but requiring 3.7+ and pandas is certainly not necessary now. I don't really mind having these dependencies in the future if we see a clear benefit, but that can wait until we do. Let's forget about xarray for the moment and handle the dimensions ourselves.
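
Handling the dimensions ourselves could be as small as this (a sketch only; attribute and method names are assumptions):

```python
import numpy as np

class Spectrum:
    def __init__(self, data, frequencies, dimnames=('epochs', 'sensor', 'frequency')):
        self.data = np.asarray(data)
        self.frequencies = frequencies
        self.dimnames = list(dimnames)

    def mean(self, dim):
        """Average over a named dimension, returning a plain ndarray."""
        return self.data.mean(axis=self.dimnames.index(dim))
```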

kalenkovich commented 3 years ago

Should I merge then?

dominikwelke commented 3 years ago

Yes, go ahead :)
