Closed kalenkovich closed 3 years ago
Hello @kalenkovich! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
hi @kalenkovich , finally came to look into the code :)
some points/questions:
Spectrum
class can hold only one kind (psd, snr, ...) of spectrum (surprise!). i suppose we will then need another higher order class to combine several instances of spectrum
based on the same data (e.g. psd and snr derived from the psd), and maybe even the raw time-series the psd is based on.. (i.e. an object of a time-series
class or similar). what is your opinion on that, did you already scratch a class-structure for this?kind
variable to indicate whether its epoched data or continuous/single-epoch data? an alternative would be to have 2 types of spectre classes, but i think this would be unnecessary overkillpsd_from_mne_raw
or the like..psd_from_mne...
should be a class method.
it seems to me that this could be a bit confusing, given that you can create the object also providing the data as arrays via the __init__
method. calling the class method ...from_mne..
would then simply override existing data. while this doesnt need to be a general problem (if we let it throw a warning), i think conceptually class methods should rather be tools that modify or work with data already attached to the class object (as provided via the __init__
method). a class method can also add data, of course, but i think this shouldnt be crucial data that have to be there for the class to make sense - e.g. a higher order class potentially holding multiple spectre as mentioned above might well have an add_spectrum
method that can attach unlimited additional spectre, but the first piece of spectrum/data of whatever kind should imo be provided via __init__
(or some standalone function that returns a class-object of this type).
coming back to psd_from_mne_epochs
: an alternative in line with this would be to have the now-class-method as a standalone function (e.g. at freq-tag.spectrum.psd_from_whatever
level) that returns an object of the class freq-tag.spectrum.Spectrum
snr_spectrum(...)
from the tutorial in our module? these could be class methods of the Spectrum
class, but on the other hand they really depend on the kind
variable (i.e. computing snr only makes sense if kind=='psd'
). maybe these should rather be classmethods of the higher order class? or individual function again? or both (individual function + a class method wrapper calling the individual function)?so far for now, i think these are some initial things to discuss :) happy to hear your thoughts and opinions on these and open to discuss!
Thank you for taking a look, @dominikwelke!
- the
Spectrum
class can hold only one kind (psd, snr, ...) of spectrum (surprise!). i suppose we will then need another higher order class to combine several instances ofspectrum
based on the same data (e.g. psd and snr derived from the psd), and maybe even the raw time-series the psd is based on.. (i.e. an object of atime-series
class or similar). what is your opinion on that, did you already scratch a class-structure for this?
I think we can deal with it once the need arises. Right now, I don't see the necessity for such a class. I do, however, see problems that can arise from different spectra getting out of sync. We can safeguard against this probably but I would rather get into it when and if we need it for some use. At this point, I would leave it at that.
As a future step, I would think it would be great to have a tutorial with a full-fledged analysis where we would see whether combining spectra under a single object would benefit clarity or muddy it. Now that we are doing are own thing we have more freedom than with the mne-python tutorial.
- maybe we want the
kind
variable to indicate whether its epoched data or continuous/single-epoch data? an alternative would be to have 2 types of spectre classes, but i think this would be unnecessary overkill
I don't think this is necessary at all. Continuous data = single-epoch data. Again, I don't mind if we add it later but later - once we see that it helps the user/coder.
- relatedly, there would be a similar method/function called
psd_from_mne_raw
or the like..
That makes sense. I think this function should evolve into something more general like psd_from_mne
or even psd_from
or even just from
where the input could be inferred. I am not sure how often one uses frequency-tagging based on raw data, But I would love to take a look at at least one particular example: the vases-phases experiment.
- i am not sure if i am really convinced yet that
psd_from_mne...
should be a class method. it seems to me that this could be a bit confusing, given that you can create the object also providing the data as arrays via the__init__
method. calling the class method...from_mne..
would then simply override existing data. while this doesnt need to be a general problem (if we let it throw a warning), i think conceptually class methods should rather be tools that modify or work with data already attached to the class object (as provided via the__init__
method). a class method can also add data, of course, but i think this shouldnt be crucial data that have to be there for the class to make sense - e.g. a higher order class potentially holding multiple spectre as mentioned above might well have anadd_spectrum
method that can attach unlimited additional spectre, but the first piece of spectrum/data of whatever kind should imo be provided via__init__
(or some standalone function that returns a class-object of this type). coming back topsd_from_mne_epochs
: an alternative in line with this would be to have the now-class-method as a standalone function (e.g. atfreq-tag.spectrum.psd_from_whatever
level) that returns an object of the classfreq-tag.spectrum.Spectrum
I think we have a different understanding of what a class function is. In Python, such a function is not bound to an instance and it can't change the inner data of an object even if you call it from an instance and not from the class. That is why the first argument is traditionally referred to as cls
and refers to the class and not to an instance as self
does in instance methods definition. It does exactly what you propose for freq-tag.spectrum.psd_from_whatever
to do: it returns a new Spectrum
object.
This is very similar to how, for example, pandas.DataFrame.from_dict()
works. You can create a dataframe using df = DataFrame(...)
implicitly calling pandas.DataFrame.__init__(self, ...)
or you can call the class method from_dict
with df = pandas.DataFrame.from_dict(...)
(the arguments would differ, of course.
We could, of course, move this function up to the module level but I am pretty sure the current way is the clearest one.
- how would we want to provide the transformation functions like
snr_spectrum(...)
from the tutorial in our module? these could be class methods of theSpectrum
class, but on the other hand they really depend on thekind
variable (i.e. computing snr only makes sense ifkind=='psd'
). maybe these should rather be classmethods of the higher order class? or individual function again? or both (individual function + a class method wrapper calling the individual function)?
I would use an instance method called something like calculate_spectrum
. For now, I would have it simply return an np,ndarray
object, But, of course, we will soon get into a situation where it would make more sense for it to return an object of class Spectrum
(or, maybe something else).
I certainly don't think this should be a class method. I think we are probably using different terminologies.
As for the module-level method, I think it only makes logical sense if we reuse such a function in different classes. For now, I think its best placement is as an instance method.
Let's get back to this in a separate issue.
I think we can deal with it once the need arises. Right now, I don't see the necessity for such a class. I do, however, see problems that can arise from different spectra getting out of sync. We can safeguard against this probably but I would rather get into it when and if we need it for some use. At this point, I would leave it at that.
ok
As a future step, I would think it would be great to have a tutorial with a full-fledged analysis where we would see whether combining spectra under a single object would benefit clarity or muddy it
yes, sounds good
I don't think this is necessary at all. Continuous data = single-epoch data. Again, I don't mind if we add it later but later - once we see that it helps the user/coder.
you got me wrong. sure, continuous = single epoch = 2d data, but multi epoch data is 3d and some methods have to know about this. of course this can all be handled implicitly by the method checking dimensionality of the data, but maybe its easier to know about it, because a 3rd dimension might potentially also come from somewhere else (e.g. multiple subject continuous data). but sure, we can also take care of it once such a problem arises..
I think we have a different understanding of what a class function is.
apparently :) i refer to any method attributed to a class that can access and modify 'self' as a class method (as opposed to static methods, that dont access self). but i guess you're more right here, i should read some about this.
This is very similar to how, for example, pandas.DataFrame.from_dict() works. You can create a dataframe using df = DataFrame(...) implicitly calling pandas.DataFrame.init(self, ...) or you can call the class method from_dict with df = pandas.DataFrame.from_dict(...) (the arguments would differ, of course.
yes, and this i personally find this API suboptimal and quite confusing.
note that for csv data pandas does it differently: one should call pandas.read_csv()
, pandas.DataFrame.from_csv()
was deprecated..
this way the (less experienced) enduser doesnt need to know, how the class is actually called internally, which they probably dont care about :)
likewise, imagine working with numpy and instantiating array objects as numpy.ndarray.from_list()
.
not sure there is right or wrong, it's just our design decision..
in terms of API for the end user it might really make no difference, whether a module level function like freq-tag.spectrum.psd_from_whatever()
would be a wrapper that calls the correct class method based on input datatype, or whether the code actually sits in this function.
I would use an instance method called something like calculate_spectrum. For now, I would have it simply return an np,ndarray object, But, of course, we will soon get into a situation where it would make more sense for it to return an object of class Spectrum (or, maybe something else).
yes, sounds good. sorry again for the mixed up terminology :)
@kalenkovich sorry for the late reply. mauscript deadlines..
I don't think this is necessary at all. Continuous data = single-epoch data. Again, I don't mind if we add it later but later - once we see that it helps the user/coder.
you got me wrong. sure, continuous = single epoch = 2d data, but multi epoch data is 3d and some methods have to know about this. of course this can all be handled implicitly by the method checking dimensionality of the data, but maybe its easier to know about it, because a 3rd dimension might potentially also come from somewhere else (e.g. multiple subject continuous data). but sure, we can also take care of it once such a problem arises..
I think multi-subject continuous data is unlikely to fit into one array because that would necessitate having the number of samples for each subject. On the other hand, I am all for having explicit information about the axes/dimensions. I would, however, prefer to keep this information as instance attributes instead of type. So, something like spectrum.dimnames = ['epochs', 'sensor', 'frequency']
. I think we should probably use xarray
for that. This will allow us to stuff like spectrum.data.mean(dim='epochs')
. What do you think?
Of course, none of the above means that we won't need a special class for spectra derived from continuous data - we might. But for now, the only difference would be the class name so I think this can wait.
I think we have a different understanding of what a class function is.
apparently :) i refer to any method attributed to a class that can access and modify 'self' as a class method (as opposed to static methods, that dont access self). but i guess you're more right here, i should read some about this.
Oh, yeah! This is probably different in different programming languages. In Python, class methods are almost the same as static methods except that they can refer to the class itself without using its name, This way, if A.do is a class method that creates an instance of A and B is a subclass of A, then B.do will create an instance of B. And what you described is called an instance method.
This is very similar to how, for example, pandas.DataFrame.from_dict() works. You can create a dataframe using df = DataFrame(...) implicitly calling pandas.DataFrame.init(self, ...) or you can call the class method from_dict with df = pandas.DataFrame.from_dict(...) (the arguments would differ, of course.
yes, and this i personally find this API suboptimal and quite confusing. note that for csv data pandas does it differently: one should call
pandas.read_csv()
,pandas.DataFrame.from_csv()
was deprecated.. this way the (less experienced) enduser doesnt need to know, how the class is actually called internally, which they probably dont care about :) likewise, imagine working with numpy and instantiating array objects asnumpy.ndarray.from_list()
.not sure there is right or wrong, it's just our design decision.. in terms of API for the end user it might really make no difference, whether a module level function like
freq-tag.spectrum.psd_from_whatever()
would be a wrapper that calls the correct class method based on input datatype, or whether the code actually sits in this function.
Oh, alright! Honestly, to me, numpy.ndarray.from_list()
makes perfect sense. It even reads well: "give me an array from list". And once you have multiple classes, you need to call these functions something like ndarray_from_list
anyway. pd.read_csv
can return both a data frame and a series and pd.DataFrame.from_csv
has a few defaults that are different from read_csv
. I think that was confusing, not just using a class method which is why from_dict
is not being deprecated.
That being said, I see your point that it might be confusing to import a class explicitly. I've seen people working with Python who don't quite know what a class is. I mean, they do in abstract, but having a class object like Spectrum
in the code would confuse them. I'll move the function to the module level.
I'll move the function to the module level.
Done!
lgtm and finishes without errors..
On the other hand, I am all for having explicit information about the axes/dimensions. I would, however, prefer to keep this information as instance attributes instead of type. So, something like
spectrum.dimnames = ['epochs', 'sensor', 'frequency']
.
sounds good. should we do it here or in another PR?
I think we should probably use
xarray
for that. This will allow us to stuff likespectrum.data.mean(dim='epochs')
. What do you think?
i didnt use xarray before.. sounds convenient, but it comes with additional dependencies: xarray itself, pandas(, setuptools) and further optional ones if we want to use some functionalities. plus it only works with python 3.7+.
not sure if any of these would be a major problem, but if it's just a minimal advantage over plain ndarrays we might pass.. is there more than the labels that we could benefit from, down the line?
lgtm and finishes without errors..
On the other hand, I am all for having explicit information about the axes/dimensions. I would, however, prefer to keep this information as instance attributes instead of type. So, something like
spectrum.dimnames = ['epochs', 'sensor', 'frequency']
.sounds good. should we do it here or in another PR?
Let's do it separately, I'll create an issue so that we don't forget about it.
I think we should probably use
xarray
for that. This will allow us to stuff likespectrum.data.mean(dim='epochs')
. What do you think?i didnt use xarray before.. sounds convenient, but it comes with additional dependencies: xarray itself, pandas(, setuptools) and further optional ones if we want to use some functionalities. plus it only works with python 3.7+.
not sure if any of these would be a major problem, but if it's just a minimal advantage over plain ndarrays we might pass.. is there more than the labels that we could benefit from, down the line?
I think you are right, there is not much benefit at this point. setuptools
is almost always installed anyways but 3.7+ and pandas are certainly not necessary now. I don't really mind having these dependencies in the future if we see a clear benefit but it can wait once we do. Let's forget about xarray for the moment and handle the dimensions ourselves.
Should I merge then?
Yes, go ahead :)
Am 28. Juli 2021, 17:56, um 17:56, Zhenya @.***> schrieb:
Should I merge then?
-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/Frequency-Tagging/freq-tag/pull/4#issuecomment-888425528
I copied the frequency tagging tutorial from
mne
and moved the power-calculating code tofreqtag.spectrum.Spectrum.psd_from_mne_epochs
. I then rewrote the code so that it does not usemne
but usesnumpy
'sfft
instead.