alexlib / pivpy

PIVPy helps to understand PIV results, inspired by PIVMAT
http://pivpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Names of vorticity, tke, Reynolds stress in `piv` class and more columns in `io.load_openpiv_txt()`. #63

Open nepomnyi opened 3 weeks ago

nepomnyi commented 3 weeks ago

Hello, Dr. Liberzon @alexlib. I noticed two things.

1) In the methods vorticity(), strain(), divergence(), acceleration(), kinetic_energy(), tke(), reynolds_stress() and rms(), you create new respective DataArrays within the Dataset, but you give them all one name: `w`. In the attributes you call them differently, that's true, but the names of the DataArrays themselves are all the same. This was a bit of an inconvenience to me when I wanted to have them all in one Dataset. I am wondering if you did it on purpose. If not, do you mind if I correct it and submit a pull request?

2) In the function io.load_openpiv_txt(), on the line where we load the .txt file, `d = np.genfromtxt(filename, usecols=(0, 1, 2, 3, 4))`, you load only the first 5 columns of the OpenPIV .txt file, which are x, y, u, v, flags. In my work, I also need the last column, `masks`, which you don't load; I need it to mask out bubble velocities. So again, I am wondering if you don't load the `masks` column on purpose. If not, do you mind if I correct it and submit a pull request?

Thanks. Ivan
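
A minimal, self-contained sketch of point 1 (a toy stand-in, not PIVPy's actual code; the `store_scalar` helper is hypothetical): every derived scalar lands in the same variable `w`, so the second computation overwrites the first, and only the attribute records which quantity `w` currently holds.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"u": (("y", "x"), np.random.rand(4, 4)),
     "v": (("y", "x"), np.random.rand(4, 4))}
)

def store_scalar(ds, values, name):
    """Toy stand-in for vorticity()/tke()/...: everything goes into 'w'."""
    ds["w"] = (("y", "x"), values)
    ds["w"].attrs["standard_name"] = name
    return ds

ds = store_scalar(ds, np.random.rand(4, 4), "vorticity")
ds = store_scalar(ds, np.random.rand(4, 4), "tke")   # clobbers the vorticity field
print(ds["w"].attrs["standard_name"])                # -> "tke"
```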

alexlib commented 3 weeks ago

Hi @nepomnyi - this was designed on purpose, so that the xarray does not grow in memory as we add more and more scalars. This is similar to the behavior of PIVMAT in this respect. I'd like to hear your idea of how to make it better. My suggestion is to create a copy of the array with a different `w` if you need your data stored. There is no problem with redesigning PIVPy for better usability, but we need to be careful not to overflow the memory.
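
A short illustration of this suggestion in plain xarray (a sketch, not PIVPy code): snapshot the current `w` under its own name before the next scalar computation reuses it.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"w": (("y", "x"), np.random.rand(4, 4))})
ds["w"].attrs["standard_name"] = "vorticity"

# Keep a named copy of the current scalar; "w" is then free to be reused
ds["vorticity"] = ds["w"].copy()
# ... recompute "w" as tke, reynolds_stress, etc.; "vorticity" survives
```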

alexlib commented 3 weeks ago

You're absolutely right, it's a bug in load_txt, related to the fact that masks were added later. I'm not sure, though, what's better; maybe keep the older version and create a newer one as an option - load 5 or 6 columns. What do you think? @nepomnyi
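
One minimal reading of this option, hedged as a sketch (the `load_mask` parameter is hypothetical, not PIVPy's actual API): keep today's five-column default and make the sixth column opt-in, so existing callers are unaffected.

```python
import numpy as np

def load_openpiv_txt(filename, load_mask=False):
    """Load an OpenPIV .txt file; the default matches the current behavior."""
    usecols = (0, 1, 2, 3, 4, 5) if load_mask else (0, 1, 2, 3, 4)
    return np.genfromtxt(filename, usecols=usecols)
```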

nepomnyi commented 3 weeks ago

> You're absolutely right, it's a bug in load_txt, related to the fact that masks were added later. I'm not sure, though, what's better; maybe keep the older version and create a newer one as an option - load 5 or 6 columns. What do you think? @nepomnyi

I see. Maybe we can pass the columns to load as an argument to the function? But that would change the API and break backward compatibility. On the other hand, it would give the user more flexibility: if he wants to use less memory, he can load only the `x`, `y`, `u`, and `v` columns; if he needs masks, like I do, he can load `masks` as well. I think the user should be able to specify the columns by their names, not by their numbers, in this case. Or we could make the function load the original columns by default and offer adding more columns as an optional argument. That would not break backward compatibility. I think if we go with your suggestion to create a new function and keep the old function, then we have two functions doing the same thing, which is not neat from an organizational perspective. But that would definitely not break backward compatibility.

alexlib commented 3 weeks ago

The txt files, as far as I remember, do not have column names, so the positions are just a convention among the programmers. We could change the output format for OpenPIV and, by the same move, update the loading in PIVPy. Again, this is a big change that breaks backward compatibility. I'd rather look for a compromise that improves things going forward without destroying the code that works for some people.

nepomnyi commented 3 weeks ago

> Hi @nepomnyi - this was designed on purpose, so that the xarray does not grow in memory as we add more and more scalars. This is similar to the behavior of PIVMAT in this respect. I'd like to hear your idea of how to make it better. My suggestion is to create a copy of the array with a different `w` if you need your data stored. There is no problem with redesigning PIVPy for better usability, but we need to be careful not to overflow the memory.

I see. Yes, that's what I ended up doing: creating a copy of my dataset for tke, reynolds_stress and vorticity. So instead of one dataset with 3 data arrays (tke, reynolds_stress and vorticity), I ended up having 3 datasets, which occupied more memory (because each of them contains at least u and v in addition). It also made the code clunkier. So the bottom line is that I am not sure there is a gain in this approach. Also, as far as I remember, xarray can handle bigger-than-memory cases. But I can live with it if you want to keep it this way.
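
A quick back-of-the-envelope check of this point (toy data; the deep copies mimic the per-scalar Dataset copies described above): one Dataset carrying three named scalars stores `u` and `v` once, while three single-scalar copies store them three times.

```python
import numpy as np
import xarray as xr

ny, nx = 128, 128
base = xr.Dataset(
    {name: (("y", "x"), np.random.rand(ny, nx)) for name in ("u", "v")}
)

# One Dataset holding all three derived scalars alongside u and v
combined = base.copy(deep=True)
for name in ("tke", "reynolds_stress", "vorticity"):
    combined[name] = (("y", "x"), np.random.rand(ny, nx))

# Three separate Datasets, each duplicating u and v next to a single "w"
separate = []
for _ in range(3):
    ds_i = base.copy(deep=True)
    ds_i["w"] = (("y", "x"), np.random.rand(ny, nx))
    separate.append(ds_i)

print(combined.nbytes)                    # u, v + 3 scalars stored once
print(sum(ds.nbytes for ds in separate))  # 3 x (u, v, w): noticeably larger
```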

nepomnyi commented 3 weeks ago

> The txt files, as far as I remember, do not have column names, so the positions are just a convention among the programmers. We could change the output format for OpenPIV and, by the same move, update the loading in PIVPy. Again, this is a big change that breaks backward compatibility. I'd rather look for a compromise that improves things going forward without destroying the code that works for some people.

No, no, I mean the user specifies the columns by name and, since we know a priori which names belong to which columns, we just add a little dictionary to the function io.load_openpiv_txt(). That's all I meant.
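
A sketch of this idea (the dictionary contents and the `columns` parameter are assumptions, following the column layout described earlier in this thread): the user names the columns, and a fixed name-to-position dictionary translates them for np.genfromtxt.

```python
import numpy as np

# Fixed OpenPIV .txt column positions, as described in this thread
OPENPIV_COLUMNS = {"x": 0, "y": 1, "u": 2, "v": 3, "flags": 4, "mask": 5}

def load_openpiv_txt(filename, columns=("x", "y", "u", "v", "flags")):
    """The default column set preserves the current behavior."""
    usecols = tuple(OPENPIV_COLUMNS[name] for name in columns)
    return np.genfromtxt(filename, usecols=usecols)
```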

alexlib commented 3 weeks ago

> Hi @nepomnyi - this was designed on purpose, so that the xarray does not grow in memory as we add more and more scalars. This is similar to the behavior of PIVMAT in this respect. I'd like to hear your idea of how to make it better. My suggestion is to create a copy of the array with a different `w` if you need your data stored. There is no problem with redesigning PIVPy for better usability, but we need to be careful not to overflow the memory.

> I see. Yes, that's what I ended up doing: creating a copy of my dataset for tke, reynolds_stress and vorticity. So instead of one dataset with 3 data arrays (tke, reynolds_stress and vorticity), I ended up having 3 datasets, which occupied more memory (because each of them contains at least u and v in addition). It also made the code clunkier. So the bottom line is that I am not sure there is a gain in this approach. Also, as far as I remember, xarray can handle bigger-than-memory cases. But I can live with it if you want to keep it this way.

Please explain the use case and the possible solution. I am not fixed on the situation with the copy; I can see the possible needs. I'm not sure how much code this will affect - probably all the vec2scal and other plot functions that always assume only one scalar field. Please take a look at PIVMAT and see whether recalculating the scalars, keeping them in a separate numpy array, and copying them into `w` is a big issue or not. Xarray with named attributes is, in my understanding, just a better way to work with the data by names and not by indices.
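
One low-impact way the plotting side could handle this, hedged as a sketch (`contour_plot` is a hypothetical stand-in for the vec2scal/graphics helpers, not PIVPy's actual signature): take the scalar's variable name as a parameter that defaults to "w", so existing calls keep working.

```python
import matplotlib.pyplot as plt
import xarray as xr

def contour_plot(ds: xr.Dataset, scalar: str = "w", **kwargs):
    """Plot the named scalar field; the default "w" keeps today's behavior."""
    ds[scalar].plot.contourf(**kwargs)  # xarray's built-in matplotlib plotting
    plt.show()
```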

nepomnyi commented 3 weeks ago

> Hi @nepomnyi - this was designed on purpose, so that the xarray does not grow in memory as we add more and more scalars. This is similar to the behavior of PIVMAT in this respect. I'd like to hear your idea of how to make it better. My suggestion is to create a copy of the array with a different `w` if you need your data stored. There is no problem with redesigning PIVPy for better usability, but we need to be careful not to overflow the memory.

> I see. Yes, that's what I ended up doing: creating a copy of my dataset for tke, reynolds_stress and vorticity. So instead of one dataset with 3 data arrays (tke, reynolds_stress and vorticity), I ended up having 3 datasets, which occupied more memory (because each of them contains at least u and v in addition). It also made the code clunkier. So the bottom line is that I am not sure there is a gain in this approach. Also, as far as I remember, xarray can handle bigger-than-memory cases. But I can live with it if you want to keep it this way.

> Please explain the use case and the possible solution. I am not fixed on the situation with the copy; I can see the possible needs. I'm not sure how much code this will affect - probably all the vec2scal and other plot functions that always assume only one scalar field. Please take a look at PIVMAT and see whether recalculating the scalars, keeping them in a separate numpy array, and copying them into `w` is a big issue or not. Xarray with named attributes is, in my understanding, just a better way to work with the data by names and not by indices.

OK, I will look into this. Also, in the pull request that I submitted (for the $\Gamma_1$ and $\Gamma_2$ functions), I did create two new corresponding data arrays within the dataset; I didn't do the `w` thing. So, I guess you want to postpone the pull request till we figure out what to do with it?

alexlib commented 3 weeks ago

> The txt files, as far as I remember, do not have column names, so the positions are just a convention among the programmers. We could change the output format for OpenPIV and, by the same move, update the loading in PIVPy. Again, this is a big change that breaks backward compatibility. I'd rather look for a compromise that improves things going forward without destroying the code that works for some people.

> No, no, I mean the user specifies the columns by name and, since we know a priori which names belong to which columns, we just add a little dictionary to the function io.load_openpiv_txt(). That's all I meant.

I see. Yes, it is a good idea.

nepomnyi commented 3 weeks ago

> The txt files, as far as I remember, do not have column names, so the positions are just a convention among the programmers. We could change the output format for OpenPIV and, by the same move, update the loading in PIVPy. Again, this is a big change that breaks backward compatibility. I'd rather look for a compromise that improves things going forward without destroying the code that works for some people.

> No, no, I mean the user specifies the columns by name and, since we know a priori which names belong to which columns, we just add a little dictionary to the function io.load_openpiv_txt(). That's all I meant.

> I see. Yes, it is a good idea.

Then do we go with it? I.e., I will add an optional argument to io.load_openpiv_txt(), offering the user a way to specify which columns to load. Since the argument is optional, it does not break backward compatibility, and by default the function will work as it does now. Does this sound like a plan?

nepomnyi commented 3 weeks ago

> Hi @nepomnyi - this was designed on purpose, so that the xarray does not grow in memory as we add more and more scalars. This is similar to the behavior of PIVMAT in this respect. I'd like to hear your idea of how to make it better. My suggestion is to create a copy of the array with a different `w` if you need your data stored. There is no problem with redesigning PIVPy for better usability, but we need to be careful not to overflow the memory.

> I see. Yes, that's what I ended up doing: creating a copy of my dataset for tke, reynolds_stress and vorticity. So instead of one dataset with 3 data arrays (tke, reynolds_stress and vorticity), I ended up having 3 datasets, which occupied more memory (because each of them contains at least u and v in addition). It also made the code clunkier. So the bottom line is that I am not sure there is a gain in this approach. Also, as far as I remember, xarray can handle bigger-than-memory cases. But I can live with it if you want to keep it this way.

> Please explain the use case and the possible solution. I am not fixed on the situation with the copy; I can see the possible needs. I'm not sure how much code this will affect - probably all the vec2scal and other plot functions that always assume only one scalar field. Please take a look at PIVMAT and see whether recalculating the scalars, keeping them in a separate numpy array, and copying them into `w` is a big issue or not. Xarray with named attributes is, in my understanding, just a better way to work with the data by names and not by indices.

Hello, @alexlib.

1) Explanation of the use case and possible solution. I was doing a moving-average analysis to figure out how many PIV pairs I had to acquire. I was using several metrics for the moving average; among them I had tke, Reynolds stress and vorticity. I acquired 6000 sample PIV frames. In a for loop, I would find my metrics for an average of i PIV frames, where i ranged from 2 to 6000. At every iteration of the for loop I would create at least 3 separate Datasets: one where `w` was tke, one where `w` was Reynolds stress and one where `w` was vorticity. I think 3 separate Datasets occupy more memory than 1 Dataset, because every separate Dataset contains x, y, u, v in addition to `w`, whereas if I had one Dataset with tke, Reynolds stress and vorticity, I would store x, y, u, v only once. So, I think we have more memory overhead in this case than in the case where we keep creating DataArrays for every metric. And in the case where I have DataArrays for every metric, I can delete a DataArray myself if I need more memory. So, the possible solution is to keep adding DataArrays and let the user decide how to manage them (he can delete them if he wants to).
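
A condensed sketch of this use case (toy data and sizes; tke here is the two-component estimate and the Reynolds stress is the kinematic $-\overline{u'v'}$): one Dataset per iteration carries all three named scalars, so x, y, u, v are never duplicated.

```python
import numpy as np
import xarray as xr

nt, ny, nx = 100, 32, 32                  # stand-in for the 6000 PIV pairs
coords = {"y": np.arange(ny, dtype=float), "x": np.arange(nx, dtype=float)}
u = xr.DataArray(np.random.rand(nt, ny, nx), dims=("t", "y", "x"), coords=coords)
v = xr.DataArray(np.random.rand(nt, ny, nx), dims=("t", "y", "x"), coords=coords)

history = []
for i in range(2, nt + 1):
    um, vm = u[:i].mean("t"), v[:i].mean("t")
    up, vp = u[:i] - um, v[:i] - vm       # fluctuations about the running mean
    metrics = xr.Dataset({
        "tke": 0.5 * (up**2 + vp**2).mean("t"),
        "reynolds_stress": -(up * vp).mean("t"),
        "vorticity": vm.differentiate("x") - um.differentiate("y"),
    })
    history.append(metrics.mean(("y", "x")))  # spatial means for convergence
```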

2) How much code will be affected. 2a) Module graphics.py.

The overall conclusion is that not much code will be affected and backward compatibility will be preserved.

alexlib commented 2 weeks ago

Let's open a branch for this and work on these extensions.

nepomnyi commented 2 weeks ago

> Let's open a branch for this and work on these extensions.

Sure, @alexlib. Do you want me to create it in my fork, or do you want to create it yourself in the source PIVPy? If I create it in my fork, I am not sure what happens when I submit a pull request to the source PIVPy - will the branch just appear there? Personally, I am fine with either way.

alexlib commented 2 weeks ago

Create it on yours. Before merging to master we will pull this branch and review.