nepomnyi opened this issue 3 weeks ago
Hi @nepomnyi - this was designed on purpose so that the xarray does not grow in memory as we add more and more scalars. This is similar to the behavior of PIVMAT in this respect. I'd like to hear your idea of how to make it better. My suggestion is to create a copy of the array with a different `w` if you need your data stored. There is no problem redesigning PIVPy for better usability, but we need to be careful not to overflow the memory.
You're absolutely right, it's a bug in `load_txt` related to the fact that masks appeared later in time. I'm not sure, though, what's better; maybe keep the older version and create a newer one as an option - loading 5 or 6 columns. What do you think? @nepomnyi
I see; maybe we can pass the columns to load as an argument to the function? But that would change the API and break backward compatibility. On the other hand, it would give the user more flexibility: if they want to use less memory, they can load only the `x`, `y`, `u`, and `v` columns. If they need masks, like I do, they can load `masks` as well. I think the user should be able to specify the columns by their names, not by their numbers, in this case. Or we could make the function load the original columns by default and offer adding more columns as an optional argument. That would not break backward compatibility. I think if we go with your suggestion to create a new function and keep the old one, then we have two functions doing the same thing, which is not neat from an organizational perspective. But that would definitely not break backward compatibility.
The txt files, as far as I remember, do not have column names, so the column positions are just a convention among the programmers. We could change the output format of OpenPIV and, in the same move, update the loading in PIVPy. Again, this is a big change that breaks things backwards. I'd rather look for a compromise that improves things going forward without destroying code that works for some people.
I see. Yes, that's what I ended up doing: creating a copy of my dataset for `tke`, `reynolds_stress`, and `vorticity`. So instead of one dataset with 3 data arrays (`tke`, `reynolds_stress`, and `vorticity`), I ended up having 3 datasets, which occupied more memory (because each of them contains, at least, `u` and `v` in addition). Also, that made the code clunkier. So the bottom line is that I am not sure there is a gain in such an approach. Also, as far as I remember, xarray can handle bigger-than-memory cases. But I can live with it if you want to keep it this way.
No-no, I mean the user specifies the columns by name and, since we know a priori which names belong to which columns, we just add a little dictionary to the function `io.load_openpiv_txt()`. That's all I meant.
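A minimal sketch of that idea, assuming the fixed OpenPIV .txt column order `x y u v flags mask`; the function below and its `columns` parameter are hypothetical illustrations, not the current PIVPy API:

```python
import io
import numpy as np

# Assumed OpenPIV .txt column order; this is the "little dictionary"
# mapping names to positions.
OPENPIV_COLUMNS = {"x": 0, "y": 1, "u": 2, "v": 3, "flags": 4, "mask": 5}

def load_openpiv_txt(source, columns=("x", "y", "u", "v", "flags")):
    """Load only the requested columns; the default mirrors today's behavior."""
    usecols = tuple(OPENPIV_COLUMNS[name] for name in columns)
    data = np.genfromtxt(source, usecols=usecols)
    # Return a name -> column mapping so callers never touch indices.
    return {name: data[:, i] for i, name in enumerate(columns)}

# Toy two-row file in the x y u v flags mask layout.
sample = io.StringIO("0.0 0.0 1.0 2.0 0 1\n1.0 0.0 3.0 4.0 0 0\n")
fields = load_openpiv_txt(sample, columns=("x", "y", "u", "v", "mask"))
```

Because the default `columns` reproduces the current five-column load, existing callers would be unaffected.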
Please explain the use case and the possible solution. I am not fixed on the situation with the copy; I can see the possible needs. I'm not sure how much code this will affect - probably all the `vec2scal` and other plot functions that always assume only one scalar field. Please take a look at PIVMAT and see whether recalculating the scalars, keeping them in a separate numpy array, and copying them into `w` is a big issue or not. xarray with named attributes is, in my understanding, just a better way to work with the data by names and not by indices.
Ok, I will look into this. Also - in the pull request that I submitted (for the $\Gamma_1$ and $\Gamma_2$ functions) - I did create two new corresponding data arrays within the dataset. I didn't do the `w` thing. So, I guess you want to postpone the pull request till we figure out what to do with it?
I see. Yes, it is a good idea.
Then do we go with it? I.e., I will add an optional argument to `io.load_openpiv_txt()` offering the user a way to specify which columns to load. Since the argument is optional, it does not break backward compatibility, and by default the function will work as it works now. Does this sound like a plan?
Hello, @alexlib.

1) Explanation of the use case and possible solution. I was doing a moving-average analysis to figure out how many PIV pairs I had to acquire. I was using several metrics for the moving average; among them, I had `tke`, Reynolds stress, and vorticity. I acquired 6000 sample PIV frames. In a for loop, I would find my metrics for an average of `i` PIV frames, where `i` ranged from 2 to 6000. At every iteration of the for loop, I would create at least 3 separate Datasets: one where `w` was `tke`, one where `w` was Reynolds stress, and one where `w` was vorticity. I think 3 separate Datasets occupy more memory than 1 Dataset, because every separate Dataset contains `x`, `y`, `u`, `v` in addition to `w`, whereas if I had one Dataset with `tke`, Reynolds stress, and vorticity, I would store `x`, `y`, `u`, `v` only once. So, I think we have more memory overhead in this case than in the case where we keep creating DataArrays for every metric. And in the case where I have a DataArray for every metric, I can delete a DataArray myself if I need more memory. So, the possible solution is to keep adding DataArrays and let the user decide how to manage them (they can delete them if they want to).
2) How much code will be affected.

2a) Module `graphics.py`: `contour_plot()`. To maintain backward compatibility, add an optional argument asking the user to identify which DataArray they want to plot the contours of.

2b) Module `pivpy.py`: rename the `w` DataArrays to their respective names (i.e., create a separate DataArray for every metric). The `__mul__(self, scalar)` method will have to be changed a bit: just multiply all the DataArrays except for `x` and `y` by the scalar - this approach preserves backward compatibility. `vec2scal(self, flow_property: str = "curl")` doesn't have to be changed.

The overall conclusion is that not much code will be affected and backward compatibility will be preserved.
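A minimal sketch of the proposed `__mul__` behavior, written as a standalone function over an `xarray.Dataset`; the dataset layout and the function name are illustrative assumptions, not the current PIVPy code:

```python
import numpy as np
import xarray as xr

def scale_dataset(ds, scalar):
    """Multiply every data variable by `scalar`, leaving the x/y grids alone.
    Sketch of what the proposed __mul__ would do with many named scalars."""
    out = ds.copy()
    for name, var in ds.data_vars.items():
        if name not in ("x", "y"):  # grid arrays stay untouched
            out[name] = var * scalar
    return out

# Toy dataset with one extra scalar (tke) next to u and v.
ds = xr.Dataset(
    {
        "u": (("y", "x"), np.ones((2, 2))),
        "v": (("y", "x"), np.full((2, 2), 2.0)),
        "tke": (("y", "x"), np.full((2, 2), 3.0)),
    },
    coords={"x": [0.0, 1.0], "y": [0.0, 1.0]},
)
scaled = scale_dataset(ds, 10.0)
```

Because the loop runs over whatever `data_vars` exist, a dataset holding only `u`, `v`, and `w` behaves exactly as before, which is the backward-compatibility argument above.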
Let's open a branch for this and work on these extensions.
Sure, @alexlib. Do you want me to create it in my fork, or do you want to create it yourself in the source PIVPy? If I create it in my fork, I am not sure what happens when I open a pull request against the source PIVPy - will it just appear there? Personally, I am fine with either way.
Create it on yours. Before merging to master, we will pull this branch and review it.
Hello, Dr. Liberzon @alexlib. I noticed two things.

1) In the methods `vorticity()`, `strain()`, `divergence()`, `acceleration()`, `kinetic_energy()`, `tke()`, `reynolds_stress()`, and `rms()`, you create new respective DataArrays within the Dataset, but you give them all one name: `w`. In the attributes, you call them differently, that's true, but the names of the DataArrays themselves are all the same. It was a bit of an inconvenience to me when I wanted to have them all in one Dataset. I am wondering if you did it on purpose. If not, do you mind if I correct it and submit a pull request?

2) In the function `io.load_openpiv_txt()`, on the line where we load the .txt file, `d = np.genfromtxt(filename, usecols=(0,1,2,3,4))`, you load only the first 5 columns of the OpenPIV .txt file, which are `x`, `y`, `u`, `v`, `flags`. In my work, I also need the last column, `masks`, which you don't load. I need it to mask out bubble velocities. So again, I am wondering if you don't load the `masks` column on purpose. If not, do you mind if I correct it and submit a pull request?

Thanks. Ivan
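For the second point, a sketch of loading the sixth column and using it to blank out velocities. The toy rows and the convention that `mask == 0` marks a masked point (e.g. inside a bubble) are assumptions; the real loader also reshapes the flat columns into a grid:

```python
import io
import numpy as np

# Toy OpenPIV-style rows in the x y u v flags mask layout.
txt = io.StringIO(
    "0.0 0.0 1.0 2.0 0 1\n"
    "1.0 0.0 3.0 4.0 0 0\n"
)

# usecols now includes column 5, the masks column the current loader drops.
d = np.genfromtxt(txt, usecols=(0, 1, 2, 3, 4, 5))
x, y, u, v, flags, mask = d.T

# Blank out velocities wherever the point is masked (mask == 0 here).
u_masked = np.where(mask == 1, u, np.nan)
v_masked = np.where(mask == 1, v, np.nan)
```

Since the only change to the load line is one extra index in `usecols`, existing five-column files would fail only if they genuinely lack the sixth column, which is where the optional-argument design discussed above comes in.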