Closed sirbastiano closed 4 months ago
Dear @sirbastiano, thanks a lot for the suggestion. I didn't know polars, but it looks very interesting. I need to carefully read the documentation and the link you shared.
Regarding the use of pandas in this project, it is very limited: only the dump_records function uses it and, actually, it could even be a totally optional dependency.
I also use it in the example Jupyter notebook, if I remember correctly, but mostly to have a nice representation of tabular data.
Considering that pandas is more widely used than polars, I would tend to have an optional dependency for both of them, to allow users to choose whichever they prefer.
Would you consider submitting a small PR going in this direction?
The main change would be to add a third output_format option to dump_records.
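As a rough illustration of the optional-dependency idea above, here is a minimal sketch. The real signature of dump_records is not shown in this thread, so the function name dump_records_sketch, the list-of-dicts input, and the option values "dict", "pandas" and "polars" are all assumptions made for the example. The key point is that pandas and polars are imported lazily, only when the corresponding format is requested.

```python
from importlib import import_module


def dump_records_sketch(records, output_format="dict"):
    """Hypothetical stand-in for dump_records (not the s1isp API).

    `records` is assumed to be a list of dicts sharing the same keys.
    """
    if output_format == "dict":
        # Plain column-oriented structure: no third-party dependency needed.
        keys = list(records[0]) if records else []
        return {key: [rec[key] for rec in records] for key in keys}

    if output_format in ("pandas", "polars"):
        # Import lazily so that neither package is a hard requirement.
        try:
            module = import_module(output_format)
        except ImportError as exc:
            raise ImportError(
                f"output_format={output_format!r} requires the optional "
                f"'{output_format}' package"
            ) from exc
        # Both pandas.DataFrame and polars.DataFrame accept a list of dicts.
        return module.DataFrame(records)

    raise ValueError(f"unsupported output_format: {output_format!r}")
```

With this layout a user who only has polars installed could call dump_records_sketch(records, output_format="polars"), and requesting a format whose package is missing fails with an explicit ImportError rather than at import time of the whole project.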
Well, it is a matter of choice. Potentially you can store the echo data row by row in the dataframe and run computations on it (imagine range compression row by row).
Polars is much better at that: it uses parallelization and makes efficient use of the hardware.
We can potentially keep both, as optional dependencies.
Ah sorry, I thought that you were talking about header data. For echo data, the long-term idea is to use a format that goes in the direction of the one being developed in the CGS re-engineering, which is based on zarr. But I need to think a little bit more about it. I'm open to discussion, of course.
My idea is to exploit GPU and thread parallelism.
By the way, are you at ESRIN?
Let's have a coffee together!
Can this be closed now?
Yes
Dear Antonio, as efficiency is our top priority, I suggest switching to polars: https://pola.rs/
Let me know your thoughts on this.
Article: https://www.datacamp.com/tutorial/high-performance-data-manipulation-in-python-pandas2-vs-polars?dc_referrer=https%3A%2F%2Fduckduckgo.com%2F