SpatioTemporal / STAREPandas

STAREpandas adds SpatioTemporal Adaptive Resolution Encoding (STARE) support to pandas DataFrames. https://starepandas.readthedocs.io/en/latest/
MIT License

Hcp #126

Closed NiklasPhabian closed 1 year ago

NiklasPhabian commented 1 year ago

Was doing `min(df.ts_start)` instead of `df.ts_start.min()`

Oh! That makes a lot of sense. The built-in probably forces the interpreter to walk through the values one by one to find the minimum. How much speedup do you get? I will take care of the conflicts and merge this tomorrow.
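
For anyone reading along, a minimal sketch of the difference between the two calls. The series here is synthetic (the name `ts_start` is borrowed from the comment above, the size and values are made up), and the timings will vary by machine and pandas version, so no speedup figure is claimed.

```python
import timeit

import pandas as pd

# Synthetic stand-in for df.ts_start; length and values are illustrative only.
ts_start = pd.Series(pd.date_range("2021-12-31 22:18", periods=100_000, freq="min"))

# Built-in min() iterates over the Series in Python, one Timestamp at a time.
slow = min(ts_start)

# Series.min() runs the reduction on the underlying array.
fast = ts_start.min()

# Both approaches return the same value; only the cost differs.
assert slow == fast

t_builtin = timeit.timeit(lambda: min(ts_start), number=3)
t_method = timeit.timeit(lambda: ts_start.min(), number=3)
print(f"builtin min: {t_builtin:.4f} s, Series.min: {t_method:.4f} s")
```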

NiklasPhabian commented 1 year ago
michaelleerilee commented 1 year ago

@NiklasPhabian What's going on with this merge test?

NiklasPhabian commented 1 year ago

starepandas.io.pod.read_pods() doctest seems to fail.

michaelleerilee commented 1 year ago

@NiklasPhabian, there's an error in some SQL-based code.

FAILED examples/catalog.ipynb:: - AttributeError: 'OptionEngine' object has no attribute 'execute'
FAILED examples/to_database.ipynb:: - AttributeError: 'OptionEngine' object has no attribute 'execute'

I don't know anything about this code...

michaelleerilee commented 1 year ago

What's going on with this pull request?

NiklasPhabian commented 1 year ago

So the tests pass; I think all the LFS issues should be resolved. The issue was related to changes in sqlalchemy>=2.0, which pandas has not addressed yet. I pegged sqlalchemy, but I am sure that pandas will solve this very soon.
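
For reference, the `AttributeError` comes from SQLAlchemy 2.0 removing `Engine.execute()`. A minimal sketch of the connection-based pattern that works on both 1.4 and 2.0 is below. The in-memory SQLite URL and the query are placeholders, not what the notebooks actually run.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database, used here only as a placeholder backend.
engine = create_engine("sqlite://")

# Statements are executed on a Connection, not on the Engine itself,
# and raw SQL strings are wrapped in text().
with engine.connect() as conn:
    value = conn.execute(text("SELECT 1 + 1")).scalar()

print(value)
```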

@michaelleerilee: I'd really like to avoid stashing codeblocks in comments; especially when it goes into main/master (https://github.com/SpatioTemporal/STAREPandas/blob/hcp/starepandas/staredataframe.py#L1186)

@michaelleerilee do you think you can hack a pod example notebook together that we could add to the examples/ folder to have at least some tests going or should we close the PR right away?

NiklasPhabian commented 1 year ago

Here is the example notebook we can use for tests: https://github.com/SpatioTemporal/STARE-Applications/blob/main/09-STARE-PODS-IO-1.ipynb

mbauer288 commented 1 year ago

Question concerning podding w/ STAREPandas. I finally got a generalized version of the VIIRS podding process (09-STARE-PODS-Sketch-2.ipynb) working, with the intent to use it on the IMERG PFeatures data.

As a check, I got the test-podding data from FlexFS and ran the code on it. It worked exactly as expected until it got to the pod and save call:

    pods_written = sdf.write_pods(pod_root, level, chunk_name, temporal_chunking=temporal_chunking)

which throws an error as the main branch of STAREPandas doesn't know about temporal_chunking. I see something similar in the HPC branch, but I can't switch to that via git. I guess because the pull-request wasn't approved?

If I simply remove that argument, I still get an error from pandas.groupby:

    ValueError: Grouper for '<class 'starepandas.staredataframe.STAREDataFrame'>' not 1-dimensional

Is this because the STARE dataframe is somehow incorrect?

    sdf <class 'starepandas.staredataframe.STAREDataFrame'>

    Data columns (total 9 columns):
     #   Column             Dtype
    ---  ------             -----
     0   lat                float32
     1   lon                float32
     2   sids               Int64
     3   ts_start           datetime64[ns]
     4   ts_end             datetime64[ns]
     5   I04_observations   float32
     6   I04_quality_flags  UInt16
     7   I05_observations   float32
     8   I05_quality_flags  UInt16

            lat       lon                 sids  ts_start             ts_end                 I04_observations    I04_quality_flags    I05_observations    I05_quality_flags
    --  -------  --------  -------------------  -------------------  -------------------  ------------------  -------------------  ------------------  -------------------
     0  4.46929  -119.84   3382414711004431630  2021-12-31 22:18:00  2021-12-31 22:24:00               65533                  256               65533                  256
     1  4.46831  -119.848  3382696059138709358  2021-12-31 22:18:00  2021-12-31 22:24:00               65533                  256               65533                  256
     2  4.46733  -119.855  3382696072051950574  2021-12-31 22:18:00  2021-12-31 22:24:00               65533                  256               65533                  256
     3  4.46635  -119.862  3382696052710401102  2021-12-31 22:18:00  2021-12-31 22:24:00               65533                  256               65533                  256
    ...
    [41369600 rows x 9 columns]
    memory usage: 1.8 GB
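
As a point of comparison, plain pandas raises the same kind of `ValueError` whenever the groupby key is two-dimensional (a DataFrame rather than a Series). Whether that is what happens inside `write_pods` is only a guess on my part. The toy frame below is made up and is not the VIIRS data.

```python
import pandas as pd

# Toy frame; column names loosely mirror the dump above, values are made up.
df = pd.DataFrame({"sids": [1, 1, 2], "value": [10.0, 20.0, 30.0]})

# A 1-D key (a Series) groups fine.
totals = df.groupby(df["sids"])["value"].sum()

# A 2-D key (a one-column DataFrame) is rejected by pandas.
try:
    df.groupby(df[["sids"]])
    message = None
except ValueError as err:
    message = str(err)

print(totals)
print(message)
```

If the key handed to `groupby` turns out to be a DataFrame-like object, selecting a single column from it (so that it becomes a Series) would be the first thing to try.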

I'll go back to working on the Snow data stuff in the meantime, which is coming along fine.

Mike