SpatioTemporal / STAREPandas

STAREpandas adds SpatioTemporal Adaptive Resolution Encoding (STARE) support to pandas DataFrames. https://starepandas.readthedocs.io/en/latest/
MIT License

Pod Reading #157

Open mbauer288 opened 9 months ago

mbauer288 commented 9 months ago

So I'm trying to read a podded dataframe.

fetch_n_pod: Starting STARE Podding
    Reading '/Users/mbauer/tmp/data/POMD/discover/202001/DYAMONDv2_PE3600x1800-DE.tavg_30mn.prectot.20200115_0000z.nc4' and '/Users/mbauer/tmp/data/POMD/discover/DYAMONDv2_stare.nc' at level 4 from /Users/mbauer/tmp/data/pods

    The call:
        pod_root   = '/Users/mbauer/tmp/data/pods'
        sids_hex   = ['0x0000000000000004', '0x0008000000000004',  ...  '0x3ff0000000000004', '0x3ff8000000000004']
        tcover_tid = 2274394778256809073

        podded_sdf = starepandas.read_pods(pod_root=pod_root, sids=sids_hex, tids=[tcover_tid])

            STAREPandas.read_pods() -> starepandas/io/pod.py:read_pods():    
                path_format                = '{pod_root}{delim1}{sid}'
                pattern                    = '*'
                temporal_pattern           = '{pod_path}(.*)-.*'
                temporal_pattern_tid_index = 0
                tids_cmp                   = array([2274394778256809073])

                for sid in sids:
                    pod_path   = '/Users/mbauer/tmp/data/pods/0x0000000000000004'
                    pickles    = ['/Users/mbauer/tmp/data/pods/0x0000000000000004/0x1f90200025001c71-DYAMONDv2_PE3600x1800-DE.tavg_30mn.prectot.20200115_0000z.pkl.bz2', 
                                  ...
                                  '/Users/mbauer/tmp/data/pods/0x0000000000000004/0x1f906377a5001c71-DYAMONDv2_PE3600x1800-DE.tavg_30mn.prectot.20200213_2330z.pkl.bz2']
                    search     = '.*.*'
                    pods       = ['/Users/mbauer/tmp/data/pods/0x0000000000000004/0x1f90200025001c71-DYAMONDv2_PE3600x1800-DE.tavg_30mn.prectot.20200115_0000z.pkl.bz2', 
                                  ...
                                  '/Users/mbauer/tmp/data/pods/0x0000000000000004/0x1f906377a5001c71-DYAMONDv2_PE3600x1800-DE.tavg_30mn.prectot.20200213_2330z.pkl.bz2']
                    regexp     = '/Users/mbauer/tmp/data/pods/0x0000000000000004(.*)-.*'
                    m.groups() = ('/0x1f90200025001c71-DYAMONDv2_PE3600x1800',)

                    So 'pickles' correctly lists all the pod files by sid and tid; '/Users/mbauer/tmp/data/pods/0x0000000000000004/0x1f90200025001c71*'

                    But 'pods = list(filter(re.compile(search).match, pickles))' raises re.error for search = '.**.*', presumably because of the repeated '**':

                          File "/Users/mbauer/SpatioTemporal/STAREPandas/starepandas/io/pod.py", line 195, in read_pods
                            pods = list(filter(re.compile(search).match, pickles))
                                               ^^^^^^^^^^^^^^^^^^
                          File "/Users/mbauer/miniconda3/envs/stare/lib/python3.11/re/__init__.py", line 227, in compile
                            return _compile(pattern, flags)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
                          File "/Users/mbauer/miniconda3/envs/stare/lib/python3.11/re/__init__.py", line 294, in _compile
                            p = _compiler.compile(pattern, flags)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                          File "/Users/mbauer/miniconda3/envs/stare/lib/python3.11/re/_compiler.py", line 743, in compile
                            p = _parser.parse(p, flags)
                                ^^^^^^^^^^^^^^^^^^^^^^^
                          File "/Users/mbauer/miniconda3/envs/stare/lib/python3.11/re/_parser.py", line 982, in parse
                            p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                          File "/Users/mbauer/miniconda3/envs/stare/lib/python3.11/re/_parser.py", line 457, in _parse_sub
                            itemsappend(_parse(source, state, verbose, nested + 1,
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                          File "/Users/mbauer/miniconda3/envs/stare/lib/python3.11/re/_parser.py", line 687, in _parse
                            raise source.error("multiple repeat",
                        re.error: multiple repeat at position 2
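The error can be reproduced outside starepandas. With pattern = '*', the template '.*{pattern}.*' expands to '.**.*', and the second '*' (position 2) has nothing left to repeat. The sketch below shows the failure and one possible workaround: treat the pattern as a shell-style glob and let fnmatch.translate() convert it to a regex. The helper name build_search is hypothetical and not part of starepandas, and the shortened sample paths are made up for illustration.

```python
import fnmatch
import re

# Reproduce the failure: the glob '*' is pasted into a regex template.
pattern = "*"
search = ".*{pattern}.*".format(pattern=pattern)  # '.**.*'
try:
    re.compile(search)
except re.error as err:
    print("re.error:", err)  # multiple repeat at position 2


def build_search(pattern):
    """Hypothetical helper (not starepandas API): glob -> compiled regex.

    fnmatch.translate() escapes regex metacharacters and maps '*' to '.*',
    so a bare '*' can no longer trigger a 'multiple repeat' error.
    """
    return re.compile(fnmatch.translate("*" + pattern + "*"))


# Illustrative, shortened paths (assumed, not taken from a real pod tree).
sample = [
    "/pods/0x04/0x1f90200025001c71-DYAMONDv2_PE3600x1800-DE.pkl.bz2",
    "/pods/0x04/0x1f90200025001c71-other.pkl.bz2",
]
print(list(filter(build_search("*").match, sample)))
print(list(filter(build_search("DYAMONDv2_PE3600x1800").match, sample)))
```

Compared with special-casing pattern == "*", this also covers patterns that contain other regex metacharacters such as '.' or '+'.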

                    Changing the line to 
                        search = '.*{pattern}.*'.format(pattern=pattern) if pattern != "*" else '.*.*'
                    goes further but then I get another error.

                        Traceback (most recent call last):
                          File "/Users/mbauer/SpatioTemporal/STAREPandas/starepandas/io/pod.py", line 220, in read_pods
                            tid_ = int(m.groups()[temporal_pattern_tid_index],16)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                        ValueError: invalid literal for int() with base 16: '/0x1f90200025001c71-DYAMONDv2_PE3600x1800'
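The ValueError follows from the temporal regex shown above: pod_path has no trailing separator, so the group starts at the '/', and the greedy '(.*)' runs to the last '-' in the filename instead of the first. The sketch below reproduces the captured group and shows one possible way to extract the tid, assuming the filename always begins with the hex tid followed by '-'. The helper tid_from_pod is hypothetical and not part of starepandas.

```python
import os
import re

pod_path = "/Users/mbauer/tmp/data/pods/0x0000000000000004"
pod = (pod_path + "/0x1f90200025001c71-DYAMONDv2_PE3600x1800-DE."
       "tavg_30mn.prectot.20200115_0000z.pkl.bz2")

# Current behaviour: the greedy group keeps the '/' and runs to the last '-'.
regexp = "{pod_path}(.*)-.*".format(pod_path=pod_path)
m = re.match(regexp, pod)
print(m.groups())  # ('/0x1f90200025001c71-DYAMONDv2_PE3600x1800',)


def tid_from_pod(pod):
    """Hypothetical helper: parse the hex tid that prefixes a pod filename.

    Assumes the layout '<tid_hex>-<rest>' for the file's basename.
    """
    name = os.path.basename(pod)      # drop the directory part
    tid_hex = name.split("-", 1)[0]   # text before the first '-'
    return int(tid_hex, 16)           # int() accepts the '0x' prefix in base 16


print(hex(tid_from_pod(pod)))
```

Working on the basename avoids depending on whether pod_path ends with a path separator, and splitting on the first '-' keeps hyphens in the granule name from leaking into the tid.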

NiklasPhabian commented 9 months ago

What starepandas version are you using? The latest tagged version is v0.6.6, which does not contain the temporal part, i.e. you would read with:

starepandas.read_pods(pod_root=pod_root, sids=sids_hex, pattern='DYAMONDv2_PE3600x1800')

NiklasPhabian commented 9 months ago

I am going to take a stab at fixing read_pods() on master. In its current state, it should never have been merged into master.

NiklasPhabian commented 9 months ago

If you installed from PyPI, you are likely using v0.6.6, where the read_pods signature looks as follows: https://github.com/SpatioTemporal/STAREPandas/blob/v0.6.6/starepandas/io/pod.py
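One way to check which version is installed without importing the package is the standard library's importlib.metadata. This is a minimal sketch; it assumes the distribution is registered under the name "starepandas", and the helper installed_version is only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the distribution is not installed.
print(installed_version("starepandas"))
```

For a source checkout installed in development mode, the reported version may differ from the latest tag, so it is worth comparing against the checked-out commit as well.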

NiklasPhabian commented 9 months ago

I want to connect this to https://github.com/SpatioTemporal/STAREPandas/issues/42

NiklasPhabian commented 9 months ago

And to https://github.com/SpatioTemporal/STAREPandas/issues/147. They should all be addressed and closed together.