barronh / pyrsig

Python interface to RSIG Web API
GNU General Public License v3.0

from_swath fails on Windows #7

Open barronh opened 2 hours ago

barronh commented 2 hours ago

Hi Barron,

Thanks for sharing your wonderful code, as usual. I started some test runs of your example code on my end to see whether we can adopt your code base for our work.

For the GIS TROPOMI processing example, I used plot_shapefile.py.

However, I encountered an error with the “xdr” option. The code works fine with the “ascii” option. I haven't studied your pyrsig code closely yet, but I can if needed. I suspect the error is due to a change in the TROPOMI metadata since you posted your example code.

Here is the error log.

Traceback (most recent call last):
  File "c:\Users\byeon\OneDrive\pySMSS\GEMS\plot_shapefile.py", line 30, in <module>
    tropdf = api.to_dataframe(datakey, bdate=bdate, backend='xdr')
  File "C:\Users\byeon\AppData\Roaming\Python\Python310\site-packages\pyrsig\__init__.py", line 729, in to_dataframe
    df = xdr.from_xdrfile(outpath, na_values=[-9999., -999])
  File "C:\Users\byeon\AppData\Roaming\Python\Python310\site-packages\pyrsig\xdr.py", line 58, in from_xdrfile
    outf = from_xdr(
  File "C:\Users\byeon\AppData\Roaming\Python\Python310\site-packages\pyrsig\xdr.py", line 109, in from_xdr
    df = from_swath(inf)
  File "C:\Users\byeon\AppData\Roaming\Python\Python310\site-packages\pyrsig\xdr.py", line 322, in from_swath
    timestamps = pd.to_datetime(nts, format=infmt, utc=True).strftime(outfmt)
  File "C:\Users\byeon\miniforge3\lib\site-packages\pandas\core\tools\datetimes.py", line 1099, in to_datetime
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
  File "C:\Users\byeon\miniforge3\lib\site-packages\pandas\core\tools\datetimes.py", line 467, in _array_strptime_with_fallback
    result, tz_out = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
  File "strptime.pyx", line 501, in pandas._libs.tslibs.strptime.array_strptime
  File "strptime.pyx", line 451, in pandas._libs.tslibs.strptime.array_strptime
  File "strptime.pyx", line 583, in pandas._libs.tslibs.strptime._parse_with_format
ValueError: time data "4" doesn't match format "%Y%j%H%M", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

Please let me know if you need more information, such as my library versions. At this stage, I don’t know what information you may need or whether you can replicate the error on your end.

... cutting other discussion ...

-Byeong

barronh commented 2 hours ago

Thanks for sharing this issue. You and I went back and forth a bit via email. I was unable to reproduce the problem on Linux or Mac systems, even with the file you shared. However, on my Windows machine it failed right away, even without the file you shared.

It turns out to be an easy fix. I’ll incorporate it into a new release along with some other updates.

In the meantime, you can easily edit your installation directly.

Change the '>l' to '>i8' on line 322 of C:\Users\byeon\AppData\Roaming\Python\Python310\site-packages\pyrsig\xdr.py

The problem actually occurs in several other readers too, so you could also just find and replace every '>l' with '>i8'.
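If you would rather not hunt for each occurrence by hand, the find-and-replace can be scripted. The sketch below is a hypothetical helper, not part of pyrsig. It assumes the dtype codes appear in the source as the literal strings '>l' or ">l", leaves other codes such as '>i4' untouched, and keeps a .bak copy of the file before writing.

```python
import re
from pathlib import Path

# Hypothetical helper (not part of pyrsig).
# Matches only the complete literal '>l' or ">l" (same quote on both sides).
_LONG_BE = re.compile(r"""(['"])>l\1""")


def patch_source(text):
    """Return (new_text, count) with every '>l' literal replaced by '>i8'."""
    return _LONG_BE.subn(lambda m: f"{m.group(1)}>i8{m.group(1)}", text)


def patch_file(path):
    """Rewrite a source file in place, saving a .bak copy first.

    Returns the number of replacements made (0 means the file was left alone).
    """
    path = Path(path)
    # Explicit UTF-8 so Windows does not fall back to its legacy code page.
    text = path.read_text(encoding="utf-8")
    new_text, count = patch_source(text)
    if count:
        path.with_suffix(path.suffix + ".bak").write_text(text, encoding="utf-8")
        path.write_text(new_text, encoding="utf-8")
    return count
```

For example, `import pyrsig.xdr` followed by `patch_file(pyrsig.xdr.__file__)` would patch the installed reader; restart Python afterward so the edited module is reimported. Installing the fixed release later will overwrite the patched file anyway.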

When I make that change, it works great on Mac, Linux, and Windows.

The root of the problem is how the datatype character codes are interpreted. The xdr file is read in binary mode, and the bytes are decoded as big-endian long integers (64-bit, i.e., 8-byte). On Linux and Mac, '>i' is interpreted as '>i4' (i.e., 4-byte) and '>l' as '>i8' (i.e., 8-byte). On Windows, both '>i' and '>l' are interpreted as '>i4', and only '>q' maps to '>i8'. The reason is that NumPy's 'l' code follows the C long type, which is 32-bit on 64-bit Windows but 64-bit on 64-bit Linux and macOS. As a result, Windows reads 2 values for every 1 stored, and both are garbage. All systems interpret '>i8' the same way (8-byte), so the problem is solved by using the explicit-width code.
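To make the effect concrete, here is a small sketch, assuming NumPy is installed. It packs one timestamp the way the XDR file stores it (a single big-endian 8-byte integer) and decodes it twice: once with the explicit '>i8' code and once with '>i4', which is what '>l' resolves to on Windows. The timestamp value is made up for illustration. For any 2023 timestamp in the "%Y%j%H%M" layout, the high 4 bytes decode to 4, which is very likely the "4" that the traceback above failed to parse.

```python
import struct
from datetime import datetime

import numpy as np

# Illustrative timestamp in the "%Y%j%H%M" layout from the traceback
# (year 2023, day-of-year 001, 12:00); a made-up value, not read from a file.
ts = 20230011200

# Store it as the XDR file does: one big-endian 8-byte integer.
buf = struct.pack(">q", ts)

# '>i8' is explicitly 8 bytes on every platform, so one value comes back.
good = np.frombuffer(buf, dtype=">i8")

# '>i4' is what '>l' resolves to on Windows; spelled out explicitly here
# so the demo misbehaves identically on any OS.
bad = np.frombuffer(buf, dtype=">i4")

print(np.dtype(">l").itemsize)  # platform dependent: typically 8 on Linux/macOS, 4 on Windows
print(good)  # [20230011200]
print(bad)   # two values: the high half 4 and the low half -1244825280

# The correct value parses; the high half, "4", cannot match the format.
print(datetime.strptime(str(good[0]), "%Y%j%H%M"))  # 2023-01-01 12:00:00
try:
    datetime.strptime(str(bad[0]), "%Y%j%H%M")
except ValueError as err:
    print("ValueError:", err)
```

Spelling the width out ('>i8', or equivalently '>q') instead of relying on the C-type code 'l' is what makes the decoding portable.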