UMEP-dev / SuPy

SUEWS that speaks Python
https://supy.readthedocs.io/
GNU General Public License v3.0

Number limitation to SUEWS_txt-file codes #48

Closed gusbacos closed 1 year ago

gusbacos commented 2 years ago

Describe the Issue

When running sp.init_supy(), codes longer than 10 digits do not seem to be recognized.

For example, for a code like

2019353657265

it only looks for 2019353657

See the code and log output:

df_state_init = sp.init_supy(path_runcontrol)

2022-04-20 15:37:24,825 - SuPy - INFO - All cache cleared. 2022-04-20 15:37:24,898 - SuPy - ERROR - Entries missing from SUEWS_NonVeg.txt Traceback (most recent call last): File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\supy_load.py", line 835, in build_code_df df_code = df_code0.loc[list_code, list_keys] File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 925, in getitem return self._getitem_tuple(key) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1100, in _getitem_tuple return self._getitem_lowerdim(tup) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 822, in _getitem_lowerdim return self._getitem_nested_tuple(tup) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 906, in _getitem_nested_tuple obj = getattr(obj, self.name)._getitem_axis(key, axis=axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1153, in _getitem_axis return self._getitem_iterable(key, axis=axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1093, in _getitem_iterable keyarr, indexer = self._get_listlike_indexer(key, axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer self._validate_read_indexer(keyarr, indexer, axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Int64Index([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, 
-2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648],\n dtype='int64', name='Code')] are in the [index]" 2022-04-20 15:37:24,900 - SuPy - ERROR - missing code: [-2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648] Traceback (most recent call last): File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\supy_load.py", line 835, in build_code_df df_code = df_code0.loc[list_code, list_keys] File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 925, in getitem return self._getitem_tuple(key) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1100, in _getitem_tuple return self._getitem_lowerdim(tup) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 822, in _getitem_lowerdim return self._getitem_nested_tuple(tup) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 906, in _getitem_nested_tuple obj = getattr(obj, self.name)._getitem_axis(key, axis=axis) File 
"C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1153, in _getitem_axis return self._getitem_iterable(key, axis=axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1093, in _getitem_iterable keyarr, indexer = self._get_listlike_indexer(key, axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer self._validate_read_indexer(keyarr, indexer, axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Int64Index([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648],\n dtype='int64', name='Code')] are in the [index]" 2022-04-20 15:37:24,901 - SuPy - ERROR - df_code0 has been dumped into C:\temp\suewstests\df_code0.pkl for debugging! 
Traceback (most recent call last): File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\supy_load.py", line 835, in build_code_df df_code = df_code0.loc[list_code, list_keys] File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 925, in getitem return self._getitem_tuple(key) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1100, in _getitem_tuple return self._getitem_lowerdim(tup) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 822, in _getitem_lowerdim return self._getitem_nested_tuple(tup) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 906, in _getitem_nested_tuple obj = getattr(obj, self.name)._getitem_axis(key, axis=axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1153, in _getitem_axis return self._getitem_iterable(key, axis=axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1093, in _getitem_iterable keyarr, indexer = self._get_listlike_indexer(key, axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer self._validate_read_indexer(keyarr, indexer, axis) File "C:\Users\xbacos\Anaconda3\envs\supy_env\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Int64Index([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, 
-2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648,\n -2147483648, -2147483648, -2147483648, -2147483648, -2147483648],\n dtype='int64', name='Code')] are in the [index]"

If I reduce the number of digits to below 10, it works. Perhaps the problem is only in sp.init_supy()?

Best, Oskar

sunt05 commented 2 years ago

Thanks Oskar for noting this.

I think this would be a common issue in both SuPy and SUEWS, as int is used to represent such codes.

Could you please remind us here of the coding convention you designed for these keys?

Then we can come up with a solution either on the encoding side or in the program/code itself.
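For reference, the value -2147483648 repeated in the log is the smallest signed 32-bit integer, which is consistent with the codes overflowing a 32-bit int somewhere along the way. The sketch below only checks the ranges involved; `fits_signed` is a hypothetical helper for illustration and is not part of SuPy.

```python
# Illustrative range check only; this is not SuPy code.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_signed(value: int, bits: int) -> bool:
    """Return True if `value` is representable as a signed integer of `bits` bits."""
    return -2 ** (bits - 1) <= value <= 2 ** (bits - 1) - 1


long_code = 2019353657265   # 13-digit code from the report
short_code = 2019353657     # first 10 digits of that code

print(INT32_MIN)                     # -2147483648, the value in the KeyError
print(fits_signed(long_code, 32))    # False
print(fits_signed(short_code, 32))   # True
print(fits_signed(long_code, 64))    # True
```

So the 13-digit code would fit in a 64-bit integer but not in a 32-bit one.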

gusbacos commented 2 years ago

Great.

I think we should use a more carefully thought-out convention than the current one.

At the moment, it converts the string identifier, such as 'Alb', 'ESTM' or 'LAI', to digits and then concatenates these with a timestamp into a new unique code.

Something similar to this:

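from datetime import datetime, timezone

# Two-digit year, day of year, minute, second, microsecond (%y %j %M %S %f)
timestamp = int(datetime.now(timezone.utc).strftime('%y%j%M%S%f'))
# Interpret the string identifier as a base-36 number
code_str_to_int = int('ESTM', 36)

print('code_str_to_int:', code_str_to_int)
print('timestamp:', timestamp)

# Concatenate the two digit strings into a single integer code
code = int(str(code_str_to_int) + str(timestamp))

print('code:', code)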

_______________________________________________________________________
code_str_to_int: 690538
timestamp: 221120847667162
code: 690538221120847667162
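As a side note on the output above: the concatenated code has 21 digits, which is beyond the range of a signed 64-bit integer as well, so widening the integer type alone would not accommodate this scheme. A minimal check, reusing the values printed above:

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807 (19 digits)

code_str_to_int = int("ESTM", 36)   # 690538, as in the output above
timestamp = 221120847667162         # timestamp value from the output above
code = int(str(code_str_to_int) + str(timestamp))

print(code)               # 690538221120847667162
print(len(str(code)))     # 21
print(code > INT64_MAX)   # True
```

Python's own int is unbounded, so the code itself is computed without error; the limit only shows up once the value is stored in a fixed-width integer.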

I was thinking that it would probably be easier to assign an individual number to each parameter in a dictionary instead, like {'TrafficRate_WE': 01, 'TrafficRate_WD': 02, ...}. Then, if we need a 10-digit code, we could go for something like:


| CodeID | CodeID | Year | Year | DOY | DOY | DOY | Unique_ID | Unique_ID | Unique_ID |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 5 | 2 | 2 | 2 | 5 | 1 | 0 | 1 | 6 |
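A minimal sketch of how the 2+2+3+3 layout above could be assembled, assuming zero-padded fields. The function name `compose_code` and the range checks are hypothetical and only for illustration. It also shows that a 10-digit code is not guaranteed to fit in a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def compose_code(code_id: int, year: int, doy: int, unique_id: int) -> str:
    """Build a 10-character code: CodeID(2) + Year(2) + DOY(3) + Unique_ID(3)."""
    if not (0 <= code_id <= 99 and 0 <= year <= 99
            and 1 <= doy <= 366 and 0 <= unique_id <= 999):
        raise ValueError("field out of range for the 2+2+3+3 layout")
    return f"{code_id:02d}{year:02d}{doy:03d}{unique_id:03d}"


example = compose_code(5, 22, 251, 16)   # the row shown in the table
print(example)                     # 0522251016
print(int(example))                # 522251016 (the leading zero is lost as an int)
print(int(example) <= INT32_MAX)   # True

largest = compose_code(99, 99, 366, 999)
print(int(largest) <= INT32_MAX)   # False
```

With this layout, any code whose CodeID is 20 or lower stays below 2147483647, while larger CodeID values can exceed it.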

I used the sub-second timestamp (%f gives microseconds) as the unique part, as this ensures that we get unique codes. But perhaps we can achieve that this way as well?

sunt05 commented 2 years ago

Sorry, I don't quite understand why a timestamp would be associated with the code in such a way. If we were to provide some identifier, a hash or the like would be more appropriate.
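One way the hash idea could look, as a rough sketch only: derive a fixed-width numeric code from an identifier string with a standard hash. The function `hashed_code` and the identifier format are assumptions made up for this illustration, not an agreed convention. Truncating to 9 digits keeps the result within a signed 32-bit integer, but also means two different identifiers can collide.

```python
import hashlib

INT32_MAX = 2**31 - 1


def hashed_code(identifier: str, digits: int = 9) -> int:
    """Map an identifier string to a non-negative integer of at most `digits` digits."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest and reduce to the requested width.
    return int.from_bytes(digest[:8], "big") % 10**digits


# Hypothetical identifier: parameter name plus a provider-specific label.
code = hashed_code("ESTM|provider-A|2022-04-20")
print(code)
print(code <= INT32_MAX)   # True, since at most 9 digits are kept
```

The same identifier always yields the same code, so uniqueness between providers would have to come from what goes into the identifier string.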

suegrimmond commented 2 years ago

Because different people will provide the same data, so the timestamp keeps their codes unique.
