PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n'

mak-rayate commented 1 year ago

Hello, I'm unable to understand the error below, which I get with hana_ml version 2.14.22120100.

Error : ERROR:hana_ml.algorithms.pal.unified_classification:(423, 'AFL error: "HC_APL"."(DO statement)": line 60 col 1 (at pos 1957): search table error: _SYS_AFL.AFLPAL:UNIFIED_CLASSIFICATION_ANY: [423] (range 3) AFL error exception: exception 73001255: PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n')

I'm using the PAL algorithm as below:

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

rdt_params = dict(random_state=2, n_estimators=10, max_depth=25, learning_rate=0.1)
uc_rdt = UnifiedClassification(func='HybridGradientBoostingTree', thread_ratio=1.0, **rdt_params)
uc_rdt.fit(data=res,
           key='ID',
           label='Target',
           features=features,
           partition_method='stratified',
           stratified_column='Target',
           partition_random_state=2,
           training_percent=0.8,
           ntiles=2)

I have already cross-checked that the features are present in the dataset and named correctly. The code worked when I ran it yesterday, but today the same code on the same dataset gives me the error above.

Can anyone please help me understand the root cause of this error?

raymondyao commented 1 year ago

Hi Mak, could you check the table structure? You can call res.get_table_structure() on the training dataset, and you can do the same on the prediction dataset. The training dataset should contain [ID] + [FEATURES] + [LABEL], while the prediction dataset should contain only [ID] + [FEATURES]. The types of the FEATURES columns should be the same in both.
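As a rough illustration of the check described above, the following sketch compares two structure dictionaries of the kind that get_table_structure() returns (column name mapped to a type string). The helper name `check_structures` and the sample dictionaries are hypothetical and only serve the example; it runs on plain dicts and does not need a HANA connection.

```python
def check_structures(train, predict, key, label):
    """Return a list of problems found; an empty list means the layouts agree.

    `train` and `predict` are dicts of column name -> type string.
    """
    problems = []
    # The training data needs both the key and the label column.
    for name in (key, label):
        if name not in train:
            problems.append(f"training data is missing column {name!r}")
    # The prediction data needs the key but should not carry the label.
    if key not in predict:
        problems.append(f"prediction data is missing key column {key!r}")
    if label in predict:
        problems.append(f"prediction data should not contain label {label!r}")
    # Every feature must exist in both datasets with the same type.
    features = [c for c in train if c not in (key, label)]
    for col in features:
        if col not in predict:
            problems.append(f"prediction data is missing feature {col!r}")
        elif predict[col] != train[col]:
            problems.append(
                f"type mismatch for {col!r}: {train[col]} vs {predict[col]}"
            )
    return problems


# Made-up structures for illustration only.
train_structure = {"Feature_1": "NVARCHAR(9)", "ID": "NVARCHAR(26)", "Target": "NVARCHAR(120)"}
predict_structure = {"Feature_1": "NVARCHAR(10)", "ID": "NVARCHAR(26)"}

for problem in check_structures(train_structure, predict_structure, "ID", "Target"):
    print(problem)
```

With real data, the two dictionaries would come from calling get_table_structure() on the training and prediction DataFrames.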

mak-rayate commented 1 year ago

Hello Raymondyao,

Thank you for your response. Yes, the training dataset has ID + features + label. I'm getting this error while training the model. I have checked the table structure, which is as below: {'Feature_1': 'NVARCHAR(9)', 'Feature_2': 'NVARCHAR(40)', 'Feature_3': 'NVARCHAR(20)', 'Feature_4': 'NVARCHAR(2)', 'Feature_5': 'NVARCHAR(10)', 'Feature_6': 'NVARCHAR(5)', 'ID': 'NVARCHAR(26)', 'Target': 'NVARCHAR(120)'}. I don't see any problem with it. I'm really stuck at this point.

raymondyao commented 1 year ago

Hi Mak, I can successfully run your training script with my fake data. Would you mind sharing a sample of your dataset so I can investigate further?

mak-rayate commented 1 year ago

Hello, sorry, for security reasons I cannot share any sample data. I ran my code in a Jupyter notebook and it works fine, but when I run the same code in SAP DI I get an error. For now I'm closing this issue because I guess it's related to the SAP DI tool.

mak-rayate commented 1 year ago

Hello, I have performed some testing on the data.

Data details: sample data count = 5382; stratified (0.5) data count = 2691

Sample data distribution: (table not preserved)

Stratified data distribution: (table not preserved)

I have observed that when I remove the hyperparameters and keep the defaults, the code does not give me the error above. But with the hyperparameters mentioned earlier, I get the error. This is really strange to me, because I assumed that unsuitable hyperparameters would only affect the accuracy of the model. In this case, however, the code stops working and throws the PAL error.

It would be really helpful if you could share your opinions or thoughts.

raymondyao commented 1 year ago

I talked with our developer; the issue has already been fixed in the latest HANA version. You can either upgrade your HANA instance or use 'histogram'.
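The reply above does not say where 'histogram' should go. One plausible reading, and it is only an assumption, is that it refers to the split_method option of the hybrid gradient boosting tree. Under that assumption, the parameter dictionary from the original script could be extended as in this sketch; the helper name `with_histogram_split` is hypothetical.

```python
def with_histogram_split(params):
    """Return a copy of `params` with split_method set to 'histogram'."""
    updated = dict(params)  # copy, so the caller's dict is left unchanged
    # Assumption: 'histogram' in the reply above means this setting.
    updated["split_method"] = "histogram"
    return updated


# Hyperparameters taken from the original script in this thread.
rdt_params = dict(random_state=2, n_estimators=10, max_depth=25, learning_rate=0.1)
hist_params = with_histogram_split(rdt_params)
print(hist_params)

# Hypothetical usage (requires hana_ml and a HANA connection, so it is
# left commented out here):
# uc_rdt = UnifiedClassification(func='HybridGradientBoostingTree',
#                                thread_ratio=1.0, **hist_params)
```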

mak-rayate commented 1 year ago

Hello raymondyao, thank you for your response! Noted regarding the latest HANA version. I found that the issue is in the data. When I run the code without the key, it works properly,

but when I set the key attribute, it gives me an error. There must be duplicate keys in the data; I'm checking this with the HANA data team.
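A suspicion like the duplicate keys mentioned above can be checked directly once the key values are available on the client. This is a minimal sketch using only the Python standard library; the helper name `find_duplicate_keys` and the sample IDs are made up for illustration. With hana_ml, the values might be fetched with something like res.select('ID').collect()['ID'], which is an assumption about the setup in this thread rather than something stated in it.

```python
from collections import Counter


def find_duplicate_keys(keys):
    """Return {key: count} for every key value that occurs more than once."""
    counts = Counter(keys)
    return {k: n for k, n in counts.items() if n > 1}


# Made-up key values for illustration only.
ids = ["A1", "A2", "A3", "A2", "A4", "A1", "A1"]
duplicates = find_duplicate_keys(ids)
print(duplicates)
```

An empty result would suggest the key column is unique, in which case the cause lies elsewhere.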

My questions are: 1) Do we really need the key attribute while training the model, and what happens if we train the model without it? 2) For prediction I guess we need the key column, am I right?

raymondyao commented 1 year ago

If your training dataset has a key, you need to specify it. If your dataset has no key, the fit function supports training without one. Prediction, however, always needs a key.

SAP-samples / hana-ml-samples

PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n' #31