SAP-samples / hana-ml-samples

This project provides code examples for SAP HANA Predictive and Machine Learning scenarios and is educational content. It covers simple Predictive Analysis Library SQL examples as well as complete SAP HANA design-time “ML scenario”-application content or HANA-ML Python Notebook examples.
Apache License 2.0

PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n' #31

Closed mak-rayate closed 1 year ago

mak-rayate commented 1 year ago

Hello, I can't make sense of the error below, which I get with hana_ml version 2.14.22120100.

Error : ERROR:hana_ml.algorithms.pal.unified_classification:(423, 'AFL error: "HC_APL"."(DO statement)": line 60 col 1 (at pos 1957): search table error: _SYS_AFL.AFLPAL:UNIFIED_CLASSIFICATION_ANY: [423] (range 3) AFL error exception: exception 73001255: PAL error[73001255]:Invalid model. Model format does not match specification.. :nodes/n\n')

I'm using the PAL algorithm as follows:

from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
rdt_params = dict(random_state=2, n_estimators=10, max_depth=25, learning_rate=0.1)
uc_rdt = UnifiedClassification(func='HybridGradientBoostingTree', thread_ratio=1.0, **rdt_params)
uc_rdt.fit(data=res,
           key='ID',
           label='Target',
           features=features,
           partition_method='stratified',
           stratified_column='Target',
           partition_random_state=2,
           training_percent=0.8,
           ntiles=2)

I have already cross-checked that the features are present in the dataset and named correctly. The code worked when I ran it yesterday, but today the same dataset and the same code give me the error above.

Can anyone please help me understand the root cause of this error?

raymondyao commented 1 year ago

Hi Mak, could you check the table structure? You can call res.get_table_structure() on the training dataset, and the same method on the prediction dataset. The training dataset should contain [ID] + [FEATURES] + [LABEL], while the prediction dataset contains only [ID] + [FEATURES]. The types of the FEATURES columns should be the same in both.
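The check described above can be sketched in plain Python. This is only an illustration: `check_structures` is a hypothetical helper (not part of hana_ml), and the two dictionaries are made-up stand-ins shaped like the column-name-to-type mapping that `get_table_structure()` returns.

```python
from typing import Dict, List


def check_structures(
    train: Dict[str, str],
    predict: Dict[str, str],
    key: str,
    label: str,
    features: List[str],
) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    # Training data needs ID + FEATURES + LABEL.
    for col in [key, *features, label]:
        if col not in train:
            problems.append(f"training data is missing column {col!r}")
    # Prediction data needs ID + FEATURES only.
    for col in [key, *features]:
        if col not in predict:
            problems.append(f"prediction data is missing column {col!r}")
    # Feature types should agree between the two datasets.
    for col in features:
        if col in train and col in predict and train[col] != predict[col]:
            problems.append(
                f"type mismatch for {col!r}: {train[col]} vs {predict[col]}"
            )
    return problems


# Made-up structures for illustration only.
train_structure = {
    "ID": "NVARCHAR(26)",
    "Feature_1": "NVARCHAR(9)",
    "Target": "NVARCHAR(120)",
}
predict_structure = {"ID": "NVARCHAR(26)", "Feature_1": "NVARCHAR(10)"}

problems = check_structures(
    train_structure, predict_structure, "ID", "Target", ["Feature_1"]
)
print(problems)
```

With real data, the two dictionaries would come from calling `get_table_structure()` on the training and prediction DataFrames.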

mak-rayate commented 1 year ago

Hello Raymondyao,

Thank you for your response. Yes, the training dataset has ID + features + label, and I'm getting this error while training the model. I checked the table structure, which is as follows: {'Feature_1': 'NVARCHAR(9)', 'Feature_2': 'NVARCHAR(40)', 'Feature_3': 'NVARCHAR(20)', 'Feature_4': 'NVARCHAR(2)', 'Feature_5': 'NVARCHAR(10)', 'Feature_6': 'NVARCHAR(5)', 'ID': 'NVARCHAR(26)', 'Target': 'NVARCHAR(120)'}. I don't see any problem with it, and I'm really stuck at this point.

raymondyao commented 1 year ago

Hi mak, I can successfully run your training script with my own fake data. Would you mind sharing a sample of your dataset so I can investigate further?

mak-rayate commented 1 year ago

Hello, sorry, for security reasons I cannot share any sample data. I ran my code in a Jupyter notebook and it works fine, but when I run the same code in SAP DI I get the error. I'm closing this issue for now because I guess it is related to the SAP DI tool.

mak-rayate commented 1 year ago

Hello, I have run some tests on the data.

Data details: sample data count = 5382; stratify 0.5 data count = 2691

Sample data distribution:

| Target | Count |
| -- | -- |
| class 1 | 208 |
| class 2 | 219 |
| class 3 | 283 |
| class 4 | 293 |
| class 5 | 323 |
| class 6 | 351 |
| class 7 | 383 |
| class 8 | 410 |
| class 9 | 421 |
| class 10 | 457 |
| class 11 | 460 |
| class 12 | 462 |
| class 13 | 527 |
| class 14 | 585 |

Stratified data distribution:

| Target | Count |
| -- | -- |
| class 1 | 104 |
| class 2 | 109 |
| class 3 | 141 |
| class 4 | 147 |
| class 5 | 162 |
| class 6 | 175 |
| class 7 | 191 |
| class 8 | 205 |
| class 9 | 211 |
| class 10 | 229 |
| class 11 | 230 |
| class 12 | 231 |
| class 13 | 263 |
| class 14 | 293 |
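As a quick sanity check on the two distributions posted above, the following sketch compares the per-class shares before and after the 0.5 stratified sample. The counts are copied from this comment; the helper `class_shares` is a hypothetical name introduced only for illustration.

```python
# Class counts copied from the two distributions posted above.
sample_counts = {
    "class 1": 208, "class 2": 219, "class 3": 283, "class 4": 293,
    "class 5": 323, "class 6": 351, "class 7": 383, "class 8": 410,
    "class 9": 421, "class 10": 457, "class 11": 460, "class 12": 462,
    "class 13": 527, "class 14": 585,
}
strat_counts = {
    "class 1": 104, "class 2": 109, "class 3": 141, "class 4": 147,
    "class 5": 162, "class 6": 175, "class 7": 191, "class 8": 205,
    "class 9": 211, "class 10": 229, "class 11": 230, "class 12": 231,
    "class 13": 263, "class 14": 293,
}


def class_shares(counts):
    """Convert absolute class counts into fractions of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


sample_shares = class_shares(sample_counts)
strat_shares = class_shares(strat_counts)

# Largest absolute difference in class share between the two datasets.
max_drift = max(abs(sample_shares[c] - strat_shares[c]) for c in sample_counts)

print(sum(sample_counts.values()), sum(strat_counts.values()))
print(f"largest per-class share difference: {max_drift:.4f}")
```

The totals match the reported row counts, and the class proportions are preserved to within rounding, so the stratified sample itself looks consistent.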

I have observed that when I remove the hyperparameters and keep the defaults, the code no longer raises the error above, but with the hyperparameters mentioned earlier it does. This is really strange to me: I assumed that unsuitable hyperparameters would only affect the accuracy of the model, but in this case the code stops working and throws the PAL error.

It would be really helpful if you could share your opinions or thoughts.

raymondyao commented 1 year ago

I talked with our developer: the issue has already been fixed in the latest HANA version. You can either upgrade your HANA instance or use 'histogram'.

mak-rayate commented 1 year ago

Hello raymondyao, thank you for your response! Noted regarding the latest HANA version. I found that the issue is in the data: when I run the code without the key, it works properly.


However, when I pass the key attribute, it gives me an error. There must be duplicate keys in the data; I'm checking this with the HANA data team.
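A minimal sketch of how duplicate key values could be spotted, assuming the ID column has already been pulled into a Python list (for example via `res.select('ID').collect()`, which is an assumption about the workflow rather than something shown in this thread). The helper `find_duplicate_keys` and the sample IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def find_duplicate_keys(keys: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Return each key that occurs more than once, with its occurrence count."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up ID values for illustration only.
ids = ["A001", "A002", "A003", "A002", "A004", "A001"]

duplicates = find_duplicate_keys(ids)
print(duplicates)
```

An empty result means every key value is unique; any entries returned are the IDs worth raising with the data team.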


My questions are: 1) Do we really need the key attribute when training the model, and what happens if we train the model without it? 2) For prediction, I assume the key column is required; is that right?

raymondyao commented 1 year ago

If your training dataset has a key, you need to specify it. If it has no key, the fit function supports training without one. Prediction, however, always needs a key.