method HROCH for Competition2022

janoPig commented 2 years ago

Competition Checklist:

[x] title of this PR is meaningful, i.e. "method X for comp"
[x] A folder has been added to submission/ with a meaningful name corresponding to your method name.
The added folder includes these elements:
- [x] metadata.yml (required): A file describing your submission, following the descriptions in example/metadata.yml.
- [x] regressor.py (required): a Python file that defines your method, named appropriately. See submission/feat-example/regressor.py for complete documentation. It contains:
  - [x] est: a sklearn-compatible Regressor object.
  - [x] model(est, X=None): a function that returns a sympy-compatible string specifying the final model. It can optionally take the training data as an input argument. See guidance below.
  - [x] eval_kwargs (optional): a dictionary that can specify method-specific arguments to evaluate_model.py.
- [x] environment.yml (optional): a conda environment file that specifies dependencies for your submission.
- [x] install.sh (optional): a bash script that installs your method.
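
For orientation, the three names the checklist asks regressor.py to expose (est, model, eval_kwargs) could be laid out roughly as below. This is a dependency-free sketch, not HROCH's actual file: MeanRegressor and its offset parameter are hypothetical, and a real submission would more likely subclass sklearn's BaseEstimator and RegressorMixin instead of hand-writing get_params and set_params.

```python
# Hypothetical regressor.py layout; MeanRegressor is a toy stand-in method.

class MeanRegressor:
    """Duck-typed estimator that always predicts the training mean."""

    def __init__(self, offset=0.0):
        self.offset = offset

    def get_params(self, deep=True):
        # sklearn-style parameter access, written by hand for this sketch
        return {"offset": self.offset}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y) + self.offset
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# est: the regressor object named in the checklist
est = MeanRegressor()


def model(est, X=None):
    """Return the fitted model as a string that sympy could parse."""
    return str(est.mean_)


# eval_kwargs: optional method-specific arguments; empty in this sketch
eval_kwargs = {}
```

Here the "model" is just a constant, so the returned string is a single number; a symbolic regressor would return its final expression instead.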

I have verified that:

[x] install scripts do not require sudo permissions.
[x] if pulled remotely, the source code is a fixed version (i.e., rerunning install.sh shouldn't pull a different version of the code when run multiple times.)

Refer to the competition guide if you are unsure about any steps. If you don't find an answer, ping us!

folivetti commented 2 years ago

hi, you might want to create an install.sh script that copies the binary to a local folder such as ~/.local/bin and then you can do something like

import os
os.environ["PATH"] = os.environ["PATH"] + os.pathsep + os.path.expanduser("~/.local/bin")

at the start of your py script
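
A slightly more defensive variant of that PATH tweak is sketched below, assuming only the standard library. The helper name add_to_path is hypothetical; it joins entries with os.pathsep and skips the directory if it is already listed.

```python
import os


def add_to_path(directory, env=None):
    """Append directory to PATH in env (default: os.environ), once only."""
    env = os.environ if env is None else env
    directory = os.path.expanduser(directory)
    current = env.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        parts.append(directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]
```

Calling add_to_path("~/.local/bin") at the top of the script and then checking shutil.which with the binary's name is one way to confirm that the binary copied by install.sh can actually be found.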

janoPig commented 2 years ago

@folivetti Thank you for the help, but still not working

folivetti commented 2 years ago

> @folivetti Thank you for the help, but still not working

my method also needs to copy a binary file to someplace inside the path. You might want to have a look at how I'm doing that https://github.com/folivetti/ITEA/tree/master/python

hope it helps you!

janoPig commented 2 years ago

> > @folivetti Thank you for the help, but still not working
>
> my method also needs to copy a binary file to someplace inside the path. You might want to have a look at how I'm doing that https://github.com/folivetti/ITEA/tree/master/python
>
> hope it helps you!

@folivetti Thanks for the help, this works. I only need to resolve the glibc dependencies, and then all is OK.

lacava commented 2 years ago

FYI some changes to CI have been pushed to the competition branch; please sync with those upstream changes to ensure the CI identifies HROCH properly.

lacava commented 2 years ago

sorry, going to have to ask you to rebase one more time (fixed workflow trigger in https://github.com/cavalab/srbench/commit/b9715a8eaa9fc40397d21bea6c16d05f8b6e8949)

janoPig commented 2 years ago

> sorry, going to have to ask you to rebase one more time (fixed workflow trigger in b9715a8)

It's okay now, thank you. I still have to build my method on Ubuntu 20.04, not 22.04.
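
One plausible reading, given the glibc remark earlier in the thread, is that a binary built on the newer Ubuntu links against a newer glibc than the target machine provides; that is an assumption, not something the thread states. Under that assumption, a quick compatibility check could look like the sketch below. The helper names are hypothetical; platform.libc_ver is part of the Python standard library.

```python
import platform


def version_tuple(text):
    """Turn a dotted version such as '2.31' into (2, 31)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def binary_can_run(built_against, runtime):
    """True if the runtime glibc is at least the version the binary was built against."""
    return version_tuple(runtime) >= version_tuple(built_against)


def runtime_libc():
    """Report the C library of the current interpreter, e.g. ('glibc', '2.31')."""
    return platform.libc_ver()
```

Comparing the version from runtime_libc() on the CI machine against the build machine's version would show whether the binary has to be rebuilt on the older system.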

janoPig commented 2 years ago

@lacava OK, ready to merge from my side. One thing is unclear: in the CI test, fit on 192_vineyard.tsv receives X = array([[ 4. , 5. , 9.5], [ 0. , 1. , 5. ], [32. , 2. , 8. ], ... The first column is the index. It's no problem for SR, but it's redundant.

lacava commented 2 years ago

thanks for your submission! merging now.

> One thing is unclear: in the CI test, fit on 192_vineyard.tsv receives X = array([[ 4. , 5. , 9.5], [ 0. , 1. , 5. ], [32. , 2. , 8. ], ... The first column is the index. It's no problem for SR, but it's redundant.

Can you clarify?

janoPig commented 2 years ago

The evaluate_model function calls est.fit(X_train_scaled, y_train_scaled) at line 122.

X_train_scaled is a NumPy array, and in the CI log (evaluate_model.py, lines 106/107) I see: X_train: (15, 3) y_train: (15,) training Hroch()

and X = array([[ 4. , 5. , 9.5], [ 0. , 1. , 5. ],...

but the 192_vineyard dataset has only 2 features, lugs_1989 and lugs_1990.

I am not sure whether this is OK.
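
The mismatch described here (three columns arriving where two features are expected) is the kind of thing a small width check can catch. The sketch below uses only the three rows visible in the CI log above; the helper names are hypothetical, and dropping the first column relies on the thread's statement that it is the index.

```python
def has_expected_width(X, feature_names):
    """True if every row of X has exactly one value per named feature."""
    return all(len(row) == len(feature_names) for row in X)


def drop_first_column(X):
    """Remove the leading column, here assumed to be a stray row index."""
    return [row[1:] for row in X]


# Rows as printed in the CI log; the first value in each row is the index.
X_from_ci_log = [[4.0, 5.0, 9.5], [0.0, 1.0, 5.0], [32.0, 2.0, 8.0]]
feature_names = ["lugs_1989", "lugs_1990"]

width_ok_before = has_expected_width(X_from_ci_log, feature_names)
width_ok_after = has_expected_width(drop_first_column(X_from_ci_log), feature_names)
```

This only illustrates the symptom; it says nothing about how the data loader itself should be changed.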

lacava commented 2 years ago

hi @janoPig your method has been merged. However I wanted to warn you that it is taking ~ 1 hour to complete the tests, for example: https://github.com/cavalab/srbench/runs/6261855661?check_suite_focus=true#step:9:1

The tests should take less than a few minutes to complete. Otherwise you run the risk that the method won't finish in time on the competition datasets.

Just wanted to make sure you are aware of that.

lacava commented 2 years ago

> The evaluate_model function calls est.fit(X_train_scaled, y_train_scaled) at line 122.
>
> X_train_scaled is a NumPy array, and in the CI log (evaluate_model.py, lines 106/107) I see: X_train: (15, 3) y_train: (15,) training Hroch()
>
> and X = array([[ 4. , 5. , 9.5], [ 0. , 1. , 5. ],...
>
> but the 192_vineyard dataset has only 2 features, lugs_1989 and lugs_1990.
>
> I am not sure whether this is OK.

Got it. The index was being included as a feature; fixed in https://github.com/cavalab/srbench/commit/8176c31196a950392c758e06af10e81d4db7eaf2

That shouldn't affect the tests or the competition, but it's good to know.

janoPig commented 2 years ago

> hi @janoPig your method has been merged. However I wanted to warn you that it is taking ~ 1 hour to complete the tests, for example: https://github.com/cavalab/srbench/runs/6261855661?check_suite_focus=true#step:9:1
>
> The tests should take less than a few minutes to complete. Otherwise you run the risk that the method won't finish in time on the competition datasets.
>
> Just wanted to make sure you are aware of that.

hi @lacava, thank you for the notice. I followed the competition guidelines:

> The time limits are as follows:
> - For datasets up to 1000 rows, 60 minutes (1 hour)
> - For datasets up to 10000 rows, 600 minutes (10 hours)

Hroch doesn't have an internal stopping criterion; it simply runs for the given time, or stops if the r2 error is smaller than 1e-12. The 192_vineyard dataset has fewer than 1000 rows, so it runs for 1 hour.
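
The behaviour described above (run until the time budget is used up, or stop early once the error drops below 1e-12) can be sketched as follows. This is not HROCH's code: run_with_budget, step and time_limit_minutes are hypothetical names, and the limits are only the two quoted from the guidelines.

```python
import time


def time_limit_minutes(n_rows):
    """Time limit quoted from the competition guidelines, in minutes."""
    if n_rows <= 1000:
        return 60
    if n_rows <= 10000:
        return 600
    raise ValueError("no limit is quoted for more than 10000 rows")


def run_with_budget(step, budget_seconds, tol=1e-12):
    """Call step() until the budget is spent or the best error is below tol.

    step is any callable that performs one search iteration and returns
    the current error. Returns (best_error, iterations).
    """
    deadline = time.monotonic() + budget_seconds
    best = float("inf")
    iterations = 0
    while time.monotonic() < deadline:
        best = min(best, step())
        iterations += 1
        if best < tol:
            break
    return best, iterations
```

With such a loop, a small dataset on which the tolerance is never reached uses the full 60 minutes, which matches the roughly 1 hour test run reported above. Passing a shorter budget for CI tests would be one way to keep them to a few minutes.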

cavalab / srbench

method HROCH for Competition2022 #104