Closed DE0CH closed 10 months ago
Hi @DE0CH , it is great that you decided to submit a PR. Could you give me a bit more context? Why would a user want to define the parameters in this way? It seems much more typing and much more room for errors.
I can see the uses of a programmatic interface that added parameters one by one, something like:
params <- ParameterSet$new() params$newInt(name="a", domain=c(1,10), transf="log") params$newReal(name="b", domain=c(0.0,1.0), transf="log")
Actually, the above is very similar to what paradox provides (see the examples here: https://paradox.mlr-org.com/). Having an interface similar to paradox would allow us to use their code (and it will interfacing irace with mlr3).
I'm not saying that I'm against what you propose but I'm trying to understand what is the use-case.
Cheers!
Hi Manuel, thanks for the reply. I thought of adding this when I saw the the way the parameters are specified in example_dual_annealing.py in iracepy.
I agree that it is much easier and error-free for humans to just write the string than using this interface, but I don't intend this interface to be used by end-users directly. It's mostly used by other wrappers that have much better interface and error checking. For example, adding compatiblity with the paradox
package and compatiblility for ConfigSpace configurations (which I am currently working on for iracepy
). These libraries will have to pass data into irace somehow and I think the best way is through this newly added readParametersData
. It is much easier and secure for library authors to use because they don't have to worry about formatting a string and escaping special characters. As such, the function assumes the input is correct as checked by user-facing interfaces and will result in undefined behaviour if the input is invalid.
As I see it, the readParameters
function accomplishes two things. First, it parses the string into native data structure for easier manipulation and secondly, it resolves the dependencies and compute properties for irace to use later such as the nbFixed
, depends
and hierarchy
fields in the parameters
returned. The readParameterData
functoin would only do the second part and let whatever user facing library do the first part.
One use case I am considering is to load SMAC3 configurations from the surrogate modles in aclib2 for meta-tuning irace. If I don't want to manually translate tens of configurations files manually by hand, I need some way to translate the configuration files automaically. I am considering to load in the parameters file with ConfigSpace
and then translate it into the format that readParametersData
accepts and then get a parameters
object that irace
can use natively.
In summary, this interface is designed as an low level interface for other user facing interfaces. One of which is iracepy
which I am currently working on.
I hope this helps!
Hi,
I agree with you completely. My only question is about the interface. If you look at the interfaces of paradox and ConfigSpace, they are very similar to what I have described above, where parameters are defined one by one. This would allow current readParameters to use the same interface to create the parameters object instead of having 2 functions in irace to create parameters that know the low-level details. Having such interface would also make much easier to move to paradox in the future.
Best,
Manuel.
I am not sure I understand what you mean. I see that getting the interface right is quite important, but do you mean instead of having list of five lists, each corresponding to a property, I should have a list of structs / objects where each object correspond to one line in the configuraiton? i.e. it is the transpose / zip of the data structure I currently have. To be honest, I prefer having serveral vectors of primitive data types instead of a list of objects. I find it easier to manipulate, uses less memory and is slightly faster. Also, although I haven't looked at how r2py
works, but I think it would be way easier to pass through the interface five lists of strings, numbers and expressions than a list of objects.
As to why there are two functions, I mostly just don't want to touch the readParameter
function lest I break something. Ideally, I think readParameter
should parse the text file into the data format that readParameterData
accepts and let it handle the rest, but I don't think I should do that now since it's quite a large refactor and it's easy to mess up.
Anyways, I am happy to change the interface to whatever you prefer. So please just let me know the specifications and I will make changes accordingly. Actually now that I think about, it would much prefer to keep this interface simple and primitive to maximise its comptablity with all sorts of binding with other languages and formats. As for compatablity with paradox, I think another function that coverts a paradox to this primitive format and back should be better.
What do you think?
I am not sure I understand what you mean. I see that getting the interface right is quite important, but do you mean instead of having list of five lists, each corresponding to a property, I should have a list of structs / objects where each object correspond to one line in the configuraiton? To be honest, I prefer having serveral vectors of primitive data types instead of a list of objects. I find it easier to manipulate, uses less memory and is slightly faster.
Not necessarily, currently the internals of the parameters "object" could stay as they currently are, but the way to create it should be less error-prone. It may look easy to create a list of five lists when you have less than 10 parameters but when you have 200, it becomes harder to track what element of the sublists correspond to the other elements of the sublists. Also, your example already has quite a bit of redundancy and default values that the user needs to track. Instead of:
d = list()
d$names <- c(
'initial_temp',
'restart_temp_ratio',
'visit',
'b',
'accept',
'no_local_search'
)
d$switches <- c(
'',
'',
'',
'',
'',
''
)
d$types <- c(
'r,log',
'r,log',
'r',
'r',
'r',
'o'
)
d$conditions <- list(
TRUE,
TRUE,
TRUE,
TRUE,
expression(no_local_search == "ccc"),
TRUE
)
d$domains <- list(
c(0.02, 5e4),
c(1e-5, 1),
c(1.001, 3),
expression(visit, visit + 10),
c(-1e3, -5),
c('', '1', 'ccc')
)
the user (or the Python code via rpy2) can write:
d = ParametersNew(p_real('initial_temp', 0.02, 5e4, transf="log"),
p_real('restart_temp_ratio', 1e-5, 1, transf="log"),
p_real('visit', 1.001, 3),
p_real('b', quote(visit), quote(visit + 10)),
p_real('accept', -1e3, -5, quote(no_local_search == "ccc")),
p_ord('no_local_search', '1', 'ccc'))
Also, although I haven't looked at how
r2py
works, but I think it would be way easier to pass through the interface five lists of strings, numbers and expressions than a list of objects.
Maybe yes, I have some doubts about it, specially passing expressions around, those will need to be passed as strings and parsed within irace (either by readParameterData
or by ParametersNew
). Calling R functions and passing numbers and strings is easy. Passing R objects through Python is also easy (Python calls an R function that returns something and gives this something to another R functions). But converting from R to Python and viceversa gets tricky.
But this is not what I'm proposing. My proposal is that iracepy (or whatever python code) will call the functions ParametersNew, p_real, p_ord, p_int, etc to create the object directly in R format.
As to why there are two functions, I mostly just don't want to touch the
readParameter
function lest I break something. Ideally, I thinkreadParameter
should parse the text file into the data format thatreadParameterData
accepts and let it handle the rest, but I don't think I should do that now since it's quite a large refactor and it's easy to mess up.
Fair enough, we can move readParameter
to use the new interface when it is working.
~Anyways, I am happy to change the interface to whatever you prefer. So please just let me know the specifications and I will make changes accordingly.~ Actually now that I think about, it would much prefer to keep this interface simple and primitive to maximise its comptablity with all sorts of binding with other languages and formats.
If you are happy with the interface that I propose above, then I would prefer something like that. You can even have a function in python that takes a Python list of lists (or whatever you want to write) and calls the R functions to create parameters R object.
Otherwise, another option that would not require touching irace at all and it is even simpler is to implement in python a function that converts from ConfigSpace to irace's textual format and then use the current readParameters
as it is. This would be quite useful in any case, so one can dump that file to a .txt and look at it.
I hope the above makes sense!
I agree. Your way of creating the data is much more user friendly and less error prone. But my idea is not for an end user to use this function, but it's more of an internal function. The code snipt is just for documenting the input format. In reality, a user should never need to write such code. In the documentation, I should point user to use readParameter
and whatever interface in iracepy
to create the parameter.
The function receives data from readParameter
and iracepy
, both of which would contain input validation. The internal representation in readParameter
is already quite similar to the current interface (the parameters
varaible), so it would not need to do any more work to use the readParaemterData
internal function.
But this is not what I'm proposing. My proposal is that iracepy (or whatever python code) will call the functions ParametersNew, p_real, p_ord, p_int, etc to create the object directly in R format.
I would like to counter propose that the python library creates paraemters with python objects and convert to the format that readParametersData
accepts. For example:
class IraceParameters:
...
def add_parmeter(self, type, switch, types, domains, conditions):
...
def export(self):
parameters = convert to the r object compatible with readParametersData
parameters = irace.readParametersData()
class Irace:
def run(self):
irace(scenario, parameters.export())
...
If you are happy with the interface that I propose above, then I would prefer something like that.
Is it possible that we have two functions? In addition to the current function, we have another function readParametersObject
(or some other clearer name) which takes input in the format you perfer and calls readParametersData
internally? It needs to convert the format anyways to use tansform.domain
, isFixed
anyways.
Otherwise, another option that would not require touching irace at all and it is even simpler is to implement in python a function that converts from ConfigSpace to irace's textual format and then use the current readParameters as it is. This would be quite useful in any case, so one can dump that file to a .txt and look at it.
I don't really like this idea because there are a lot of ways it can go wrong. What if the variable contains a special character such as space, quote or "|"? It's hacky at best and a secruity flaw at worst.
I agree. Your way of creating the data is much more user friendly and less error prone. But my idea is not for an end user to use this function, but it's more of an internal function. The code snipt is just for documenting the input format. In reality, a user should never need to write such code. In the documentation, I should point user to use
readParameter
and whatever interface iniracepy
to create the parameter.
If it is for internal use, then why need to pass a list of lists? Why not simply pass the list of names, switches, domains, etc? You also don't need to pass "n", since this can be inferred from the length of the list of names.
I would like to counter propose that the python library creates paraemters with python objects and convert to the format that
readParametersData
accepts. For example:
I really don't see how this is better because the python library has to create a representation that is not the representation used by the R code and the functions of the IraceParams class has to either duplicate the error checking done in the R code or do no error checking at all until .export() is called.
But I don't want to stop you, if you think this is better, I'm happy to accept this for your particular usecase. But first create the Python interface and see how it works with your proposed readParametersData
(without the list of lists, just pass the individual lists, and without the 'n' as that is simply the length of the names list) as that may lead you to update this PR.
Also, please use the parameters in the example of readParameters
as that exercises many of the features required. Once the Python code is working, I'm happy to accept this function (or however it ends up looking) with a note that it is for internal use only.
Is it possible that we have two functions? In addition to the current function, we have another function
readParametersObject
(or some other clearer name) which takes input in the format you perfer and callsreadParametersData
internally? It needs to convert the format anyways to usetansform.domain
,isFixed
anyways.
Perhaps my proposal was not clear. In my proposal readParametersData
will be the one calling parametersNew
and the other helper functions, because those functions will be the ones that contain all the logic and error-checking that builds the parameters
object as it is now. It doesn't make sense to have a function readParametersObject
: It is not useful for Python because it is not easy to convert between Python/R objects and it is not useful for R because you either want a text representation (which we already have) or a programmatic interface (which the proposed parametersNew
would provide). But as I said, I'm not going to stop you, I just ask that you get a complete solution working, including the Python part, and tested with a complex example, like the example provided with readParameters
and then I'm happy to accept this PR.
I don't really like this idea because there are a lot of ways it can go wrong. What if the variable contains a special character such as space, quote or "|"? It's hacky at best and a secruity flaw at worst.
There is no other way to parse a textual representation of the conditions unless a custom-made parser is written, which would be slower and a waste of time. Yes, in principle the user can write a condition that deletes your hard-drive. But they can also do that in the target-runner or worse, in something that is called by the target-runner and completely out of your view. If you are running a tuning scenario from a source you don't trust then something is very wrong already because you are asking irace to run user-provided code!
But as I said, if you manage to make the Python part work with a realistic example, I'm not opposed to adding this function even if you are the only user of it.
Thanks for all the advice. Yes, I think having a complete solution would be much benifitial. I will work on it in the next few days and I will come back to you. I will convert this PR to draft now and re-request your review when I'm ready. Thanks.
If it is for internal use, then why need to pass a list of lists? Why not simply pass the list of names, switches, domains, etc? You also don't need to pass "n", since this can be inferred from the length of the list of names.
That's a good idea. I have not thought of that.
I really don't see how this is better because the python library has to create a representation that is not the representation used by the R code and the functions of the IraceParams class has to either duplicate the error checking done in the R code or do no error checking at all until .export() is called.
I will try to implement error checking and see how it can be properly done.
Also, please use the parameters in the example of readParameters as that exercises many of the features required.
Sure.
In my proposal readParametersData will be the one calling parametersNew and the other helper functions, because those functions will be the ones that contain all the logic and error-checking that builds the parameters object as it is now. It doesn't make sense to have a function readParametersObject: It is not useful for Python because it is not easy to convert between Python/R objects and it is not useful for R because you either want a text representation (which we already have) or a programmatic interface (which the proposed parametersNew would provide).
I'm sorry I don't think I understand what you mean now. Perhaps when I have the python code it will be earsier for us to discuss about this and be on the same page.
There is no other way to parse a textual representation of the conditions unless a custom-made parser is written, which would be slower and a waste of time. Yes, in principle the user can write a condition that deletes your hard-drive. But they can also do that in the target-runner or worse, in something that is called by the target-runner and completely out of your view. If you are running a tuning scenario from a source you don't trust then something is very wrong already because you are asking irace to run user-provided code!
I see!
But as I said, if you manage to make the Python part work with a realistic example, I'm not opposed to adding this function even if you are the only user of it.
I hope I won't be the only user of this. Yes I am currently the only author (other than you) working on iracepy
which would benefit from this, but I hope other people would find it useful so this PR would indirectly benefit them.
The iracepy
interface is pretty much done at auto-optimization/iracepy#5.
Not sure why CI checks fail on macOS. Can you help with that?
What's the status of this? I am interested in this feature for my own Python wrapper iracepy-tiny
.
If possible, I would very much prefer an interface like paradox, as you have mentioned. My wrapper currently translates from a similar structure into irace's text format, which is then parsed again. It works, but it's kind of ugly.
What's the status of this? I am interested in this feature for my own Python wrapper
iracepy-tiny
.If possible, I would very much prefer an interface like paradox, as you have mentioned. My wrapper currently translates from a similar structure into irace's text format, which is then parsed again. It works, but it's kind of ugly.
Hi Jonathan!
There is a branch: https://github.com/MLopez-Ibanez/irace/tree/parametersNew implementing an interface like paradox. In fact, I would like to use paradox directly but there are a few unresolved questions about paradox.
The only reason I have not merged this branch into main is that I haven't had time to test it thoroughly, but the plan is to integrate it before the next irace release, with or without paradox. I would be happy to get feedback on the interface and whether something is missing or could be done better.
(an unrelated question: why fork iracepy? I would happily let you merge your improvements)
The new functions are at the bottom of https://github.com/MLopez-Ibanez/irace/blob/parametersNew/R/parameters.R
Hi,
Thanks for your interest in the feature.
The feature is working and I am just waiting for @MLopez-Ibanez approval. I was planning to use this on iracepy but I kind of just gave up on developing iracepy because there's a lot of problems with passing data through and parallel computing. I just think doing interprocess communication (e.g. though tcp, unix socket, or pipes) works a lot better than trying to bind r to python through embeded r (which is what rpy2 uses).
So I think at least for me, I am open for you to use this interface as is or change it as it fits for iracepy-tiny
packages, eventually @MLopez-Ibanez might approve this changes, but he might also just close it.
Sorry I probably don't have time to work on it this month or next (my university begins).
Rather than the approach in this PR, I'd rather use the interface proposed in: https://github.com/MLopez-Ibanez/irace/blob/parametersNew/R/parameters.R
Rather than the approach in this PR, I'd rather use the interface proposed in: https://github.com/MLopez-Ibanez/irace/blob/parametersNew/R/parameters.R
That looks exactly like what I want, thank you. I'll try to integrate it, and let you know of the result.
(an unrelated question: why fork iracepy? I would happily let you merge your improvements)
iracepy-tiny
was originally only meant as a stepping stone for an irace
Rust wrapper, so I'm only planning support for exactly the features that I need. I'm still experimenting with the interface right now, but if it's more stable I would be happy to integrate the changes back into iracepy
, if there's demand.
This PR adds functionality to parse parameters from an R list (as opposed to a string). This would be useful to build a user friendly native object interface with
iracepy
where user does't have to format a string themselves.Summary of changes:
readParametersData
transform.domain
andtreeLevel
out of the functionreadParameters
. Caveat: this might lead to name clashes as these function are no longer scoped within thereadParameters
function, but I think this is mostly okay because there aren't any other variable with the names and they are unique enough that it's unlikely new variables would be named that.Things that can be improved:
readParametersData
. Ideally those should be moved into functions be called in the two places, but I thought it's more or less fine because the function of the code block is quite simple and it's unlikely something is going to go wrong.