Domains docu - Githubissues

Osburg commented 4 months ago

@KappatC https://www.rdkit.org/docs/GettingStartedInPython.html

Osburg commented 4 months ago

NonlinearInequalityConstraint(expression="x1**2 + x2**2 - x3", features=["x1","x2","x3"])
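As a rough illustration of what such a constraint expresses, the sketch below evaluates the expression on two candidate points in plain Python, without BoFire. It assumes that the expression is meant as `x1**2 + x2**2 - x3` and that the constraint counts as fulfilled when the expression evaluates to a value `<= 0`. The helper `is_fulfilled` and the sample points are hypothetical.

```python
# Plain-Python sketch, not BoFire's implementation.
# Assumption: the constraint is fulfilled when the expression is <= 0.
expression = "x1**2 + x2**2 - x3"
features = ["x1", "x2", "x3"]


def is_fulfilled(point: dict) -> bool:
    """Evaluate the expression with the feature values of one candidate."""
    values = {name: point[name] for name in features}
    # eval is used only for illustration on a trusted, hard-coded string.
    return eval(expression, {"__builtins__": {}}, values) <= 0


inside = {"x1": 0.5, "x2": 0.5, "x3": 1.0}   # 0.25 + 0.25 - 1.0 = -0.5
outside = {"x1": 1.0, "x2": 1.0, "x3": 1.0}  # 1.0 + 1.0 - 1.0 = 1.0

print(is_fulfilled(inside), is_fulfilled(outside))
```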

R-M-Lee commented 4 months ago

Hi @KappatC and @Osburg, is this ready for review? I am happy to be the reviewer when needed. Otherwise we should convert it to a draft.

KappatC commented 4 months ago

Imo there are a few things to ask @jduerholt, but apart from this it should be ok. @Osburg what do you say?

R-M-Lee commented 4 months ago

we can ask questions here... Johannes will have been pinged when you mentioned him just now anyway

jduerholt commented 4 months ago

Just shoot your question ;)

Osburg commented 4 months ago

Hi @jduerholt :) Yes, we still had a few questions:

* Neither of us has ever used descriptor inputs (as in `ContinuousDescriptorInput` and `CategoricalDescriptorInput`). What are these for? Or can you point us to an explanation?
* What is the purpose of `TaskInput`s?
* We've seen that `CloseToTargetObjective`s seem to be suitable for multiobjective strategies, while `TargetObjective`s are not. What is the difference between them (apart from their different implementations of `__call__()`)?

@KappatC Did I forget anything? @jduerholt if it is easier for you to just complete the missing parts yourself, this is fine for me as well. But an explanation is appreciated so that we know better in the future.

Cheers Aaron

KappatC commented 4 months ago

Thanks @Osburg for summarizing. Yes, that should be it. They are also marked as todos in the file (they are the only ones, apart from adding some links to the rest of the docu once we have everything). Maybe one more general thing, @jduerholt: could you double-check that the list of inputs/objectives is complete and that they are all ready to be used? :)

jduerholt commented 4 months ago

Hi @KappatC and @Osburg,

regarding your questions:

* `CategoricalDescriptorInput`: Imagine having a categorical input with, for example, 10 different categories, and let's say that every category corresponds to a specific material. Via the `CategoricalDescriptorInput` one can provide continuous encodings for the different categories via so-called descriptors. In our example with the ten different materials, the descriptors could be, for example, `density` and `hardness`. Every material/category would get assigned a number for `density` and `hardness`, in the hope that these two properties describe the material properly. In the context of fitting a GP, one can then use just this two-dimensional vector for describing the material instead of a ten-dimensional one-hot encoding, which results in a dimensionality reduction. Of course, this only makes sense if the descriptors actually correlate with the desired quantities.
* `ContinuousDescriptorInput`: Ignore it, it is used nowhere and I am still not sure if we will ever use it. Maybe we should also just remove it and add it again when it is really used, to not confuse people. What do you think?
* `TaskInput`: Should be used for `MultiTaskGP`s and `MultiFidelityGP`s. Currently under implementation here: https://github.com/experimental-design/bofire/pull/353. You can also find more about why it is implemented the way it is in this PR.
* `CloseToTargetObjective` and `TargetObjective`: `CloseToTargetObjective` actually measures the difference to the target value, which is something that makes sense to minimize in a true multiobjective optimization and to include in the Pareto front. `TargetObjective`, in contrast, is of type `ConstrainedObjective`, like `MaximizeSigmoid` or `MinimizeSigmoid`, so it gets 1 if the value is in the target region and falls asymptotically toward zero outside the target region.
* Note that objectives of type `ConstrainedObjective` can also be used in multiobjective optimization, but you then need at least two objectives of type `Minimize`, `Maximize` or `CloseToTarget`.
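The dimensionality reduction described for `CategoricalDescriptorInput` can be sketched in plain Python as follows. This is only an illustration of the idea, not BoFire code: the material names, the descriptor values and the helpers `one_hot` and `descriptor_encoding` are all made up.

```python
# Illustration only: ten hypothetical materials with made-up descriptor values.
categories = [f"material_{i}" for i in range(10)]
descriptors = ["density", "hardness"]

# One row of descriptor values per category (invented numbers).
values = {
    name: [1.0 + 0.5 * i, 10.0 * i]
    for i, name in enumerate(categories)
}


def one_hot(category: str) -> list:
    """Encode a category as a vector with one entry per category."""
    return [1.0 if name == category else 0.0 for name in categories]


def descriptor_encoding(category: str) -> list:
    """Encode a category by its descriptor values instead."""
    return list(values[category])


example = "material_3"
print(len(one_hot(example)), "dimensions with one-hot encoding")
print(len(descriptor_encoding(example)), "dimensions with descriptors")
```

A model fitted on the descriptor encoding sees two continuous inputs per material instead of ten indicator columns, which only helps if the descriptors carry information about the outputs.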

I hope this helps! If you need more, just ask again. I would prefer that you finish it and I review/add/modify in the end. Ok for you?

Best,

Johannes

KappatC commented 3 months ago

Hey @jduerholt, thanks for the explanations. I tried to adjust the text accordingly, but feel free to make any further changes. A few remarks/todos left:

1. A link to the strategy docu is missing. I kept it as a todo in the text because I am unsure of the status there.
2. I read the thread with Jose's implementation/comments for the `TaskInput`s. Not sure if this is something that is used at the moment. There is another todo in the text at that point. I am not feeling confident explaining this; if you could do it, that would be great, if not I'd simply leave it out for now.
3. Could you please double-check that the example with `CategoricalDescriptorInput` is correct?
4. I think I now understand the difference in the implementation of `CloseToTargetObjective` and `TargetObjective`, so thanks for explaining. Tbh, conceptually I am still sceptical about whether we need to differentiate between the two, but that's probably another topic. I tried to keep the text close to your explanation, but feel free to change things wherever you think it is appropriate.
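The difference between the two objective types discussed in this thread can be sketched in plain Python. This is a simplified illustration, not BoFire's implementation: the function names, the parameters `tolerance` and `steepness`, and the product-of-sigmoids form are assumptions chosen to match the description "1 in the target region, falling asymptotically toward zero outside".

```python
import math


def close_to_target(y: float, target: float) -> float:
    """Distance to the target: 0 at the target, grows with the deviation."""
    return abs(y - target)


def target_reward(y: float, target: float, tolerance: float, steepness: float) -> float:
    """Smooth indicator of the region [target - tolerance, target + tolerance]."""
    # Rises from 0 to 1 around the lower edge of the target region.
    rising = 1.0 / (1.0 + math.exp(-steepness * (y - (target - tolerance))))
    # Falls from 1 to 0 around the upper edge of the target region.
    falling = 1.0 - 1.0 / (1.0 + math.exp(-steepness * (y - (target + tolerance))))
    return rising * falling


for y in (5.0, 5.5, 8.0):
    distance = close_to_target(y, target=5.0)
    reward = target_reward(y, target=5.0, tolerance=1.0, steepness=10.0)
    print(y, distance, round(reward, 3))
```

In this sketch the distance still separates 5.0 from 5.5, so it can be minimized as an objective in its own right, whereas the reward is nearly the same for both points and only separates "inside" from "outside" the target region.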

I hope I did not miss anything. Apart from this, the rest should imo be good to go :)

Best, Chryssa

jduerholt commented 3 months ago

Hi @KappatC,

1. Just leave the strategies doc as a todo, I will take care of it at some point.
2. Just leave it out for now, we will add it when we really start using it.
3. Looks good to me.
4. Looks also good to me.

If you wonder why the tests are failing: we test the code snippets in the documentation, and it seems that some snippets are ill-formatted or buggy. If you have substantial problems there, just tell me and I will have a look.

Best,

Johannes

experimental-design / bofire

Domains docu #369