fmarten / JoSimText

A system for word sense induction and disambiguation based on the JoBimText approach

Enhance configuration mechanics #12

Open fmarten opened 7 years ago

fmarten commented 7 years ago

The objectives could be:

It seems reasonable to attack this early, first because attacking it late will leave us no time to gain real-world experience with the changes. Currently there is already a lot of boilerplate, and continuing as we are forces us to create even more to stay consistent.

fmarten commented 7 years ago

This project might be good for inspiration: https://github.com/apache/systemml

fmarten commented 7 years ago

There seem to be three interesting places showing how to create scripts that provide a good entry point into Spark.

I am not yet sure how they fit together, though.

But you can see that they favor your solution of explicitly passing the Spark configuration, such as --driver-memory. What I do not like is how they have hard-coded the defaults.

What I like is that they have a single entry point.

And I have a suggestion for how this would be possible in our situation, even with the concerns you have mentioned (having a fast starting point for researchers, with an overview of all model params and no need to write them down manually). The solution could be to extract the model params into an extra key-value file and then provide solely this key-value file for each "method". The nice part about this idea is that we can later regenerate such a file and include it in the output folder. (That is, by the way, similar to what Spark does within MLlib's model persistence.)
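To make the idea concrete, here is a minimal sketch, assuming a plain Java-properties key-value format; the object name and the file name `model-params.properties` are made up for illustration, not part of the current code:

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties

object ModelParams {

  // Load all model parameters for one "method" from a key-value file.
  def load(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }

  // Regenerate the same key-value file inside the output folder, so that
  // every result directory documents the parameters that produced it
  // (similar in spirit to the metadata Spark MLlib writes when saving a model).
  def saveTo(props: Properties, outputDir: String): Unit = {
    new File(outputDir).mkdirs()
    val out = new FileOutputStream(new File(outputDir, "model-params.properties"))
    try props.store(out, "parameters used for this run") finally out.close()
  }
}
```

A researcher would then only edit one such file per method instead of writing all parameters on the command line.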

My main point is that a single entry point would reduce boilerplate and make it easier to resolve issues in the scripts.
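Building on the sketch above, such a single entry point could look roughly like this; the method names and the runWSI/runWSD steps are hypothetical placeholders, not the actual API of this project:

```scala
object JoSimTextMain {

  def main(args: Array[String]): Unit = args.toList match {
    // One executable for all methods; Spark settings such as --driver-memory
    // stay on the spark-submit command line, outside of this code.
    case method :: paramsFile :: outputDir :: Nil =>
      val params = ModelParams.load(paramsFile)
      method match {
        case "word-sense-induction" => runWSI(params, outputDir) // hypothetical step
        case "disambiguation"       => runWSD(params, outputDir) // hypothetical step
        case other                  => sys.error(s"Unknown method: $other")
      }
      ModelParams.saveTo(params, outputDir) // regenerate the params file next to the results
    case _ =>
      println("Usage: <method> <params-file> <output-dir>")
  }

  // Placeholders standing in for the actual Spark jobs.
  private def runWSI(params: java.util.Properties, outputDir: String): Unit = ()
  private def runWSD(params: java.util.Properties, outputDir: String): Unit = ()
}
```

spark-submit could then always target this one class, with the Spark configuration passed explicitly on the command line.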

alexanderpanchenko commented 7 years ago

> But you can see that they favor your solution of explicitly passing the Spark configuration, such as --driver-memory.

I thought about this again and am ready to say that I am very much in favor of setting the Spark configuration explicitly like this, not via environment variables.

> What I do not like is how they have hard-coded the defaults.

yeah, what we do now seems to be even more advanced

> create scripts that provide a good entry point into Spark.

my main bias is to make the scripts as simple as possible, which is not really the case in this project. i want them ideally to have no while or for loops, no functions, and as few ifs as possible, so even a kid (=researcher) can read such a bash script. in this project, the scripts are quite complex.

> The nice part about this idea is that we can later regenerate such a file and include it in the output folder. (That is, by the way, similar to what Spark does within MLlib's model persistence.)

I strongly oppose "later" thing. If it is a benefit, then we need to do it now or do not even consider it. For me actually it is not clear how you will do it. Though reflection?

Please answer: which problem are you trying to solve by changing the configuration? Please answer this question very clearly and in as much detail as possible. For now, I cannot really understand it, and this is very important.