chengsoonong / acton

Active Learning: Predictors, Recommenders and Labellers
BSD 3-Clause "New" or "Revised" License

Add command-line interface for components #70

Closed: MatthewJA closed 7 years ago

MatthewJA commented 7 years ago

I'll plan out what the interface should look like, starting with predictors since we already have protobufs for that.

acton-predict \
    --labels labels.pb \
    --predictor LogisticRegression \
    --output predictions.pb

Input will be a serialised Labels protobuf to train on. Predictions will be made for all instances in the associated database.

Output will be a serialised Predictions protobuf.

Incidentally, this raises a question. Do we want to expose protobufs in this way? Or do we want to specify the whole data file here just like the current acton interface?
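
As a rough illustration of the interface above, here is a minimal argparse sketch. It is not Acton's actual implementation: the function name build_predict_parser, the -o short flag (borrowed from the loop examples later in this thread) and the default predictor are assumptions made for illustration.

```python
import argparse


def build_predict_parser():
    """Build a parser mirroring the proposed acton-predict flags (hypothetical)."""
    parser = argparse.ArgumentParser(prog='acton-predict')
    parser.add_argument('--labels', required=True,
                        help='Path to a serialised Labels protobuf.')
    # Assumption: LogisticRegression as the default is not stated in the thread.
    parser.add_argument('--predictor', default='LogisticRegression',
                        help='Name of the predictor to train.')
    parser.add_argument('-o', '--output', required=True,
                        help='Path for the serialised Predictions protobuf.')
    return parser


# Parse the same arguments as in the proposal above.
args = build_predict_parser().parse_args([
    '--labels', 'labels.pb',
    '--predictor', 'LogisticRegression',
    '--output', 'predictions.pb',
])
print(args.labels, args.predictor, args.output)
```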

MatthewJA commented 7 years ago

Another option would be something like

acton-predict \
    --data data.dat \
    --feature feature_column \
    --label label_column \
    --training-set indices.txt \
    --predictor LogisticRegression \
    --output predictions.pb

so you could drop your data right in without having Acton pre-process it first. It's a bit messier and the output is still (necessarily) a protobuf.

I'll use this option for now, I think.

MatthewJA commented 7 years ago

The recommender is a lot simpler.

acton-recommend \
    --predictions predictions.pb \
    --recommendation-set indices.txt \
    --recommender UncertaintyRecommender \
    --recommendation-count 1 \
    --diversity 0.1

I'd be very happy for this to output to stdout.
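
To make the idea concrete, here is a small sketch of uncertainty sampling for a binary problem that prints its recommendations to stdout. This is an assumption about what an uncertainty-based recommender might do, not a description of UncertaintyRecommender itself. The function name and the example probabilities are made up, and --diversity is left out because the thread does not define it.

```python
def recommend_most_uncertain(probabilities, candidate_ids, count=1):
    """Return the `count` candidate IDs whose probability is closest to 0.5.

    `probabilities` maps an instance ID to the predicted probability of the
    positive class (hypothetical input format).
    """
    ranked = sorted(candidate_ids, key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:count]


# Made-up predictions for four instances.
probabilities = {1: 0.95, 2: 0.52, 3: 0.10, 4: 0.40}

# Print one recommended ID per line, as a stdout-based CLI might.
for instance_id in recommend_most_uncertain(probabilities, [1, 2, 3, 4], count=1):
    print(instance_id)
```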

MatthewJA commented 7 years ago

Another option for acton-predict is that it can take either the arguments directly or the protobuf, and then just construct a protobuf from there if need be.
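
One way this could look, sketched with argparse: accept both styles and check that exactly one of them was used. The function name parse_predict_args, the mode attribute and the rule that --data, --label and --training-set are all required in direct mode are assumptions for illustration only.

```python
import argparse


def parse_predict_args(argv):
    """Accept either a Labels protobuf or direct data arguments (hypothetical)."""
    parser = argparse.ArgumentParser(prog='acton-predict')
    parser.add_argument('--labels', help='Serialised Labels protobuf.')
    parser.add_argument('--data', help='Raw data file.')
    parser.add_argument('--feature', help='Feature column (optional).')
    parser.add_argument('--label', help='Label column.')
    parser.add_argument('--training-set', help='File of training indices.')
    args = parser.parse_args(argv)

    direct = [args.data, args.label, args.training_set]
    if args.labels and any(direct):
        parser.error('use either --labels or the direct data arguments')
    if not args.labels and not all(direct):
        parser.error('--data, --label and --training-set are all required')

    # In direct mode the CLI would construct the protobuf itself.
    args.mode = 'protobuf' if args.labels else 'direct'
    return args


print(parse_predict_args(['--labels', 'labels.pb']).mode)
```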

MatthewJA commented 7 years ago

@chengsoonong: Note that if you want to run the CLIs I'm adding now, you'll need to reinstall Acton since the new CLIs are added on install.

pip3 install -e . will work for that.
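
For context, commands like these are typically registered through setuptools console_scripts entry points, which is why a reinstall is needed. A sketch of what that mapping could look like follows. The module path acton.cli and the function names are assumptions, not taken from the repository.

```python
# Hypothetical mapping from command names to Python callables.
# The 'acton.cli:...' targets are assumed for illustration.
ENTRY_POINTS = {
    'console_scripts': [
        'acton-predict=acton.cli:predict',
        'acton-recommend=acton.cli:recommend',
        'acton-label=acton.cli:label',
    ],
}

# In setup.py this would be passed as setup(..., entry_points=ENTRY_POINTS).
for script in ENTRY_POINTS['console_scripts']:
    print(script.split('=')[0])
```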

chengsoonong commented 7 years ago

My advice is to begin at the desired end product.

The way to see whether these CLIs are elegant is to write the bash script that reproduces simulate_active_learning.

MatthewJA commented 7 years ago

Something like this would work and looks alright:

for (( ; ; )); do
    acton-predict --labels labels.pb -o predictions.pb
    acton-recommend --predictions predictions.pb | acton-label -o labels.pb
done

Though there are no entry points for accessory information (database etc.) here. We could also try:

for (( ; ; )); do
    acton-predict --labels labels.pb -o predictions.pb
    acton-recommend --predictions predictions.pb -o recommendations.pb
    acton-label --recommendations recommendations.pb -o labels.pb
done

and carry accessory information around in protobufs. This has the advantage of being really consistent-looking. We could even pipe protobufs directly between processes if no output is specified:

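for (( ; ; )); do
    # Write to a temporary file first: redirecting straight to labels.pb
    # would truncate it before acton-predict has read it.
    acton-predict < labels.pb | acton-recommend | acton-label > labels.pb.tmp
    mv labels.pb.tmp labels.pb
done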

That's kinda nice in a way. You could then also smoothly output things like recommendations to stdout, e.g.

acton-predict < labels.pb | acton-recommend

would output a list of recommendations based on the predictions (which seems useful — maybe we then go and make some measurements in the field).

(The more I think about this, the more I think auxiliary information should be stored in protobufs — no sense in specifying the database every time. I'm going to (re)write some protobufs with this in mind.)
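
A sketch of the "fall back to stdin/stdout when no file is given" behaviour, using only the standard library. The helper names are hypothetical, and the transform stands in for "deserialise a protobuf, do the work, serialise the result", which is not shown here.

```python
import io
import sys


def open_input(path=None):
    """Return a binary input stream: the named file, or stdin if no path."""
    return open(path, 'rb') if path else sys.stdin.buffer


def open_output(path=None):
    """Return a binary output stream: the named file, or stdout if no path."""
    return open(path, 'wb') if path else sys.stdout.buffer


def run_stage(transform, in_stream, out_stream):
    """Read all bytes, apply `transform`, and write the resulting bytes."""
    out_stream.write(transform(in_stream.read()))
    out_stream.flush()


# Demonstrate with in-memory streams and a placeholder transform.
source = io.BytesIO(b'labels')
sink = io.BytesIO()
run_stage(lambda data: data.upper(), source, sink)
print(sink.getvalue())
```

Each command would call open_input and open_output with its optional file arguments, so the same code path serves both the explicit-file loop and the piped version.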

MatthewJA commented 7 years ago

Here's what using acton-label looks like:

$ echo "1
> 2
> 3
> 4
> 5
> 6
> " | acton-label --data tests/data/classification.txt -l col20
WARNING:root:Not implemented: labeller_accuracy
{"db":{"className":"ASCIIReader","kwarg":[{"value":"\"\"","key":"feature_cols"},{"value":"\"col20\"","key":"label_col"}],"path":"tests/data/classification.txt"},"id":["1","2","3","4","5","6"]}

It currently outputs human-readable JSON, but I'm changing that to binary data.
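
While the output is still JSON, it can be inspected with the standard json module. The sketch below parses the output shown above. It assumes, based only on that sample, that each kwarg value is itself a JSON-encoded string.

```python
import json

# The JSON printed by acton-label in the example above.
RAW_OUTPUT = (
    r'{"db":{"className":"ASCIIReader","kwarg":['
    r'{"value":"\"\"","key":"feature_cols"},'
    r'{"value":"\"col20\"","key":"label_col"}],'
    r'"path":"tests/data/classification.txt"},'
    r'"id":["1","2","3","4","5","6"]}'
)

message = json.loads(RAW_OUTPUT)

# Assumption: kwarg values are JSON-encoded, so decode them a second time.
kwargs = {kw['key']: json.loads(kw['value']) for kw in message['db']['kwarg']}

print(message['db']['className'], message['db']['path'])
print(kwargs)
print(message['id'])
```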

MatthewJA commented 7 years ago

acton-predict is also (basically) done (f60d17e) and works like this:

acton-label --data tests/data/classification.txt -l col20 | acton-predict > predictions.pb

MatthewJA commented 7 years ago

acton-recommend in a24a55b80256957afbf25afc148c9268bf348d27.

MatthewJA commented 7 years ago

Just have to tidy up the edges (acton-recommend | acton-label) and add a test of some sort.

MatthewJA commented 7 years ago

Finished the loop in e73678ed81a2297fa8cfa81ba1309632e704f63b. Still have to fix up some code and write a test.