MatthewJA closed this issue 7 years ago.
Another option would be something like
acton-predict \
    --data data.dat \
    --feature feature_column \
    --label label_column \
    --training-set indices.txt \
    --predictor LogisticRegression \
    --output predictions.pb
so you could drop your data right in without having Acton pre-process it first. It's a bit messier and the output is still (necessarily) a protobuf.
I'll use this option for now, I think.
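To make the flag set concrete, here's a minimal argparse sketch of that acton-predict interface. The flag names come from the proposal above; the default predictor and the help strings are my assumptions, not settled behaviour.

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the proposed acton-predict flags."""
    parser = argparse.ArgumentParser(prog='acton-predict')
    parser.add_argument('--data', help='path to the raw data file')
    parser.add_argument('--feature', help='name of the feature column')
    parser.add_argument('--label', help='name of the label column')
    parser.add_argument('--training-set',
                        help='file of instance indices to train on')
    parser.add_argument('--predictor', default='LogisticRegression',
                        help='name of the predictor class')
    parser.add_argument('--output',
                        help='path for the output Predictions protobuf')
    return parser
```

Note that argparse maps --training-set to args.training_set automatically.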
The recommender is a lot simpler.
acton-recommend \
    --predictions predictions.pb \
    --recommendation-set indices.txt \
    --recommender UncertaintyRecommender \
    --recommendation-count 1 \
    --diversity 0.1
I'd be very happy for this to output to stdout.
Another option for acton-predict is that it can take either the arguments directly or the protobuf, and then construct a protobuf from the arguments if need be.
@chengsoonong: Note that if you want to run the CLIs I'm adding now, you'll need to reinstall Acton since the new CLIs are added on install.
pip3 install -e .
will work for that.
My advice is to begin at the desired end product.
The way to see whether these CLIs are elegant or not is to write the bash script that reproduces simulate_active_learning. Something like this would work and looks alright:
for (( ; ; )); do
acton-predict --labels labels.pb -o predictions.pb
acton-recommend --predictions predictions.pb | acton-label -o labels.pb
done
Though there are no entry points for accessory information (the database, etc.) here. We could also try:
for (( ; ; )); do
acton-predict --labels labels.pb -o predictions.pb
acton-recommend --predictions predictions.pb -o recommendations.pb
acton-label --recommendations recommendations.pb -o labels.pb
done
and carry accessory information around in protobufs. This has the advantage of being really consistent-looking. We could even pipe protobufs directly between processes if no output is specified:
for (( ; ; )); do
acton-predict < labels.pb | acton-recommend | acton-label > labels.pb
done
That's kinda nice in a way. You could then also smoothly output things like recommendations to stdout, e.g.
acton-predict < labels.pb | acton-recommend
would output a list of recommendations based on the predictions (which seems useful — maybe we then go and make some measurements in the field).
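The piping behaviour is straightforward to support in each CLI: when no input or output path is given, read the serialised message from binary stdin and write the result to binary stdout. A minimal sketch (the identity transform stands in for the real predict/recommend/label step, and the function name is mine):

```python
import sys

def run(transform, in_stream=None, out_stream=None):
    """Read one serialised message, transform it, write the result.

    Defaults to stdin/stdout in binary mode so raw protobuf bytes
    pass through untouched (text mode would mangle them).
    """
    in_stream = in_stream or sys.stdin.buffer
    out_stream = out_stream or sys.stdout.buffer
    blob = in_stream.read()          # whole serialised message at once
    out_stream.write(transform(blob))
    out_stream.flush()
```

Passing explicit streams keeps the same code path working for --output files and for pipes.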
(The more I think about this, the more I think auxiliary information should be stored in protobufs — no sense in specifying the database every time. I'm going to (re)write some protobufs with this in mind.)
Here's what using acton-label looks like:
$ echo "1
> 2
> 3
> 4
> 5
> 6
> " | acton-label --data tests/data/classification.txt -l col20
WARNING:root:Not implemented: labeller_accuracy
{"db":{"className":"ASCIIReader","kwarg":[{"value":"\"\"","key":"feature_cols"},{"value":"\"col20\"","key":"label_col"}],"path":"tests/data/classification.txt"},"id":["1","2","3","4","5","6"]}
It currently outputs human-readable JSON, but I'm changing that to binary data.
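That output already demonstrates the "auxiliary information in protobufs" idea: the database spec rides along with the ids. A quick round-trip of the structure shown above, using Python's json module as a stand-in for the eventual binary protobuf serialisation:

```python
import json

# The acton-label output shown above, as a Python literal.
labels = {
    "db": {
        "className": "ASCIIReader",
        "kwarg": [
            {"value": '""', "key": "feature_cols"},
            {"value": '"col20"', "key": "label_col"},
        ],
        "path": "tests/data/classification.txt",
    },
    "id": ["1", "2", "3", "4", "5", "6"],
}

# Serialise and deserialise; a binary protobuf would replace
# json.dumps/loads here, but the embedded-database idea is the same.
blob = json.dumps(labels).encode()
assert json.loads(blob) == labels
```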
acton-predict is also (basically) done (f60d17e) and works like this:
acton-label --data tests/data/classification.txt -l col20 | acton-predict > predictions.pb
acton-recommend is done in a24a55b80256957afbf25afc148c9268bf348d27. Just have to tidy up the edges (acton-recommend | acton-label) and add a test of some sort.
Finished the loop in e73678ed81a2297fa8cfa81ba1309632e704f63b. Still have to fix up some code and write a test.
I'll plan out what the interface should look like, starting with predictors since we already have protobufs for that.
Input will be a serialised Labels protobuf to train on. Predictions will be made for all instances in the associated database.
Output will be a serialised Predictions protobuf.
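As a contract, that's a pure function from one serialised message to another. A hypothetical sketch, with JSON standing in for the protobuf wire format and the field names made up for illustration:

```python
import json

def predict(labels_blob: bytes) -> bytes:
    """Serialised Labels in, serialised Predictions out."""
    labels = json.loads(labels_blob)
    # A real implementation would load the database named in labels["db"],
    # fit the chosen predictor on the labelled ids, and then score every
    # instance in that database, not just the labelled ones.
    predictions = {"db": labels["db"],
                   "prediction": [{"id": i} for i in labels["id"]]}
    return json.dumps(predictions).encode()
```

Keeping the db spec in the output message is what lets the next stage in the pipe find the data without a --data flag.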
Incidentally, this raises a question. Do we want to expose protobufs in this way? Or do we want to specify the whole data file here, just like the current acton interface?