LTR4L / ltr4l

Learning-to-Rank for Apache Lucene
Apache License 2.0

No way to specify idField when extracting features #276

Open kharalast5 opened 6 years ago

kharalast5 commented 6 years ago

There seems to be no way to specify idField when extracting features, because the CMQueries constructor hardcodes it. Since idField is fixed in the source as "id", we can't use a schema whose unique key field is named something other than "id", for example "url" in the case of the livedoor news corpus. I think the idField name could be passed via the command line or the feature config.

Please take a look at the following and check.

  public static class CMQueries {
    public String idField;
    public List<CMQuery> queries;

    public CMQueries(Map<String, Map<String, Float>> clickRates){
      idField = "id"; // ->    idField = "url";

Thanks.

yasufumi0410 commented 6 years ago

Yes, you are right. I'll provide methods that take the id field as a parameter.
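A minimal sketch of what such a parameterized constructor might look like, assuming the existing one-argument constructor is kept for backward compatibility. The class name CMQueriesSketch, the DEFAULT_ID_FIELD constant, and the simplification of each query to a plain String are assumptions made for illustration; this is not the actual LTR4L source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CMQueries; not the real LTR4L class.
class CMQueriesSketch {
  // Assumed default, matching the value hardcoded in the snippet quoted in the issue.
  static final String DEFAULT_ID_FIELD = "id";

  final String idField;
  // Simplification: the real class holds CMQuery objects; here only the query strings are kept.
  final List<String> queries = new ArrayList<>();

  // Keeps the old behaviour: callers that pass no id field still get "id".
  CMQueriesSketch(Map<String, Map<String, Float>> clickRates) {
    this(clickRates, DEFAULT_ID_FIELD);
  }

  // New overload: the caller names the unique key field, e.g. "url".
  CMQueriesSketch(Map<String, Map<String, Float>> clickRates, String idField) {
    // Fall back to the default when the caller passes null or an empty name.
    this.idField = (idField == null || idField.isEmpty()) ? DEFAULT_ID_FIELD : idField;
    this.queries.addAll(clickRates.keySet());
  }
}
```

With this shape, existing callers are unaffected, while a caller working against the livedoor news schema could pass "url" as the second argument.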

kharalast5 commented 6 years ago

I'm sorry to say that adding the parameter to the CMQueries constructor doesn't help me. I'm using your command line interface, FeatureExtract, with your batch shell script.

I think the idField name could be passed via the command line or the feature config.

Thinking about the overall design, I can't find a good reason for the command line tool to send idField to the Solr server, because the Solr server is the one that manages its idField name. Can we get the "uniqueKey" schema definition via Solr's API? Otherwise, I think the idField definition could live in ltr_features.json, as in the following, and be read on the Solr server side.

{
    "idField": "url",
    "features": [
    {
      "name": "TF in title",
      "class": "org.ltr4l.lucene.solr.server.FieldFeatureTFExtractorFactory",
      "params": { "field": "title" }
    },

Thank you.

yasufumi0410 commented 6 years ago

Hmm. For now, I'll prepare methods that take an id field parameter. In the future, I will implement a feature that reads the uniqueKey value from managed-schema and uses it as the id field.
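For reference, Solr's Schema API exposes the unique key at the /schema/uniquekey endpoint of a collection, so the value can be looked up without parsing managed-schema by hand. Below is a rough client-side sketch, assuming Java 11+ and a reachable Solr instance. The class UniqueKeyLookup and its method names are hypothetical and not part of LTR4L, and the regex-based parsing is a dependency-free shortcut, not a full JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class UniqueKeyLookup {
  // Matches the "uniqueKey":"<name>" entry in the Schema API's JSON response.
  private static final Pattern UNIQUE_KEY =
      Pattern.compile("\"uniqueKey\"\\s*:\\s*\"([^\"]+)\"");

  // Builds the Schema API URL, e.g. http://localhost:8983/solr/news/schema/uniquekey?wt=json
  static String schemaUrl(String solrBaseUrl, String collection) {
    String base = solrBaseUrl.endsWith("/")
        ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
        : solrBaseUrl;
    return base + "/" + collection + "/schema/uniquekey?wt=json";
  }

  // Extracts the unique key name from the response body, if present.
  static Optional<String> parseUniqueKey(String json) {
    Matcher m = UNIQUE_KEY.matcher(json);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  // Asks Solr for the unique key; returns the fallback on a non-200 status or a missing entry.
  static String fetchUniqueKey(String solrBaseUrl, String collection, String fallback)
      throws IOException, InterruptedException {
    HttpRequest request =
        HttpRequest.newBuilder(URI.create(schemaUrl(solrBaseUrl, collection))).GET().build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      return fallback;
    }
    return parseUniqueKey(response.body()).orElse(fallback);
  }
}
```

Code that runs inside Solr itself would not need the HTTP round trip, since the schema object is available there and IndexSchema.getUniqueKeyField() returns the field directly.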

yasufumi0410 commented 6 years ago

@kharal5 You can now choose the id field in FeatureExtract by setting the 6th parameter.