germanattanasio / professor-languo

Professor Languo is a sample app that uses the Retrieve and Rank service to answer questions about the English language. It showcases a question-and-answer app built on the Retrieve and Rank Bluemix service.
https://professor-languo.mybluemix.net/

Ranker Issue #6

Closed poonamsaini17 closed 8 years ago

poonamsaini17 commented 8 years ago

I am trying to create a ranker, and I am getting this error:

main] c.i.w.d.service.WatsonService : https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers, status: 400, error: Entitlement error

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 189.698 sec <<< FAILURE! - in

com.ibm.watson.developer_cloud.service.BadRequestException: Entitlement error
at com.ibm.watson.developer_cloud.service.WatsonService.execute(WatsonService.java:141)

dgterry commented 8 years ago

@poonamsaini17 Hello, I spoke to a colleague about this, and the most likely cause is that the number of allowed rankers has been exceeded. For example, the free version of the service only allows one ranker at a time.

Could you try to list the rankers you do have and provide us with that information? You can find out how to do that here: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/retrieve-and-rank/api/v1/?curl#get_rankers

If you have an existing ranker you don't need you could delete it and then try to create the failing one again. Here is how you can delete rankers: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/retrieve-and-rank/api/v1/?curl#delete_ranker
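For anyone working from Java rather than curl, the two calls above can be sketched with the JDK's own HTTP client. This is a minimal sketch, not the java-sdk's API: the class and method names are hypothetical, it assumes Java 11 or later (java.net.http) and username/password credentials sent as HTTP Basic auth, and it only builds the requests without sending them. The base URL is the one from the error output at the top of this thread; the /rankers and /rankers/{ranker_id} paths follow the ranker URLs that the service returns in its responses.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds (but does not send) requests for listing and deleting rankers. Hypothetical helper. */
final class RankerRequests {
    // Base URL as it appears in the error output quoted in this thread.
    static final String BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1";

    /** HTTP Basic auth header value; assumes username/password service credentials. */
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** GET /rankers: lists the rankers that exist for these credentials. */
    static HttpRequest listRankers(String user, String password) {
        return HttpRequest.newBuilder(URI.create(BASE + "/rankers"))
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    /** DELETE /rankers/{ranker_id}: removes one ranker to free up the quota. */
    static HttpRequest deleteRanker(String user, String password, String rankerId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/rankers/" + rankerId))
                .header("Authorization", basicAuth(user, password))
                .DELETE()
                .build();
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the ranker ids from the response body.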

poonamsaini17 commented 8 years ago

I know when this error occurs. I have seen it with classifiers and clusters, and I know it usually occurs when a limit is exceeded. Here, I have not even created a ranker for the credentials provided. The error is raised before execution even reaches the method that creates the ranker. Also, I am not using curl; I am using Java code.

Can you suggest anything for that?

stevenoh93 commented 8 years ago

@poonamsaini17 Could you be more specific on where the error is occurring? Which class and line number perhaps? Was this during runtime or during building? And also, could you provide us the entire stack trace?

Thanks, Steven

poonamsaini17 commented 8 years ago

@dgterry Thanks! It was my mistake. I tried listing all the rankers and found that rankers had been created. As you mentioned, the ranker limit had been exceeded, hence the error.

@stevenoh93 I resolved the issue. Thanks to all!

poonamsaini17 commented 8 years ago

I am now getting a ranker status of Failed:

{"ranker_id":"3b140ax15-rank-1936","name":"ranker","created":"2016-05-18T09:21:37.794Z","url":"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax15-rank-1936","status":"Failed","status_description":"Error encountered during training: Training data quality standards not met: invalid header (duplicate feature names). Row 1 of input data

training_data_copy.xlsx

stevenoh93 commented 8 years ago

Hmm, your training data doesn't look right. Did you get this from running PipelineDriver.java?

poonamsaini17 commented 8 years ago

No, I am not touching the code of the Professor Languo app. I am creating my own app in Java, and I created this training data file manually. When I worked on it today, I was able to resolve that issue (the id column doesn't accept the characters I had used, so I passed an integer value instead). Then I tried creating the ranker again, and the status is Failed again with the following message:

{"ranker_id":"3b140ax14-rank-2056","name":"ranker","created":"2016-05-19T09:33:57.747Z","url":"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax14-rank-2056","status":"Failed","status_description":"Error encountered during training: Training Data Quality Standards Not Met: detected 75 out of total 75 queries as having no label variety. This exceeds the maximum allowed of 57 (75%)."}

In this case I am confused: should I pass multiple ids for a single question, each with a different relevancy (label)?

Label is relevancy, right?

stevenoh93 commented 8 years ago

The formatting of the training_data.csv file has to be precise in order for the ranker to be trained successfully. Looking at the sample attached, there are more errors than just the labels.

You'll want to check this link if you want to prepare your own training data file, but I have to say it can be a tedious process. We have a Python script (train.py) that pretty much does what PipelineDriver.java does, but it's packaged as a script so you can easily run it.

More details on how to run the script are in the documentation link I attached above.

poonamsaini17 commented 8 years ago

Thanks @stevenoh93! I have written a new training_data.csv file, and I will share it by tomorrow. Yes, it's a tedious task. I have tried to follow the rules mentioned in the link.

poonamsaini17 commented 8 years ago

error 1: {"ranker_id":"3b140ax14-rank-2100","name":"chatbot_ranker","created":"2016-05-20T05:44:54.405Z","url":"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax14-rank-2100","status":"Failed","status_description":"Error encountered during training: Training data quality standards not met: invalid header (duplicate feature names). Row 1 of input data."}

error 2: {"ranker_id":"3b140ax14-rank-2063","name":"chatbot_ranker","created":"2016-05-19T12:51:49.933Z","url":"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax14-rank-2063","status":"Failed","status_description":"Error encountered during training: Training Data Quality Standards Not Met: detected 75 out of total 75 queries as having no label variety. This exceeds the maximum allowed of 56 (75%)."}

Can I get an explanation of these two errors?

stevenoh93 commented 8 years ago

These errors look the same as the one you got previously. The first error means that you do not have a valid header. The training data csv file needs the first row to be the header:

id, feature0, feature1, ... , label
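A quick local check for the first error can be sketched as follows. It is only an illustration with hypothetical names, not code from this app or the java-sdk, and it assumes a plain comma-separated header with no quoted cells. It also shows one plausible way to trigger the error (an assumption, not something the service documents here): if the file has no header row at all, its first data row would presumably be read as the header, and repeated values in that row would look like duplicate feature names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-upload check for duplicate column names in the first row. */
final class HeaderCheck {
    /** Returns the column names that appear more than once, in first-seen order. */
    static List<String> duplicateNames(String headerLine) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> dupes = new LinkedHashSet<>();
        // Simple split: assumes no quoted cells containing commas.
        for (String cell : headerLine.split(",", -1)) {
            String name = cell.trim();
            if (!seen.add(name)) {
                dupes.add(name); // already seen once, so it is a duplicate
            }
        }
        return new ArrayList<>(dupes);
    }
}
```

With a header shaped like the one above, duplicateNames("id,feature0,feature1,label") returns an empty list, while a headerless first row such as "q1,184,3,29,3" reports 3 as a duplicate.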

The second one means that you are not meeting this requirement specified in the doc:

At least two different relevance labels must exist in the data and those labels must be well represented. A label is well represented if it occurs at least once for every 100 unique questions.

I'm not sure what your training data is, but in the case of StackExchange data, we consider two labels: 0 for the candidate answers that were not chosen by the user asking the question, and 1 for the one that was chosen. If every question in your training data has only a single label value, you'll get this error.
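The label-variety rule can be checked locally before uploading. The sketch below uses hypothetical names and assumes each training row has already been reduced to a {query id, label} pair. The 75% limit is taken from the error message quoted earlier in this thread; how the service rounds it is not known here, so the comparison is made on the raw fraction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-upload check for queries whose candidate rows all share one label. */
final class LabelVariety {
    /** Each row is {queryId, label}. Counts queries that carry only one distinct label. */
    static int queriesWithoutVariety(List<String[]> rows) {
        Map<String, Set<String>> labelsByQuery = new HashMap<>();
        for (String[] row : rows) {
            labelsByQuery.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        int count = 0;
        for (Set<String> labels : labelsByQuery.values()) {
            if (labels.size() < 2) {
                count++;
            }
        }
        return count;
    }

    /** True when the share of queries without variety is above maxFraction (e.g. 0.75). */
    static boolean exceedsLimit(int withoutVariety, int totalQueries, double maxFraction) {
        return withoutVariety > totalQueries * maxFraction;
    }
}
```

For the failing file in this thread, all 75 queries had a single label, so exceedsLimit(75, 75, 0.75) is true. The fix is to give each question several candidate rows whose labels differ, for example 1 for the chosen answer and 0 for the others.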

There are a lot of requirements for the training data in the docs. That's why we recommend using our services, since the requirements are hard to meet if done by hand.

poonamsaini17 commented 8 years ago

Can anyone please explain what this error is and when it appears? This issue has been a pain. I am not sure when Retrieve and Rank will work properly.

"status":"Failed","status_description":"Error encountered during training: Training data quality standards not met: invalid header (duplicate feature names). Row 1 of input data."

This time I took the data from here rather than creating my own: https://github.com/cfsworkload/watson-conversation/tree/master/r-and-r

dgterry commented 8 years ago

@poonamsaini17 I see you filed an issue against the repo you mentioned, I think that is the right thing to do. I also asked for some assistance from the retrieve and rank team and they suggested that you head to https://developer.ibm.com/answers/topics/retrieve-and-rank/ where you should be able to search for existing solutions as well as ask a new question if necessary. The retrieve and rank developers monitor that forum and can hopefully help you with your ranker trouble.

poonamsaini17 commented 8 years ago

@dgterry Thanks for the reply! Hopefully I will get a solution there.

pranireddy9 commented 8 years ago

@poonamsaini17: can you please share your code showing exactly where and how you are creating the ranker? I am stuck at the point where Java cannot create a successful ranker for me, although it works with Python, and I am using the cranfield data. I appreciate your response.

poonamsaini17 commented 8 years ago

@pranireddy9 You need to port the Python code (the part that adds the header) to Java and then call the createRanker() method.

The code in the java-sdk doesn't do the work that train.py does, so you have to write it yourself and call it.

pranireddy9 commented 8 years ago

Thank you for your quick response. Do you have any idea what I have to add to the cranfield csv file to get a successful ranker status? Can you please tell me what to add in the header section of the cranfield csv file? I do not understand the Python code.

stevenoh93 commented 8 years ago

@pranireddy9 This sample app was designed to take StackExchange data as input, so it won't work with the cranfield data. To successfully create a ranker using the cranfield data, you'll have to follow the steps in the tutorial using curl commands.

If you want to compile a Java application that can create a ranker using the cranfield data, you can follow the API reference and our code to create your own application. The python code that @poonamsaini17 mentioned is already converted to Java in our app here, and it will generate headers for you.

areddy7021 commented 8 years ago

@stevenoh93: I am writing on behalf of @pranireddy9. So she has to use the util class you mentioned to convert the cranfield csv to a csv with the header added, and then she should be able to create the ranker using Java? You said to follow your code to create our own application, so do I need to use that code or the util class you mentioned?

stevenoh93 commented 8 years ago

@areddy7021 of course you don't have to use our code, but we just didn't want you to do duplicate work. The python script will create the trainingdata.csv file with headers added, and you can train the ranker using curl commands just like the tutorial. I was saying if you wanted to do this in Java, you'd have to make your own application, but some of the work is already done by us and is included in this app, including converting a ground truth csv file to the training data file.

poonamsaini17 commented 8 years ago

@pranireddy9 Yes, that is the same header code I was talking about. Thanks @stevenoh93.

I hope you consider adding it to the java-sdk as well.

areddy7021 commented 8 years ago

Thanks @stevenoh93 and @poonamsaini17, this really helps. This additional step should probably be specified in the tutorial as well, to avoid confusion.

poonamsaini17 commented 8 years ago

@areddy7021 I just checked, and they have updated the java-sdk. Follow that; it's much simpler than this app. @stevenoh93 My bad. I checked a few minutes ago, but the java-sdk was updated 5 hours ago :P

pranireddy9 commented 8 years ago

Thank you so much @poonamsaini17 and @stevenoh93

pranireddy9 commented 8 years ago

I just checked, and they haven't included any ranker creation in the example. If you have one, can you please share it? Or do I need to update the SDK version in Maven?

areddy7021 commented 8 years ago

@poonamsaini17 Do we need to set the java-sdk version to 3.0.1 to get that util function? We are stuck at the point where we cannot create a successful ranker with Java, although I can create one successfully with Python and curl. I am only worried about Java now. What confuses me is that I have looked at the util class @stevenoh93 mentioned, and there is no method in it that takes the csv file as input and converts it to the training file. Please point me to a straightforward solution.

areddy7021 commented 8 years ago

If that's part of the java-sdk, then it's better for us to have a service call that invokes the createRanker method.

pranireddy9 commented 8 years ago

@poonamsaini17 @stevenoh93 I am having a very hard time creating a ranker using Java, even with the new SDK. Here is the output: {"ranker_id":"3b140ax14-rank-3047","name":"ranker_new_1","created":"2016-06-09T01:52:32.472Z","url":"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax14-rank-3047","status":"Failed","status_description":"Error encountered during training: Training data quality standards not met: invalid header (duplicate feature names). Row 1 of input data."}

I have made many attempts, and I am using the cranfield_gt.csv file. I am tired of this now, so please help me out. Here is my small code snippet:

URL url = CerebriRetrieveAndRank.class.getClassLoader().getResource(rankercsv);
File rankerCsv = null;
ServiceCall<Ranker> ranker = null;
try {
    rankerCsv = new File(url.toURI());
} catch (URISyntaxException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

ranker = service.createRanker("ranker_new_1", rankerCsv);

areddy7021 commented 8 years ago

@pranireddy9 I am very sorry to hear that. I am facing the same issue. I looked into the util class code that @stevenoh93 mentioned, and I imported the project to see how that class's util methods are used, but nowhere does the project use a csv file to create a ranker. I was surprised, and I also looked into other options for creating a ranker with Java using the new java-sdk, still with no luck. @stevenoh93, we need your help to solve this puzzle; thank you very much. I am just expecting a straightforward way to create the ranker using Java. @poonamsaini17, can you give a sample snippet that successfully creates a ranker with the new java-sdk on your end?

I am not sure whether it has ever worked with Java. I don't want to get into Python or anything else when I am using Java; I need ranker creation to work with Java too, as IBM mentions in the documentation, and as a developer I was expecting something simple from IBM that lets me use their service.

poonamsaini17 commented 8 years ago

@pranireddy9 and @areddy7021 : https://github.com/watson-developer-cloud/java-sdk/tree/master/src/main/java/com/ibm/watson/developer_cloud/retrieve_and_rank/v1

I was talking about this.

1) When you call ranker = service.createRanker("ranker_new_1", rankerCsv);

Here rankerCsv should not be the data file you prepared yourself. It should be the file generated when you pass your csv to train.py, which produces the training data for the ranker; that is what you pass to createRanker(). If you keep passing your own csv file to this method, you will never get rid of this issue, because it is not in the correct format.

@stevenoh93 @dgterry: Never mind. I guess the documentation needs an update. I have seen many questions about this; people are getting confused and running into this issue.

areddy7021 commented 8 years ago

@poonamsaini17 So I have to pass my csv file to the Python script, get the training file, and then create a ranker. But how do I call the Python script from Java? Why can't the same logic that the Python script implements be embedded in Java? Is there an example we can look at?

poonamsaini17 commented 8 years ago

@areddy7021 The same has been done there. Please refer to this link: https://github.com/watson-developer-cloud/java-sdk/tree/master/src/main/java/com/ibm/watson/developer_cloud/retrieve_and_rank/v1

areddy7021 commented 8 years ago

I have seen that, but I don't see any method there that takes a file as a parameter and contains the parsing logic or adds the header.

poonamsaini17 commented 8 years ago

Check the complete package.

areddy7021 commented 8 years ago

I see the zip utils, and there is a method to create a ranker, but it doesn't do anything with the csv file. I am not able to find the csv parsing logic that converts it to a training file.

Does java-sdk 3.x not cover this?

stevenoh93 commented 8 years ago

@areddy7021 To my understanding, you're looking for a consolidated solution in Java that can take the ground truth csv file as input and ultimately create and train the ranker? Then you are correct in that the java-sdk repository does not offer what you are looking for. We currently have no plans on supporting such functionality in the java-sdk, given the python script does the job.

We tried to do something similar in this app by using our CandidateAnswer model, but this code involves our QuestionAnswerer pipeline and is low-level csv parsing. We want our users to steer away from such mess.

Could you perhaps explain why you wouldn't want to use the Python script to generate the training data?

areddy7021 commented 8 years ago

@stevenoh93 Thank you. My use case is this: we are developing an application in Java and integrating it with a content management solution on the front end, with Watson on the back end providing Retrieve and Rank, Personality Insights, and so on. I started following the IBM documentation to create the Solr cluster and upload the config file, and everything seemed fine until I reached the csv file step. I couldn't create a successful ranker because of some parsing problem or a bad csv file. But we do not want to switch to Python just because this one ranker-creation step fails.

We do not use Python anywhere else in our organization, and introducing it for this one step would not make sense. It is painful for a developer to have to run a Python script that makes curl requests just for the single step of creating a ranker.

But what if I want to invoke the same Python script from Java? Can I do that?

We are a Java shop, and we do not want to move from Java to Python for a single step. Everything else works fine, and it would be great if we could create the ranker with Java as well.

All of this becomes complex if the ranker-creation step is missing in Java. I can't tell my peers that IBM won't support ranker creation in Java and that we have to use the Python script to get it done.

stevenoh93 commented 8 years ago

@areddy7021 Could you elaborate on the failing part? Are you sure you are using the training data and not the ground truth (relevance file)? If you believe that your data will change frequently over time and feel the need to update the ranker automatically, then you'll have to create your own Java version of the Python script (which isn't too complicated; in the end, it is just compiling HTTP responses into a csv). However, the need to update the ranker frequently is rare, so I suggest running that one line of code on the console to execute the Python script and don't look back :)
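If the goal is only to keep everything triggered from Java, the script can be launched as a child process. This is a minimal sketch using the JDK's ProcessBuilder with hypothetical class and method names; it assumes a Python interpreter is on the PATH and that train.py is available locally. The script's arguments are deliberately left to the caller: use whatever the tutorial's train.py command specifies.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that launches an external script from Java. */
final class ScriptRunner {
    /** Assembles the command line: interpreter, script path, then the script's own arguments. */
    static List<String> buildCommand(String interpreter, String script, List<String> scriptArgs) {
        List<String> command = new ArrayList<>();
        command.add(interpreter);
        command.add(script);
        command.addAll(scriptArgs);
        return command;
    }

    /** Runs the command, forwarding its output to this process's console, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show the script's stdout/stderr in the Java console
                .start();
        return process.waitFor();
    }
}
```

run returns the script's exit code, so a non-zero value can be handled as a failure in the calling Java code.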

areddy7021 commented 8 years ago

The failing part is as I said: I am using the cranfield data across my application as a POC. I agree with you about always running that one line through Python, but why does this one line have to be executed somewhere other than in my Java program? I am creating the cluster through Java, uploading the config through Java, creating the collection through Java, getting the ranker status and much more through Java; everything except ranker creation. Here is the output when I create the ranker through Java using SDK 3.x:

{"ranker_id":"3b140ax14-rank-3047","name":"ranker_new_1","created":"2016-06-09T01:52:32.472Z","url":"https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax14-rank-3047","status":"Failed","status_description":"Error encountered during training: Training data quality standards not met: invalid header (duplicate feature names). Row 1 of input data."}

I am OK with your approach, but doesn't it look weird when I have to tell other people, "Please remember that ranker creation should happen only through Python; please do not use Java for it," and have to state in our documentation that ranker creation is not supported in Java?

I am using the cranfield_gt.csv file, the same file I used in the Python curl command to create the ranker, and that was successful.

stevenoh93 commented 8 years ago

I believe this failure is caused by using the wrong csv file. You must use the output csv of the Python script to train the ranker. We do not have plans to support creating training data through Java, but you can certainly "port" over the Python script to Java if you want.
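A cheap guard against passing the wrong file is to check its shape before calling createRanker. The sketch below is a heuristic with hypothetical names, not a full validation: it assumes a training data file has the same number of comma-separated cells on every line, header included, whereas the ground truth file is reported later in this thread to have a varying number of columns per record. It does not handle quoted cells that contain commas.

```java
import java.util.List;

/** Hypothetical shape check to run on a csv before uploading it as training data. */
final class TrainingFilePreflight {
    /** True when every non-empty line has as many comma-separated cells as the first one. */
    static boolean isRectangular(List<String> lines) {
        int expected = -1;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            int cells = line.split(",", -1).length;
            if (expected < 0) {
                expected = cells; // first non-empty line sets the column count
            } else if (cells != expected) {
                return false; // ragged row: probably not training data
            }
        }
        return expected > 0; // false for an empty file
    }
}
```

A file that fails this check is probably not the script's output and should not be uploaded as training data.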

areddy7021 commented 8 years ago

OK, so I will have to do the ranker step using Python, as I did last time.

stevenoh93 commented 8 years ago

@areddy7021 One alternative to what you are doing is to seek further help on the Java path at https://developer.ibm.com/answers/topics/retrieve-and-rank/ where someone might have suggestions for you. At the very least, you could describe your problem and ask for an enhancement to the Java SDK if no solution exists today.

pranireddy9 commented 8 years ago

@stevenoh93 We already posted the issue yesterday. We didn't get any reply.. https://developer.ibm.com/answers/questions/278017/ranker-status-failed-using-java/

poonamsaini17 commented 8 years ago

@areddy7021 Yes, if you do not want to use Python, then you have to write that piece of code in Java, and it will work.

poonamsaini17 commented 8 years ago

@pranireddy9 Have you used train.py?

areddy7021 commented 8 years ago

@poonamsaini17 I am just trying to invoke the Python script from Java, since I am not good at Python. I don't understand the part that actually parses the csv file and builds the training file. If you know of, or have come across, anything relevant, please share it here; that would help.

pranireddy9 commented 8 years ago

No, I have not used train.py from my Java code. You said that if we do not want to use Python, we need to write that piece of code in Java. Did you write that parsing code yourself? If so, please share it.

areddy7021 commented 8 years ago

@stevenoh93 and @poonamsaini17: can either of you share the code, if you have it, that gets invoked through Java?

areddy7021 commented 8 years ago

@stevenoh93 Where is this PipelineDriver.java class? Please point me to it.

pranireddy9 commented 8 years ago

@stevenoh93 I have started writing the Java version, and I am running into trouble with the cranfield data. The cranfield csv contains no header and has a varying number of columns in each record. I am using the Apache csv library to parse the file and get each column's data, so that I can construct the curl command for every csv record and then issue that command to Watson to get the results.

I am stuck because I cannot parse the cranfield ground truth csv file due to its structure. I could change the file structure, but it may change again tomorrow, and our Java version should support all such changes. Python currently handles this csv file very well, and I want to do the same in Java: it should accept the same csv file and produce the training data.

Please share some snippets or any code you have written in the past to generate the training data using a Java class.
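A starting point for the parsing step might look like the sketch below. It is not the app's code or a port of train.py: the names are hypothetical, and it assumes each ground truth record is a question followed by any number of document id, relevance label pairs, which would explain the varying column count. It hand-rolls a small CSV splitter to stay dependency-free; a CSV library would do the same job. Turning each parsed record into training rows still requires the HTTP requests that the Python script makes, which are not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One ground truth record: a question and the relevance label of each listed document. */
final class GroundTruthLine {
    final String question;
    final Map<String, Integer> relevanceByDocId;

    GroundTruthLine(String question, Map<String, Integer> relevanceByDocId) {
        this.question = question;
        this.relevanceByDocId = relevanceByDocId;
    }

    /** Splits one CSV record, honoring double-quoted fields and "" escapes. */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Assumed layout: question, docId, relevance, docId, relevance, ... */
    static GroundTruthLine parse(String line) {
        List<String> fields = splitCsv(line);
        if (fields.size() < 3 || fields.size() % 2 == 0) {
            throw new IllegalArgumentException("Expected a question plus docId,relevance pairs: " + line);
        }
        Map<String, Integer> relevance = new LinkedHashMap<>();
        for (int i = 1; i < fields.size(); i += 2) {
            relevance.put(fields.get(i).trim(), Integer.parseInt(fields.get(i + 1).trim()));
        }
        return new GroundTruthLine(fields.get(0), relevance);
    }
}
```

Because parse works one record at a time and accepts any odd number of fields from three upward, records with different numbers of document/label pairs can sit in the same file.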