LearnLib / learnlib

A free, open-source Java library for automata learning algorithms
https://learnlib.de
Apache License 2.0

Question: the program hangs and produces no learned model #69

Closed wqqqy closed 4 years ago

wqqqy commented 4 years ago

Hello, everyone. I am a newcomer to the field of automata learning. When I run my program, it hangs: I get 29 hypotheses but no learned model. In fact, all 29 hypotheses are identical. Below are the results of my program, which I had to force-stop.

This is the stopped program: [screenshot]

This is the result: [screenshot]

This is the learner log: [screenshot]

Below is my main code. I used the L* learning algorithm and the W-method equivalence algorithm.

        LearnLogger log = LearnLogger.getLogger(MQTTLearner.class.getSimpleName());
        log.info("Using learning algorithm " + learningAlgorithm.getClass().getSimpleName());
        log.info("Using equivalence algorithm " + equivalenceAlgorithm.getClass().getSimpleName());
        log.info("Start Learning");

        SimpleProfiler.start("Total time");

        boolean learning = true;
        Counter round = new Counter("Round", "");
        round.increment();
        log.logPhase("Starting round " + round.getCount());
        SimpleProfiler.start("Learning");
        learningAlgorithm.startLearning();
        SimpleProfiler.stop("Learning");

        MealyMachine<?, String, ?, String> hypothesis = learningAlgorithm.getHypothesisModel();

        while (learning) {
            // save the current hypothesis
            writeDotModel(hypothesis, alphabet, config.output_dir + "/hypothesis_" + round.getCount() + ".dot");

            // search for a counterexample
            SimpleProfiler.start("Searching for counter-example");
            DefaultQuery<String, Word<String>> counterExample = equivalenceAlgorithm.findCounterExample(hypothesis, alphabet);
            SimpleProfiler.stop("Searching for counter-example");

            if (counterExample == null) {
                // no counterexample found: stop learning and write the final model
                learning = false;
                writeDotModel(hypothesis, alphabet, config.output_dir + "/learnedModel.dot");
                //writeAutModel(hypothesis, alphabet, config.output_dir + "/learnedModel.aut");
            } else {
                // counterexample found: refine the hypothesis and continue learning
                log.logCounterexample("Counter-example found: " + counterExample.toString());
                round.increment();
                log.logPhase("Starting round " + round.getCount());

                SimpleProfiler.start("Learning");
                learningAlgorithm.refineHypothesis(counterExample);
                SimpleProfiler.stop("Learning");

                hypothesis = learningAlgorithm.getHypothesisModel();
            }
        }

        SimpleProfiler.stop("Total time");
        // Output statistics
        log.info("-------------------------------------------------------");
        log.info(SimpleProfiler.getResults());
        log.info(round.getSummary());
        log.info(statesMemOracle.getStatisticalData().getSummary());
        log.info(statesCachedMemOracle.getStatisticalData().getSummary());
        log.info(statesEqOracle.getStatisticalData().getSummary());
        log.info(statesCachedEqOracle.getStatisticalData().getSummary());
        log.info("States in final hypothesis: " + hypothesis.size());

I would be very grateful if anybody could offer some suggestions. Thanks!

mtf90 commented 4 years ago

Two things seem off here:

I assume there might be something wrong with your SUL implementation, i.e. queries to your SUL are answered nondeterministically, depending on whether the connection to the MqttClient could be established or not. Most learners compare their current hypothesis output with the given counterexample to decide whether the hypothesis needs refinement. So it can happen that a counterexample found by the equivalence oracle is not recognized by the learner because it is answered differently when posed by the learner.

You could add an if statement around the call to learningAlgorithm.refineHypothesis(counterExample) (which returns a boolean) to see whether the counterexample actually refines the hypothesis, and throw an exception otherwise. This should give you the first instance where things go wrong, and from there you can continue to debug.
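A minimal, self-contained sketch of this guard pattern (the `Learner` interface below is a hypothetical stand-in for a LearnLib learner, whose real `refineHypothesis` also returns a boolean indicating whether the hypothesis changed):

```java
// Sketch: fail fast when a counterexample does not refine the hypothesis,
// instead of silently looping forever on the same model.
public class RefinementGuard {

    // Hypothetical stand-in for a LearnLib learner.
    interface Learner {
        boolean refineHypothesis(String counterExample); // true if hypothesis changed
    }

    static int runRounds(Learner learner, String[] counterExamples) {
        int rounds = 0;
        for (String ce : counterExamples) {
            if (!learner.refineHypothesis(ce)) {
                // This is the first instance where things go wrong: start debugging here.
                throw new IllegalStateException("Counterexample did not refine the hypothesis: " + ce);
            }
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        // Stub learner that accepts every counterexample except "bad".
        Learner stub = ce -> !ce.equals("bad");
        System.out.println(runRounds(stub, new String[] {"a", "b"}));
        try {
            runRounds(stub, new String[] {"a", "bad"});
        } catch (IllegalStateException e) {
            System.out.println("detected non-refining counterexample");
        }
    }
}
```

In the original loop, the equivalent change is wrapping the existing `learningAlgorithm.refineHypothesis(counterExample)` call in such an if statement.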

wqqqy commented 4 years ago

Thanks a lot! I will try this as you suggested.

mtf90 commented 4 years ago

Any updates on this? Could you determine whether this was an issue in your SUL or whether it is actually a bug in LearnLib?

wqqqy commented 4 years ago

Thanks for the suggestions! I have determined that this was an issue in my SUL: the SUL may be delayed in handling an input message and sometimes gives different responses. So I was wondering whether there is any mechanism to handle this situation, such as choosing the most probable response as the output passed to LearnLib. I am still working on it.

mtf90 commented 4 years ago

LearnLib doesn't place any constraints on how fast queries have to be answered. So if you're experiencing synchronization errors in your SUL, you could add something like a timeout (e.g. wait 5/10/15 seconds for a response and return "timeout" otherwise).
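A minimal sketch of such a timeout wrapper, using a standard `ExecutorService`; the `slowSystem` callable is a hypothetical stand-in for the real SUL/MqttClient call:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: answer each SUL query with a bounded wait, mapping a stuck or
// slow system to the deterministic output "timeout".
public class TimeoutStep {

    static final ExecutorService POOL = Executors.newSingleThreadExecutor();

    static String step(Callable<String> slowSystem, long timeoutMillis) {
        Future<String> response = POOL.submit(slowSystem);
        try {
            return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            response.cancel(true);  // give up on the stuck call
            return "timeout";       // deterministic fallback output
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A fast call is answered normally; a slow call is mapped to "timeout".
        System.out.println(step(() -> "CONNACK", 500));
        System.out.println(step(() -> { Thread.sleep(2000); return "late"; }, 500));
        POOL.shutdownNow();
    }
}
```

Making the fallback output a fixed symbol keeps the SUL's behavior deterministic from the learner's point of view, which is what matters for the refinement check above.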

Based on your explanation, however, this sounds more like a user-land issue. I'd suggest closing this ticket; you may re-open it if it does turn out to be a bug in LearnLib. Is that ok with you?

Also, consider our Q&A mailing list if you have questions regarding the usage of LearnLib (or AutomataLib). This way, we can use the mailing list for discussing best practices and keep the GitHub tracker focused on code-related issues.

wqqqy commented 4 years ago

I have tried adding a timeout and it works well with one of my SULs, but there is still uncertainty in the responses of my other SULs. I hope I can solve it. Thanks. It is a user-land issue, and it is ok to close this ticket.