UnitexGramLab / unitex-core

Unitex/GramLab C++ Core
https://unitexgramlab.org
Other
22 stars 24 forks source link

Fst2List limit on number of paths does not prevent complete exploration #2

Closed martinec closed 8 years ago

martinec commented 8 years ago

Submitted by @eric-laporte

Fst2List lists the paths of a graph. When using it, the user can set a limit on the number of paths to be listed. But the computation takes the same time whatever the limit. The limit does not seem to prevent a complete exploration of the paths.

What steps will reproduce the problem?

In the experiment recorded in the attached log (Unitex3.1beta, revision 23 November 2015), I set the limit to 10 paths and apply Fst2List to a graph. The output is displayed after 6 s. Then, I set the limit to 100 000 paths and apply Fst2List to the same graph. The 100 000 paths are displayed after 6 s too.

What is the expected output?

In the case of a grammar with many paths or infinitely many paths, the limit on the number of paths could be a way of obtaining an incomplete list of paths without waiting for a hypothetical output. Would it be possible to stop the path exploration once the number of paths reaches the limit?

More info

hghwng commented 8 years ago

After tracing the execution very carefully, I found the cause of slow exploration.

For most modes, getWordsFromGraph() calls exploirerSubAuto() twice. The first call explores all possible states to find loops. The second call actually writes to the file.

The first call is the cause of slow exploration. How can we solve it?

Pull request #7 disables the loop check. Merge with caution.

hghwng commented 8 years ago

At present I don't find it crucial to users to find loops — it's somehow a helper for debugging. Easy debugging with extremely slow execution, or less debugging info with much faster execution, I guess the latter is what we need the most.

What't your opinion?

eric-laporte commented 8 years ago

Yes, less debugging info with much faster execution is what we need. Thanks,

Eric

Le 20/04/2016 07:36, Hugh Wang a écrit :

At present I don't find it crucial to users to find loops — it's somehow a helper for debugging. Easy debugging with extremely slow execution, or less debugging info with much faster execution, I guess the latter is what we need the most.

What't your opinion?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/UnitexGramLab/unitex-core/issues/2#issuecomment-212267017

ÉRIC LAPORTE Professeur LIGM

T : +33 1 60 95 75 52 BÂTIMENT COPERNIC - BUREAU 4B090 eric.laporte@univ-paris-est.fr http://igm.univ-mlv.fr/~laporte/index_en.html http://igm.univ-mlv.fr/%7Elaporte/index_en.html

UPEM - Université Paris-Est Marne-la-Vallée CITÉ DESCARTES - 5 BOULEVARD DESCARTES - CHAMPS-SUR-MARNE 77454 MARNE-LA-VALLÉE CEDEX 2 - WWW.U-PEM.FR http://www.u-pem.fr


L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast. https://www.avast.com/antivirus