TALP-UPC / FreeLing

FreeLing project source code

locutions or locucions #70

Closed joancf closed 6 years ago

joancf commented 6 years ago

Some languages (like Catalan, Spanish or English) have a "locucions.dat" file, and the examples use that name. German (and possibly other languages), however, has a "locutions" file instead, which makes it difficult to write a general solution that takes the language as a parameter.

lluisp commented 6 years ago

The name of the file is irrelevant. You can use any file name you want, as long as you use the same name in the config file (e.g. de.cfg) or when creating a class instance (if you are accessing FreeLing via the API).

joancf commented 6 years ago

Yes, I can set up the file names manually, but the problem is that I can't use a general call to instantiate the maco for any language, like

```java
MacoOptions op = new MacoOptions(lang);

op.setDataFiles("", DATA + "common/punct.dat", DATA + lang + "/dicc.src",
        DATA + lang + "/afixos.dat", "", DATA + lang + "/locucions.dat",
        Files.exists(Paths.get(DATA + lang + "/np.dat")) ? DATA + lang + "/np.dat" : "",
        Files.exists(Paths.get(DATA + lang + "/quantities.dat")) ? DATA + lang + "/quantities.dat" : "",
        DATA + lang + "/probabilitats.dat");

Maco mf = new Maco(op);
```

because the names of the files depend on the language.
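One way to keep a single generic call despite the differing file names is to probe a list of candidate names per language and use the first one that exists. The sketch below is only an illustration: the class `DataFileResolver`, the method `firstExisting`, and the default directory in `main` are hypothetical names, not part of FreeLing, and the exact German file name (`locutions.dat`) is assumed from the description in this thread. It returns an empty string when nothing matches, mirroring how the snippet above passes `""` for optional files.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DataFileResolver {

    /**
     * Returns the path of the first candidate that exists as a regular file
     * under langDir, or "" when none of the candidates is present.
     */
    static String firstExisting(Path langDir, String... candidates) {
        for (String name : candidates) {
            Path p = langDir.resolve(name);
            if (Files.isRegularFile(p)) {
                return p.toString();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Placeholder directory: pass the real per-language data directory as args[0].
        Path langDir = Paths.get(args.length > 0 ? args[0] : "/usr/local/share/freeling/de");
        System.out.println(firstExisting(langDir, "locucions.dat", "locutions.dat"));
    }
}
```

The returned string could then be passed in place of the hard-coded `DATA + lang + "/locucions.dat"` argument.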

It would then be great to be able to use the cfg files. Is there a utility to load and use them in Java? The examples provided don't use them.

thanks

lluisp commented 6 years ago

The files and options you load and activate depend on your application and on which options you want to use. I don't think there is a universal configuration that will work for all languages (e.g. some languages do not have a multiwords file, so it makes no sense to try to load it). So the best way to handle this would be to create your own config files, with the options your application requires.

However, if you just want a generic wrapper for the default FreeLing config files, I guess you can load them easily. In C++ they are handled via boost::program_options; I guess there are similar libraries that can load config files in Java. In the FreeLing code, this is wrapped in class "config" in src/sample_analyzer/config.h

joancf commented 6 years ago

OK, then I'll just use the Java Properties utilities. I ran into some problems when using them:

- and gl and ast just crash
- cy gives an error DICTIONARY: Tag not found for contraction component. Check dictionary entries for 'sydd' and 'a'

The other languages work well...
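For reference, loading a FreeLing-style cfg file with `java.util.Properties` might look like the sketch below. It assumes the cfg file consists of `Key=value` lines with `#` comments and that paths use a `$FREELINGSHARE` placeholder; the key `LocutionsFile`, the sample path, and the class `CfgLoader` are assumptions for illustration and should be checked against the actual cfg files. Note that `Properties` treats backslashes as escape characters and does not trim trailing whitespace from values, so files relying on either may need extra handling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

final class CfgLoader {

    /**
     * Reads Key=value lines from the reader and replaces the
     * $FREELINGSHARE placeholder (assumed convention) with shareDir.
     */
    static Properties load(Reader in, String shareDir) throws IOException {
        Properties raw = new Properties();
        raw.load(in);
        Properties expanded = new Properties();
        for (String key : raw.stringPropertyNames()) {
            expanded.setProperty(key, raw.getProperty(key).replace("$FREELINGSHARE", shareDir));
        }
        return expanded;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample standing in for a real cfg file; keys and paths are illustrative only.
        String sample = "# sample config\nLang=de\nLocutionsFile=$FREELINGSHARE/de/locutions.dat\n";
        Properties cfg = load(new StringReader(sample), "/usr/local/share/freeling");
        // A missing key falls back to "", so the corresponding module can be skipped.
        System.out.println(cfg.getProperty("LocutionsFile", ""));
    }
}
```

With this approach the file name comes from each language's cfg file, so the calling code no longer needs to know whether the file is called "locucions" or "locutions".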

lluisp commented 6 years ago

> nb, inside the nb.cfg folder it points to no/ files, so "no" should be changed to "nb"

Fixed.

> there is no cfg file for hr

There is no full pipeline for hr, only isolated modules (WSD and parsing). So to use it you'd need to call some third-party tagger beforehand. I would just ignore it.

> and gl and ast just crash

Can you be more specific? Do they crash when loading or when executing? With which input? Do they also crash when called from the Analyzer.java example or from the "analyze" script?

> cy gives an error DICTIONARY: Tag not found for contraction component. Check dictionary entries for 'sydd' and 'a'

Fixed.

Thanks!

joancf commented 6 years ago

Lluís, my project is public and contains the test files:

https://github.com/TalnUPF/OpenMinted_Freeling/tree/master/input so I tried it with the corresponding files. It generates a core dump, which I don't have the tools to read inside the Docker container. Maybe you can check it with a "regular" FreeLing installation.

lluisp commented 6 years ago

Ok, I found out a couple of things:

I commented out the file names. Maybe that will help.

joancf commented 6 years ago

For gl I get this error. Problematic frame:

C [libstdc++.so.6+0x135958] std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >::basic_string(std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const&)+0x8

Sorry, GitHub does not allow me to upload the core dump (570 MB). I ran it with 950 MB of memory; is that enough? The other parsers have no problems (I can even load several of them).

lluisp commented 6 years ago

That looks more like an error in some config file... Are you trying to activate a module with a wrong or nonexistent config file?

Which parser are you trying to use? (Note that the "treeler" parser is not available for "gl".)

Do you have a stack trace to get some hint about where it crashed?

joancf commented 6 years ago

I have it configured to use Txala:

```java
private static Set<String> TreelerLangs = new HashSet<String>(Arrays.asList("ca","de","en","es","pt","sl"));

private static Set<String> TxalaLangs = new HashSet<String>(Arrays.asList("as","ca","en","es","gl"));
```

The stack trace is as follows (from the Java runtime; lines start with J or C to indicate the language):

```
Stack: [0x00007fecc194d000,0x00007fecc1a4e000], sp=0x00007fecc1a4bd38, free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libstdc++.so.6+0x135958] std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >::basic_string(std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const&)+0x8
C [libfreeling.so+0x43f22c] _ZN9gnu_cxx13new_allocatorISt13_Rb_tree_nodeISt4pairIKNSt7cxx1112basic_stringIwSt11char_traitsIwESaIwEEEN8freeling22tree_preorder_iteratorINSA_4nodeEEEEEE9constructISE_JRKSE_EEEvPTDpOT0+0x4c
C [libfreeling.so+0x43e6f8] _ZNSt16allocator_traitsISaISt13_Rb_tree_nodeISt4pairIKNSt7cxx1112basic_stringIwSt11char_traitsIwESaIwEEEN8freeling22tree_preorder_iteratorINS9_4nodeEEEEEEE9constructISD_JRKSD_EEEvRSF_PTDpOT0+0x36
C [libfreeling.so+0x43da71] _ZNSt8_Rb_treeINSt7cxx1112basic_stringIwSt11char_traitsIwESaIwEEESt4pairIKS5_N8freeling22tree_preorder_iteratorINS8_4nodeEEEESt10_Select1stISC_ESt4lessIS5_ESaISC_EE17_M_construct_nodeIIRKSC_EEEvPSt13_Rb_tree_nodeISCEDpOT+0x6d
C [libfreeling.so+0x43daf2] _ZNSt8_Rb_treeINSt7cxx1112basic_stringIwSt11char_traitsIwESaIwEEESt4pairIKS5_N8freeling22tree_preorder_iteratorINS8_4nodeEEEESt10_Select1stISC_ESt4lessIS5_ESaISC_EE14_M_create_nodeIJRKSC_EEEPSt13_Rb_tree_nodeISCEDpOT+0x42
C [libfreeling.so+0x44005f] std::_Rb_tree_node<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > std::_Rb_tree<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_Alloc_node::operator()<std::pair<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > const&>(std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > const&&&) const+0x31
C [libfreeling.so+0x440000] std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > std::_Rb_tree<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_M_clone_node<std::_Rb_tree<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_Alloc_node>(std::_Rb_tree_node<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > const*, std::_Rb_tree<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits<wcha
C [libfreeling.so+0x43fe64] std::_Rb_tree_node<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > std::_Rb_tree<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_M_copy<std::_Rb_tree<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_Alloc_node>(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > const, std::_Rb_tree_node<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >
C [libfreeling.so+0x43fbea] std::_Rb_tree<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_M_copy(std::_Rb_tree_node<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > const*, std::_Rb_tree_node<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >*)+0x4e
C [libfreeling.so+0x43f85c] std::_Rb_tree<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::_Rb_tree(std::_Rb_tree<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator >, std::_Select1st<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > >, std::less<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > > const&)+0xbc
C [libfreeling.so+0x43f505] std::map<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, freeling::tree_preorder_iterator, std::less<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > >::map(std::map<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator >, freeling::tree_preorder_iterator, std::less<std::__cxx11::basic_string<wchar_t, std::char_traits, std::allocator > >, std::allocator<std::pair<std::cxx11::basic_string<wchar_t, std::char_traits, std::allocator > const, freeling::tree_preorder_iterator > > > const&)+0x23
C [libfreeling.so+0x43ee7f] freeling::parse_tree::parse_tree(freeling::parse_tree const&)+0x3f
C [libfreeling.so+0x5711b8] freeling::dep_txala::complete_parse_tree(freeling::sentence&) const+0x12a
C [libfreeling.so+0x571447] freeling::dep_txala::analyze(freeling::sentence&) const+0x33
C [libfreeling.so+0x450bd4] freeling::processor::analyze(std::cxx11::list<freeling::sentence, std::allocator >&) const+0x8c
C [libJfreeling.so+0x10b4a7] Java_edu_upc_Jfreeling_JfreelingJNI_DepTxala_1analyze_1_1SWIG_11+0x85
j edu.upc.Jfreeling.JfreelingJNI.DepTxala_analyzeSWIG_1(JLedu/upc/Jfreeling/DepTxala;JLedu/upc/Jfreeling/ListSentence;)V+0
j edu.upc.Jfreeling.DepTxala.analyze(Ledu/upc/Jfreeling/ListSentence;)V+10
j edu.upf.taln.uima.freeling.FreeLingWrapper.process(Lorg/apache/uima/jcas/JCas;Ljava/lang/String;I)V+258
j de.tudarmstadt.ukp.dkpro.core.api.segmentation.SegmenterBase.process(Lorg/apache/uima/jcas/JCas;)V+402
j org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(Lorg/apache/uima/cas/AbstractCas;)V+12
j org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(Lorg/apache/uima/cas/CAS;)V+244
j org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(Lorg/apache/uima/cas/CAS;)Lorg/apache/uima/analysis_engine/CasIterator;+6
j org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas()Lorg/apache/uima/cas/CAS;+275
j org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.(Lorg/apache/uima/analysis_engine/asb/impl/ASB_impl;Lorg/apache/uima/cas/CAS;)V+94
j org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(Lorg/apache/uima/cas/CAS;)Lorg/apache/uima/analysis_engine/CasIterator;+6
j org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(Lorg/apache/uima/cas/CAS;)Lorg/apache/uima/analysis_engine/CasIterator;+42
j org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(Lorg/apache/uima/cas/CAS;)Lorg/apache/uima/util/ProcessTrace;+2
j org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(Lorg/apache/uima/collection/CollectionReaderDescription;[Lorg/apache/uima/analysis_engine/AnalysisEngineDescription;)V+98
j edu.upf.taln.uima.freeling.FreelingXMIReaderWriter.main([Ljava/lang/String;)V+383
v ~StubRoutines::call_stub
V [libjvm.so+0x695ae6] JavaCalls::call_helper(JavaValue, methodHandle, JavaCallArguments, Thread)+0x1056
V [libjvm.so+0x6d7072] jni_invokestatic(JNIEnv, JavaValue, _jobject, JNICallType, _jmethodID, JNI_ArgumentPusher, Thread)+0x362
V [libjvm.so+0x6f38da] jni_CallStaticVoidMethod+0x17a
C [libjli.so+0x80ff] JavaMain+0x81f
C [libpthread.so.0+0x76ba] start_thread+0xca
```

The core, even compressed, is 35 MB above what GitHub allows me to upload. You can reproduce the Docker image from the Dockerfile. I'm trying to upload the image to Docker Hub, but I can't build it there: it takes too long and they stop the process.

lluisp commented 6 years ago

I don't need a core dump; I wouldn't know where to start looking.

Maybe you should try to run it "bare" (e.g. in a simple Java program) and then add layers (UIMA, Docker, etc.) progressively.

joancf commented 6 years ago

Wow, solved. I was not calling the chunker before the Txala parser :-(
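The fix above amounts to an ordering constraint: the Txala dependency parser must run after the chunker. A small guard like the following sketch could catch that mistake before it reaches native code. The stage names and the `PipelineOrderCheck` class are hypothetical labels, not FreeLing identifiers, and the only prerequisite encoded is the one reported in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PipelineOrderCheck {

    // Only prerequisite taken from the thread: dep_txala needs the chunker's output.
    private static final Map<String, List<String>> REQUIRES =
            Collections.singletonMap("dep_txala", Collections.singletonList("chunker"));

    /** Returns one message per stage that appears before a stage it depends on. */
    static List<String> violations(List<String> stages) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String stage : stages) {
            for (String needed : REQUIRES.getOrDefault(stage, Collections.<String>emptyList())) {
                if (!seen.contains(needed)) {
                    problems.add(stage + " requires " + needed + " to run first");
                }
            }
            seen.add(stage);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The first pipeline omits the chunker; the second one includes it.
        System.out.println(violations(Arrays.asList("tokenizer", "maco", "tagger", "dep_txala")));
        System.out.println(violations(Arrays.asList("tokenizer", "maco", "tagger", "chunker", "dep_txala")));
    }
}
```

Running such a check when the pipeline is assembled turns a native crash into a readable message.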