csebuetnlp / xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
https://aclanthology.org/2021.findings-acl.413/

cross lingual #17

Closed mars203030 closed 6 months ago

mars203030 commented 7 months ago

Hi

Can I use this metric for cross-lingual evaluation?

abhik1505040 commented 7 months ago

Hi @mars203030,

No, ROUGE requires the reference and prediction text to be in the same language. For cross-lingual evaluation, you can look at this metric.

mars203030 commented 7 months ago

Thank you so much, I will try it.

But for your ROUGE library I get the error below. Another question: which Arabic stemmer do I need to install?

```
ImportError                               Traceback (most recent call last)
Cell In[15], line 3
      1 import sys
      2 sys.path.append('/multilingual_rouge_scoring')
----> 3 from multilingual_rouge_scoring import rouge_scorer
      6 scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="arabic")
      8 scores = scorer.get_scores(conversation_ar, ar_note)

File ~/Downloads/Visualization/NeuroNLP/attempt3/multilingual_rouge_scoring/rouge_scorer.py:37
     35 from six.moves import range
     36 from rouge_score import scoring
---> 37 from rouge_score import tokenization_wrapper as tokenize
     38 import pyonmttok
     39 import collections

ImportError: cannot import name 'tokenization_wrapper' from 'rouge_score'
```

abhik1505040 commented 7 months ago

```python
import sys
sys.path.append('/multilingual_rouge_scoring')
from multilingual_rouge_scoring import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="arabic")
scores = scorer.get_scores(conversation_ar, ar_note)
```

This is not how you are supposed to use this library. First, install the package following the instructions given here. Then follow these examples on how to use the package from Python.
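For illustration, a rough sketch of what the intended workflow looks like after installation (the import path and method name follow the repo's examples as I understand them; treat them as assumptions and defer to the linked instructions):

```python
# Rough usage sketch, assuming the package was installed per the repo's
# instructions rather than imported via sys.path hacks. The multilingual
# fork is assumed to install under the `rouge_score` namespace.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(
    ['rouge1', 'rouge2', 'rougeL'],
    use_stemmer=True,
    lang="arabic",  # enables language-aware tokenization/stemming
)

reference = "..."   # reference summary (placeholder)
prediction = "..."  # model-generated summary (placeholder)

# Note: the scoring method is score(reference, prediction), not get_scores.
scores = scorer.score(reference, prediction)
print(scores)  # dict of Score(precision, recall, fmeasure) per ROUGE variant
```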

Another question: which Arabic stemmer do I need to install?

This repo uses the NLTK SnowballStemmer module for Arabic. It'll be installed automatically when you install our package.
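For reference, that stemmer can be exercised directly through NLTK, e.g. (a minimal sketch; the sample word is arbitrary):

```python
from nltk.stem.snowball import SnowballStemmer

# Arabic is among the languages supported by NLTK's SnowballStemmer.
stemmer = SnowballStemmer("arabic")
print(stemmer.stem("المكتبات"))  # stems an arbitrary Arabic word
```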

mars203030 commented 7 months ago

I am still having an issue here.

Here is my installation:

(installation screenshots)

And this is the code and the error:

(screenshot)

abhik1505040 commented 7 months ago

Your notebook isn't running in the same environment where you installed the package. I've replicated the correct workflow in this colab notebook. Please follow this.
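As a quick sanity check (a generic sketch, not specific to this package), you can print the interpreter the notebook kernel is using and make sure the package was installed into that same interpreter:

```python
# Shows which Python interpreter the current notebook kernel runs on.
import sys
print(sys.executable)

# If the package was installed into a different interpreter, install it into
# this one from within the notebook, e.g. (Jupyter/Colab cell syntax):
#   !{sys.executable} -m pip install <path-to-multilingual_rouge_scoring>
```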

mars203030 commented 7 months ago

Thanks, I restarted the kernel and it is working fine now. I have some questions regarding LaSE. It is working; I am comparing an English text (reference) and an Arabic text (prediction).

1) I would like to know what a good LaSE score is and what the range is. Below is my result for one input:

```python
from LaSE import LaSEScorer

scorer = LaSEScorer()
score = scorer.score(
    clinical_note,      # reference text
    conversation_ar,    # predicted text
    # language name of the reference text
)
print(score)
```

```
LaSEResult(ms=0.6220683, lc=1.0, lp=1.0, LaSE=0.6220682859420776)
```

2) If I define the target_lang, I receive this error: `ValueError: predict processes one line at a time (remove '\n')`

3) Is there a max length? My generated text is around 4000 words.

4) For the ROUGE score, is there also a max length?

5) There is a minimal difference in the results for the English text summarization when I use the original Google package and the multilingual package. How is the difference explained?

Google:

```
{'rouge1': Score(precision=0.37389380530973454, recall=0.49852507374631266, fmeasure=0.42730720606826805), 'rouge2': Score(precision=0.07982261640798226, recall=0.10650887573964497, fmeasure=0.09125475285171102), 'rougeL': Score(precision=0.17699115044247787, recall=0.2359882005899705, fmeasure=0.202275600505689), 'rougeLsum': Score(precision=0.32964601769911506, recall=0.4421364985163205, fmeasure=0.37769328263624846)}
```

MLRouge (English):

```
{'rouge1': Score(precision=0.3893805309734513, recall=0.5191740412979351, fmeasure=0.44500632111251587), 'rouge2': Score(precision=0.08869179600886919, recall=0.11834319526627218, fmeasure=0.10139416983523449), 'rougeL': Score(precision=0.18584070796460178, recall=0.24778761061946902, fmeasure=0.21238938053097345), 'rougeLsum': Score(precision=0.33849557522123896, recall=0.4540059347181009, fmeasure=0.3878326996197719)}
```

Regards

abhik1505040 commented 7 months ago

I would like to know what a good LaSE score is and what the range is.

The value range for LaSE is [0, 1]. In general, we found good summaries to have a LaSE score > 0.5.

If I define the target_lang, I receive this error: `ValueError: predict processes one line at a time (remove '\n')`

The target evaluation domain of this metric was short, single-line summaries. Therefore, as indicated by the error, you'd need to make sure your reference and prediction texts don't contain new lines.
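In practice that just means collapsing the texts to single lines before scoring, e.g. (a small sketch; variable names follow the snippet above):

```python
def to_single_line(text: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces."""
    return " ".join(text.split())

# Clean both texts before passing them to scorer.score(...).
reference = to_single_line(clinical_note)
prediction = to_single_line(conversation_ar)
```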

Is there a max length? My generated text is around 4000 words.

The embedding model behind LaSE, namely LaBSE, only supports sequences up to 512 tokens.
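If you want to check whether your text exceeds that limit, one option is to count tokens with the publicly available LaBSE tokenizer (an assumption on my part; LaSE may load the model differently internally):

```python
from transformers import AutoTokenizer

# Public LaBSE checkpoint on the Hugging Face Hub (assumed to use an equivalent tokenizer).
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")

n_tokens = len(tokenizer.encode(prediction))  # `prediction` from the snippet above
if n_tokens > 512:
    print(f"{n_tokens} tokens: anything beyond 512 may be truncated by the model.")
```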

For the ROUGE score, is there also a max length?

No.

There is a minimal difference in the results for the English text summarization when I use the original Google package and the multilingual package. How is the difference explained?

The difference is in the tokenization, stemming, and character-filtering policies. For example, the Google package removes all non-alphanumeric characters and applies stemming when the token length exceeds a threshold, which we don't do, to enable multilingual evaluation. Please see both implementations to get a better idea of all the differences.
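To illustrate the first point, the Google package's tokenization is roughly equivalent to the following (an approximation for illustration, not the exact implementation):

```python
import re
from nltk.stem import porter

stemmer = porter.PorterStemmer()

def google_style_tokenize(text, use_stemmer=True):
    # Lowercase and replace every non-alphanumeric character with a space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    tokens = text.split()
    if use_stemmer:
        # Stemming is only applied to tokens longer than 3 characters.
        tokens = [stemmer.stem(t) if len(t) > 3 else t for t in tokens]
    return tokens

print(google_style_tokenize("The libraries' summaries, naturally, differ!"))
```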

mars203030 commented 6 months ago

Thank you very much for your generous reply and patience.