GoogleCloudPlatform / node-red-contrib-google-cloud

Node-RED nodes for Google Cloud Platform
Apache License 2.0

language-sentiment to other languages #36

Open marianacastan opened 4 years ago

marianacastan commented 4 years ago

Hello again!

I'm here to ask for the same enhancement you made to the speech node a few weeks ago, so that I can use the language-sentiment node with other languages. At the moment I cannot change the default en-US.

Thank you again!

kolban-google commented 4 years ago

Will work on it ASAP ... MANY thanks for the feedback. Shouldn't take long.

kolban-google commented 4 years ago

Coding has been completed but not yet published. There will be another post when publish occurs and it is ready for testing.

kolban-google commented 4 years ago

The changes have been made and a new build of the Node-RED package has been published. The changes are in version 0.0.8 and beyond. If I could ask you to refresh your nodes, test with your desired language, and post back, that would be great. I'll leave this issue open until I hear back from you. Fingers crossed I got it right ... but if not, post back with any issues or questions and we'll keep working on it till you are delighted.

marianacastan commented 4 years ago

Hi! I just updated and now there's a field to choose the language. I ran some tests with "pt" (Portuguese) but the score doesn't seem right: I am typing very negative stuff and I can't make it return a negative score. As pt is a supported language for sentiment analysis, I don't know whether the problem is in the node or not.

kolban-google commented 4 years ago

Howdy Mariana. Many thanks for letting me know. What we need to do is create a base line. I'm a dumb English speaker and that is the only language I know. I have been testing with the following English phrases:

Positive: "I am very pleased with the service. This is an excellent product!"
Negative: "I am not happy. The product did not meet my expectations."

Can you provide me similar phrases in Portuguese?

Next, for each of these two phrases, visit this web site:

https://cloud.google.com/natural-language/

There you will find a sample where you can enter the text and have it analyzed. We will then see the score and magnitude. For each of the phrases, record the two numbers.

At the end of this, we will have a score and a magnitude for each of the two phrases.

We will call this our baseline. This is what we hope/expect the Node-RED node to return. From here, I'll test that the node-RED node does return those values and ... if not ... then it is back to me to fix and apologize for getting it wrong.
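The baseline check described above can be sketched as a small comparison helper. This is only an illustration: the function name `matchesBaseline`, the tolerance of 0.05, and the sample numbers are all assumptions and not measured results.

```javascript
// Hypothetical helper: compare what the Node-RED node returned against the
// baseline recorded from the web demo. The default tolerance is an assumed value.
function matchesBaseline(baseline, actual, tolerance = 0.05) {
    const close = (a, b) => Math.abs(a - b) <= tolerance;
    return close(baseline.score, actual.score) &&
           close(baseline.magnitude, actual.magnitude);
}

// Placeholder numbers for illustration only, not real API output.
const baseline = { score: 0.8, magnitude: 1.6 };
const fromNode = { score: 0.8, magnitude: 1.6 };
console.log(matchesBaseline(baseline, fromNode));
```

If the helper reports a mismatch for a phrase, the problem is likely in the node; if it reports a match, any odd score comes from the API itself.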

Post back with the phrases and what you see as scores and I'll get right on it.

marianacastan commented 4 years ago

For some reason the website only worked for "Entities" and "Syntax". For both sentences (negative and positive), the "Sentiment" tab returns "The Natural Language API does not support sentiment analysis for the detected language."

Although the documentation lists "es" and "pt" as supported, neither language seems to work :/

kolban-google commented 4 years ago

Hmmm ... I recreated the issue on the web page by entering:

I am very pleased with the service. This is an excellent product! Estou muito satisfeito com o serviço. Este e um excelente produto!

I am not happy. The product did not meet my expectations. Não estou feliz. O produto não atendeu às minhas expectativas.

... and also see that sentiment claims not to be supported. I have raised an issue with Google which they are tracking as b/142652548. I'll post back when I hear more about that. I'm also going to plug in these "pt" phrases and see what sentiment response I get from the Node-RED nodes.

kolban-google commented 4 years ago

I tried some more experiments. I tried some German phrases and the positive was highly positive and the negative was highly negative. I then tried switching the German (language = de) to Portuguese (language = pt) and got EXACTLY the same results. It was as though the language parameter was being ignored and the determination of the source language was being deduced. I double checked that the parameter I set was being passed in ... I am making this API call:

https://googleapis.dev/nodejs/language/latest/v1.LanguageServiceClient.html#analyzeSentiment

and passing in the language in the document object:

https://googleapis.dev/nodejs/language/latest/google.cloud.language.v1.html#.Document

The documentation seems to say that the language will be "deduced" if not explicitly supplied. I'm now not sure at all what it means to specify the language of the document. It may be an optimization or hint and can actually be ignored. My hope is that we will get the web page going and can then enter our phrases there and see what we get back as a "base line". Obviously if the sentiment analysis API returns poor results, Node-RED can't do better than that. However we will wait and see. If the results are poor, I'll raise a defect with Google on that too.
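One way to probe whether the language setting is honoured or ignored is to send the same text twice, once with the language field and once without, and compare what comes back. The sketch below assumes the v1 response exposes the language it used in a `language` field alongside `documentSentiment`; the helper names are hypothetical, and `client` is expected to be a `LanguageServiceClient` created by the caller.

```javascript
// Build the analyzeSentiment request. When no language is given the field
// is omitted, so the service has to detect the language itself.
function buildSentimentRequest(text, language) {
    const document = { content: text, type: "PLAIN_TEXT" };
    if (language) {
        document.language = language;
    }
    return { document };
}

// Hypothetical helper: run the same text with and without the language hint.
// `client` is assumed to be a LanguageServiceClient from @google-cloud/language.
async function compareLanguageHint(client, text, language) {
    const [hinted] = await client.analyzeSentiment(buildSentimentRequest(text, language));
    const [detected] = await client.analyzeSentiment(buildSentimentRequest(text));
    return {
        hintedLanguage: hinted.language,      // assumed response field
        detectedLanguage: detected.language,  // assumed response field
        hintedScore: hinted.documentSentiment.score,
        detectedScore: detected.documentSentiment.score
    };
}
```

If the two scores are always identical regardless of the hint, that would be consistent with the hint being ignored in favour of detection.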

One other test we can try is to look for another sample (not Node-RED nor web page) that runs a sentiment analysis using GCP and see if we (through another technique) get a base-line sentiment number.

marianacastan commented 4 years ago

I'm also running other tests, translating the content to "en-US" and then getting the sentiment analysis. It's not the best way but by now may help, so I'm trying to figure out if this option will bring me a satisfactory result.

I also want to thank you for your proactive help and support! Glad to have people like you in the community!

kolban-google commented 4 years ago

Thanks for the kind words. Yuk!! Translating "pt" to "en-US" and then doing sentiment analysis on the "en-US" text? I'm hoping we can avoid all of that. How are you translating from "pt" to "en-US"? That is a new node I'm working on ... https://github.com/GoogleCloudPlatform/node-red-contrib-google-cloud/issues/34 ... but it's not released yet.

marianacastan commented 4 years ago

Yeah, I am translating the "pt_BR" text to "en-US" and then passing it as the msg.payload with the language set in "en". As I couldn't find any working node to translate via Google, I'm using Watson. The results are acceptable for a demo but it's raising my runtime, which is no good for me.

kolban-google commented 4 years ago

While I'm still hoping your translation is a workaround and we'll be able to determine sentiment directly without translating to English, I do have a new goody for you. I just pushed version 0.0.9 and this adds another Node-RED node. This one is translation. The idea here is that with this node, you can use Google's language translation service to translate from pt-BR to en-US.
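Outside Node-RED, the translate-then-analyze workaround discussed above could look roughly like the sketch below. It is not how the Node-RED nodes are implemented. The function name `sentimentViaEnglish` is hypothetical, `translateClient` is assumed to be a v2 `Translate` instance from `@google-cloud/translate`, and `languageClient` a `LanguageServiceClient` from `@google-cloud/language`; both are passed in by the caller.

```javascript
// Translate the text to English, then run sentiment analysis on the
// English text. Both clients are supplied by the caller (assumed types:
// v2 Translate and LanguageServiceClient).
async function sentimentViaEnglish(translateClient, languageClient, text) {
    // The v2 translate call resolves to an array whose first item is the translation.
    const [english] = await translateClient.translate(text, "en");
    const document = { content: english, type: "PLAIN_TEXT", language: "en" };
    const [result] = await languageClient.analyzeSentiment({ document });
    return { english, sentiment: result.documentSentiment };
}
```

Each input costs two API round trips this way, which matches the runtime concern raised earlier in the thread.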

While I agree our story so far is indeed "ok" for a demo, I fully agree that we need to resolve the problem fully and properly. I'm going to keep tracking the bigger story and we'll get there. I'll send some more emails now to push it along.

kolban-google commented 4 years ago

I did some more tests tonight. I wrote a stand-alone JavaScript app that looks as follows:

/* jshint esversion: 8 */

const language = require("@google-cloud/language");

async function run() {
    const client = new language.LanguageServiceClient();
    // Uncomment one phrase at a time to test it.
    let text;
    //text = "This product is great!";
    //text = "This product is terrible!";
    //text = "Estou muito satisfeito com o serviço. Este e um excelente produto!";
    //text = "Não estou feliz. O produto não atendeu às minhas expectativas.";
    //text = "A qualidade é muito ruim.";
    text = " A qualidade é muito boa.";
    // The language is passed explicitly in the document object.
    const document = {
        "content": text,
        "type": "PLAIN_TEXT",
        "language": "pt"
    };
    const [result] = await client.analyzeSentiment({document});
    // documentSentiment holds the overall score and magnitude.
    const sentiment = result.documentSentiment;
    console.log(sentiment);
}

// Report API or credential errors instead of leaving the rejection unhandled.
run().catch(console.error);

I then ran it from the command line changing the text. I then used the Node-RED nodes and got exactly the same answers. This increases my confidence that the numbers that our Node-RED node is producing are the numbers we would expect from calling the sentiment API.

For the two "pt" phrases:

Positive: A qualidade é muito boa. - 0.899...
Negative: A qualidade é muito ruim. - -0.600...

To be clear, I don't speak any Portuguese and used Google Translate to build what I think are simple phrases ... but these tests "seem" to be saying that all is working as desired.

Let us now back up a little ... can you provide me with some sample phrases in Portuguese (and English too if you could) as well as the scores you are actually seeing? In addition, can you describe what scores you were expecting?

marianacastan commented 4 years ago

Some phrases I used that returned a positive score, although the sentiment was negative:

en/pt
I hate you / Eu te odeio = -0.6 (en score) / 0.2 (pt score)
You disappoint me all the time / Você me desaponta o tempo todo = -0.1 (en score) / 0.2 (pt score)
I don't wanna live with you / Eu não quero morar com você = -0.3 (en score) / 0.1 (pt score)
You're horrible / Você é horrivel = -0.8 (en score) / 0.1 (pt score)

I really think that the problem might be the language processing and not the node.
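The pattern in these numbers can be checked mechanically. The sketch below uses only the scores reported in this comment and counts how many phrase pairs get a different sign in English and Portuguese; the names `samples` and `signsDisagree` are just for illustration.

```javascript
// Scores reported in this thread for each English / Portuguese phrase pair.
const samples = [
    { en: "I hate you", pt: "Eu te odeio", enScore: -0.6, ptScore: 0.2 },
    { en: "You disappoint me all the time", pt: "Você me desaponta o tempo todo", enScore: -0.1, ptScore: 0.2 },
    { en: "I don't wanna live with you", pt: "Eu não quero morar com você", enScore: -0.3, ptScore: 0.1 },
    { en: "You're horrible", pt: "Você é horrivel", enScore: -0.8, ptScore: 0.1 }
];

// A pair disagrees when the signs of its two scores differ.
function signsDisagree(sample) {
    return Math.sign(sample.enScore) !== Math.sign(sample.ptScore);
}

const disagreeing = samples.filter(signsDisagree);
console.log(`${disagreeing.length} of ${samples.length} pairs disagree in sign`);
// prints "4 of 4 pairs disagree in sign"
```

Every pair flips from negative in English to positive in Portuguese, which points at the language processing rather than at how the node passes values through.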

kolban-google commented 4 years ago

I can solidly confirm that the scores being seen today come from the language processing itself rather than from the mapping that occurs in Node-RED. As such, there is nothing I can do to improve them from the Node-RED nodes' side. I have seen good news on the language processing story as well but can't share that at this time. The bottom line is that you can assume you won't have to translate to English in the future (I think we both agree that is a poor workaround). Stay tuned.

marianacastan commented 4 years ago

Sure! Thank you a lot! I’ll stay tuned!