MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Prediction Not Changing #7851

Closed sudeephazra closed 4 years ago

sudeephazra commented 6 years ago

Great tutorial and very easy to follow. I got everything working as per this guide and have a working solution. The only problem is that my prediction is always 0.489944100379944. What am I missing?

mimckitt commented 6 years ago

@sudeephazra Thanks for the question! We are investigating and will update you shortly.

Mike-Ubezzi-MSFT commented 6 years ago

@sudeephazra In the last step (Use Microsoft Azure Storage Explorer to view the weather forecast)

It has you download a .csv file from Azure Storage Explorer. In this file, do you consistently see a value of 0.489944100379944? It might be that your solution is working but the 'chance of rain' hasn't fluctuated. Can you verify this? Thanks, Mike

Mike-Ubezzi-MSFT commented 6 years ago

@sudeephazra We will now proceed to close this thread. If there are further questions regarding this matter, please reopen it and we will gladly continue the discussion.

Regards, Mike

sullyNivek75 commented 6 years ago

Hi Mike, I have managed to follow the tutorial without any problems, but I also have the same issue with the prediction returning 0.489944100379944. I tested this with the built-in default endpoint test button, and the following came back at the bottom of the screen: 'Weather prediction model [Predictive Exp.]' test returned ["21","44","NO","0.392127901315689"]... Any ideas why it is not providing the correct prediction through Azure? Thanks, Kevin

Mike-Ubezzi-MSFT commented 6 years ago

@sullyNivek75 At the beginning of this tutorial, there is a 'Note' section that discusses the prerequisite of having the device already set up:

screenshot 204

set up your device

In those steps, there is the option to use simulated data or not:

If you don't have the sensor, set the simulatedData value to true to make the sample application create and use simulated sensor data.

Is your sensor hardware configured to use simulated or sensor data?

Configure the sample application

sullyNivek75 commented 6 years ago

Hi Mike,

Yes, my device is set up and is reporting sensor data through Azure > IoT Hub > Stream Analytics > Power BI. But I have the same problem: after adding an additional query to Stream Analytics to use the machine learning function, the prediction is reported as 0.489944100379944, both through blob storage and through Power BI. I'm not sure where the problem is.

Thanks

Kevin

sudeephazra commented 6 years ago

I resolved the issue. The problem is with the training data that the ML model uses. In the training data CSV, the temperature and humidity columns have the character 'M' in a bunch of places. This causes the model to expect characters as values of temperature and humidity. But if you send characters to the model, it is unable to perform mathematical/statistical calculations. I downloaded the CSV, removed all records with the letter 'M' in the temperature and humidity columns, re-uploaded the cleaned CSV, used that file as the training data, and had my eureka moment :) The entire flow works now.
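
The manual cleanup described above can also be scripted. The following is a minimal Python sketch, not part of the tutorial or of Azure Machine Learning Studio. It assumes the training CSV has header columns named `temperature` and `humidity` (hypothetical names, as is the `rain` column in the sample data; adjust them to match the real file) and drops every row whose value in either column is not a number.

```python
import csv
import io
import math


def is_number(value):
    """Return True if value parses as a finite number (so 'M' or '' is rejected)."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def clean_rows(rows, columns=("temperature", "humidity")):
    """Keep only the rows whose listed columns all hold numeric text."""
    return [row for row in rows if all(is_number(row.get(col)) for col in columns)]


def clean_csv_text(text, columns=("temperature", "humidity")):
    """Read CSV text, drop rows with non-numeric values, return the cleaned CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    kept = clean_rows(reader, columns)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


# Made-up sample data: two of the three records contain the 'M' marker.
sample = "temperature,humidity,rain\n21,44,NO\nM,80,YES\n10,M,YES\n"
cleaned = clean_csv_text(sample)
print(cleaned)
```

To clean a real file, read it with `open(path, newline="")`, pass its contents to `clean_csv_text`, and write the result to a new file before uploading it as the training data.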

Regret not being able to respond on this earlier.

sullyNivek75 commented 6 years ago

I too changed the training data set, removing the 'M' character from the humidity column, and it is now working!

Thanks for your help, sudeephazra.

Mike, I would suggest you cleanse the online dataset.

Kevin

Mike-Ubezzi-MSFT commented 6 years ago

Reference:

screenshot 215

The issue is not with the .xlsm/.xlsx prediction model template but with the resulting .csv written to blob storage, I believe. I am using the sample data links from within the Excel workbook. I do not have access to IoT hardware to walk through this tutorial in its entirety. I will work with the content owner to figure out what is causing this issue. @sudeephazra @sullyNivek75 Thanks for your feedback, and if you have a screen capture that clearly demonstrates the issue, that would be very helpful.

Regards, Mike

sullyNivek75 commented 6 years ago

Hi Mike,

You are right, there is still a problem with the algorithm. I let this run through the weekend to see if it would give more accurate readings. What I am finding now is that the result fluctuates, but only by about 2 percentage points, ranging from 7% to 9%. This was the case on Friday night, when it rained for approximately 8 hours. I would be happy to work with the creators to try to make the outputs more accurate. I need to understand the inner mechanics of how the whole process works.

Let me know if there is anything I can do to help.

Kevin

Mike-Ubezzi-MSFT commented 6 years ago

@sergaz-msft The weather data that is being written to the .csv file by the sensor is picking up random characters (^M) that are breaking parts of this tutorial. Is there a contact we can leverage to have this addressed?

sullyNivek75 commented 6 years ago

image

I have attached a screen grab to show the outputs from the tests that I have been running. Clearly something is wrong. With 100% humidity and a temperature of 10 degrees C, surely the chance of rain is better than 9%?

Any help would be good

Kevin

sullyNivek75 commented 6 years ago

Incidentally, I have put the original dataset back into the process, and the following happens in the same Excel predictive session:

image

The percentages appear to be more realistic based on the humidity levels and the temperatures. I would say the model and its outputs are not the issue. Could it be an issue with Azure or Power BI? I don't understand why the outputs in Excel look fine with the original dataset but not through Azure or Power BI.

Kevin

MieRobot commented 5 years ago

Hi - I am getting an error on the SQL code under Stream Analytics: "User defined function calls must start with 'udf.' prefix." When I change the code to udf.machinelearning(temperature, humidity), I get a probability of 0%.

robinsh commented 5 years ago

reassign:robinsh

Reassigning this to me so I can take a look at it and see if I can get this issue resolved.

PRMerger17 commented 5 years ago

robinsh Reassigning is not a valid GitHub ID, or is not a collaborator on this repo.

AshokPeddakotla-MSFT commented 5 years ago

@robinsh Checking in to see if there is any update on this issue.

robinsh commented 5 years ago

reassign @JimacoMS3

Jimaco -- can you take a look at this?

robinsh commented 5 years ago

@AshokPeddakotla-MSFT I'm still in the middle of something; turning it over to Jimaco to take a look at.

JimacoMS3 commented 4 years ago

Hi there -- I finally had a chance to look at this in some depth. Here's what I can say about it:

  1. If you use the original training data (keeping the rows with 'M' as a value for temp and/or humidity), the model seems to work OK as long as you supply a whole number for the humidity value. If you supply a number that has a fractional part, you will always wind up with a prediction of 0.489944100379944. I don't know why that is, but the solution is either to round the data you supply to the machine learning function or to remove the rows that have these values from the training data.

  2. I wrote an R script to remove all rows from the data frame that have temperature or humidity values that cannot be converted to a number. Here's the script:

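    # Map 1-based optional input ports to variables
    data <- maml.mapInputPort(1) # class: data.frame
    
    # Coerce both columns to numeric; non-numeric values such as 'M' become NA
    # (R emits a coercion warning for those values, which is expected here)
    data$temperature <- as.numeric(as.character(data$temperature))
    data$humidity <- as.numeric(as.character(data$humidity))
    
    # Keep only the rows that have no NA in any column
    completedata <- data[complete.cases(data), ]
    
    # Send the cleaned data frame to the module's output port
    maml.mapOutputPort('completedata')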

    I inserted it just after the Clean Data module in the Training Experiment:

    Rscript

    This got rid of the rows with the M's in the training data. It also has the side effect of making the temperature and humidity data numeric rather than string. This is nice because when you "visualize" subsequent data sets, you can click on a column and get great stats like max, min, mean, median and standard deviation. Finally, I'm a little out of my expertise here -- there may be better ways of doing this. :)

  3. I don't think it's wise to expect this experiment to be overly predictive of rain. I'm not a weather guy, but I would guess you need a lot more data than just temperature and humidity -- I dunno ... barometric pressure, for example? :) If you think about it, 30 degrees and 97 percent humidity might mean completely different things on a summer day in New York and a day during the typhoon season in Taiwan.

  4. This is confirmed by "visualizing" the data set coming out of the second port of the split data module (the data being fed into the score model module). Notice that roughly 60% of the data points have rain:

    Visualize second port of split data

    Then "visualize" the data set coming out of the score model module. Notice that well over 90% are scored as having a probability of rain:

    scored data

    I'm no data scientist, but I'm guessing that means that our predictive model is not overly robust :).

  5. @MieRobot With the above provisions in place, the scenario worked OK for me. I didn't experience the issue you reported. If you want to pursue it further, can you provide additional information?

  6. I'll work with the author of the training experiment to see whether they can add a module to clean the data frame as I've done.

  7. Finally, this is a pretty old issue. Is this still a problem for anyone else on the thread? If I don't hear back in the next week or so, I'll go ahead and close it.
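
For the first workaround in item 1 (rounding the values supplied to the machine learning function), here is a minimal Python sketch. It is only an illustration: in the tutorial the rounding would have to happen wherever the reading is prepared, for example on the device or in the Stream Analytics query, and neither is shown here. The field names `temperature` and `humidity` are assumptions, and `round_reading` is a hypothetical helper rather than part of any Azure SDK.

```python
def round_reading(reading):
    """Return a copy of a sensor reading with humidity rounded to a whole number.

    Note: Python's round() rounds exact halves to the nearest even integer,
    so 62.5 becomes 62. That is fine for this purpose, because the only goal
    is to hand the model a whole number.
    """
    rounded = dict(reading)  # copy, so the caller's reading is left untouched
    rounded["humidity"] = int(round(float(reading["humidity"])))
    return rounded


# Made-up reading with a fractional humidity value.
reading = {"temperature": 21.3, "humidity": 44.7}
print(round_reading(reading))
```

Rounding only works around the symptom. Removing the non-numeric rows from the training data, as in item 2, addresses the same problem at its source.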

Thanks,

Jimaco

JimacoMS3 commented 4 years ago

The topic has been updated to add the additional R script module to clean the temp and humidity columns. It's live now. Since I haven't heard back on this thread, I'm going to close it. Please @-comment me here or open another issue if you still have problems. Thanks!

Jimaco

JimacoMS3 commented 4 years ago

please-close