Closed sudeephazra closed 4 years ago
@sudeephazra Thanks for the question! We are investigating and will update you shortly.
@sudeephazra The last step (Use Microsoft Azure Storage Explorer to view the weather forecast) has you download a .csv file from Azure Storage Explorer. In this file, do you consistently see a value of 0.489944100379944? It might be that your solution is working but the 'chance of rain' hasn't fluctuated. Can you verify this? Thanks, Mike
@sudeephazra We will now proceed to close this thread. If there are further questions regarding this matter, please reopen it and we will gladly continue the discussion.
Regards, Mike
Hi Mike, I have managed to follow the tutorial without any problems, but I also have the same issue with the prediction returning 0.489944100379944. I tested this with the built-in default endpoint test button, and the following came back at the bottom of the screen: 'Weather prediction model [Predictive Exp.]' test returned ["21","44","NO","0.392127901315689"]... Any ideas why it is not providing the correct prediction through Azure? Thanks, Kevin
@sullyNivek75 At the beginning of this tutorial, there is a 'Note' section that discusses a prerequisite of having the device already set up:
In those steps, there is the option to use simulated data or not:
If you don't have the sensor, set the simulatedData value to true to make the sample application create and use simulated sensor data.
Is your sensor hardware configured to use simulated or sensor data?
Hi Mike,
Yes, my device is set up and is reporting sensor data to Azure > IoT Hub > Stream Analytics > Power BI. But I am having the same problem: after adding an additional query to Stream Analytics to use the machine learning function, the prediction is reported as 0.489944100379944, both through blob storage and through Power BI. I am not sure where the problem is.
Thanks
Kevin
I resolved the issue. The problem is with the training data that the ML model uses. In the training data CSV, the temperature and humidity columns have the character 'M' in a bunch of places. This causes the model to expect characters as values of temperature and humidity. But if you send in characters to the model, it is unable to perform mathematical/statistical calculations. I downloaded the CSV, removed all records with the letter "M" in the temperature and humidity columns, uploaded the CSV and used this file as the training data and had my eureka moment :) The entire flow works now.
Regret not being able to respond on this earlier.
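For anyone who wants to script the cleanup described above instead of editing the CSV by hand, here is a minimal Python sketch. It assumes the columns are named `temperature` and `humidity` (the names used in the R script later in this thread); the `rain` column and the sample rows are made up for illustration, and the real training file may be laid out differently.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float, so 'M' or an empty cell is rejected."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def drop_non_numeric_rows(csv_text, columns=("temperature", "humidity")):
    """Keep only the rows whose listed columns all parse as numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if all(is_number(row[name]) for name in columns):
            writer.writerow(row)
    return out.getvalue()


# Made-up sample rows, for illustration only (not taken from the real dataset)
sample = (
    "temperature,humidity,rain\n"
    "21,44,NO\n"
    "M,80,YES\n"
    "10,M,YES\n"
    "15.5,97,YES\n"
)
cleaned = drop_non_numeric_rows(sample)
print(cleaned)
```

Rows with an `M` in either column are dropped, and the header and the remaining rows pass through unchanged. Note that `float()` also accepts strings such as `nan` and `inf`, so a stricter check may be needed if those appear in the data.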
I too changed the training data set: I removed the 'M' character from the humidity column, and it is now working!
Thanks for your help, sudeephazra.
Mike, I would suggest you cleanse the online dataset.
kevin
I believe the issue is not with the .xlsm/.xlsx prediction model template but with the resulting .csv written to blob storage. I am using the sample data links from within the Excel workbook. I do not have access to IoT hardware to walk through this tutorial in its entirety. I will work with the content owner to figure out what is causing this issue. @sudeephazra @sullyNivek75 Thanks for your feedback. If you have a screen capture that clearly demonstrates the issue, that would be very helpful.
Regards, Mike
Hi Mike,
You are right, there is still a problem with the algorithm. I let this run through the weekend to see if it would give more accurate readings. What I am finding now is that the result is fluctuating, but only by 2%, ranging from 7% to 9%. This was the case on Friday night, when it was raining for approximately 8 hours. I would be happy to work with the creators to try to make the outputs more accurate. I need to understand the inner mechanics of how the whole process works.
Let me know if there is anything I can do to help.
Kevin
@sergaz-msft The weather data that is being written to the .csv file by the sensor is picking up random characters (^M) that are breaking parts of this tutorial. Is there a contact we can leverage to have this addressed?
I have attached a screen grab to show the outputs from the tests that I have been running. Clearly something is wrong. With 100% humidity and a temperature of 10 degrees C, surely the chance of rain is better than 9%?
Any help would be good
Kevin
Incidentally, I have put the original dataset back into the process, and the following happens in the same Excel predictive session:
The percentages appear to be more realistic based on the humidity levels and the temperatures. I would say the model and its outputs are not the issue. Could it be an issue with Azure or Power BI? I don't understand why the outputs to Excel look fine with the original dataset but not through Azure or Power BI.
Kevin
Hi - I am getting an error on the SQL code under Stream Analytics: "User defined function calls must start with "udf." prefix". When I change the code to udf.machinelearning(temperature, humidity), I get Probability as 0%.
Reassigning this to me so I can take a look at it and see if I can get this issue resolved.
robinsh Reassigning is not a valid GitHub ID, or is not a collaborator on this repo.
@robinsh Checking in to see if there is any update on this issue.
Jimaco -- can you take a look at this?
@AshokPeddakotla-MSFT I'm still in the middle of something; turning it over to Jimaco to take a look at.
Hi there -- I finally had a chance to look at this in some depth. Here's what I can say about it:
If you use the original training data (keeping the rows with 'M' as a value for temp and/or humidity), the model seems to work OK as long as you supply a whole number for the humidity value. If you supply a number that has a fraction, you will always wind up with a prediction of 0.489944100379944. I don't know why that is, but the solution is either to round the data you supply to the machine learning function or to remove the rows from the training data that have these values in them.
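The first workaround, rounding before the value reaches the machine learning function, could look something like the Python sketch below. This is only an illustration: `to_whole_humidity` and the `reading` dictionary are hypothetical names, and in the actual tutorial the rounding would more likely happen on the device or in the Stream Analytics query.

```python
def to_whole_humidity(humidity):
    """Round a humidity reading (number or numeric string) to a whole number.

    Python's round() rounds exact halves to the nearest even integer,
    so 76.5 becomes 76 rather than 77.
    """
    return int(round(float(humidity)))


# Hypothetical sensor reading, for illustration only
reading = {"temperature": 21.3, "humidity": 76.4}

# Copy the reading, replacing humidity with its whole-number value
prepared = {**reading, "humidity": to_whole_humidity(reading["humidity"])}
print(prepared)
```

Cleaning the training data, as described next, avoids the need for this kind of preprocessing altogether.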
I wrote an R script to remove all rows from the data frame that have temperature or humidity values that cannot be converted to a number. Here's the script:
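# Map 1-based optional input ports to variables
data <- maml.mapInputPort(1) # class: data.frame
# Convert to numeric; values that cannot be parsed (such as 'M') become NA
data$temperature <- as.numeric(as.character(data$temperature))
data$humidity <- as.numeric(as.character(data$humidity))
# Keep only the rows that have no NA in any column
completedata <- data[complete.cases(data), ]
# Send the cleaned data frame to the module's output port
maml.mapOutputPort('completedata')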
I inserted it just after the Clean Data module in the Training Experiment:
This got rid of the rows with the M's in the training data. It also has the side effect of making the temperature and humidity data numeric rather than string. This is nice because when you "visualize" subsequent data sets, you can click on a column and get great stats like max, min, mean, median and standard deviation. Finally, I'm a little out of my expertise here -- there may be better ways of doing this. :)
I don't think it's wise to expect this experiment to be overly predictive of rain. I'm not a weather guy, but I would guess you need a lot more data than just temperature and humidity -- I dunno ... barometric pressure, for example? :) If you think about it, 30 degrees and 97 percent humidity might mean completely different things on a summer day in New York and a day during the typhoon season in Taiwan.
This is confirmed by "visualizing" the data set coming out of the second port of the split data module (the data being fed into the score model module). Notice that roughly 60% of the data points have rain:
Then "visualize" the data set coming out of the score model module. Notice that well over 90% are scored as having a probability of rain:
I'm no data scientist, but I'm guessing that means that our predictive model is not overly robust :).
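The comparison above (share of rows that actually have rain versus share of rows scored as rain) can be reproduced outside the Studio with a few lines of Python. This is a rough sketch: the labels and probabilities are made up to mirror the roughly 60% and 90% figures mentioned, and the 0.5 cutoff for counting a row as "scored as rain" is an assumption.

```python
def positive_rate(values, is_positive):
    """Return the fraction of values for which is_positive(value) is true."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for value in values if is_positive(value)) / len(values)


# Made-up evaluation rows, for illustration only: 6 of 10 actually had rain
labels = ["YES", "YES", "YES", "YES", "YES", "YES", "NO", "NO", "NO", "NO"]
# Made-up scored probabilities for the same rows: 9 of 10 are at or above 0.5
probabilities = [0.9, 0.8, 0.7, 0.6, 0.55, 0.95, 0.85, 0.75, 0.65, 0.4]

actual_rate = positive_rate(labels, lambda label: label == "YES")
predicted_rate = positive_rate(probabilities, lambda p: p >= 0.5)  # assumed cutoff

print(f"actual rain rate:    {actual_rate:.0%}")
print(f"predicted rain rate: {predicted_rate:.0%}")
```

A large gap between the two rates suggests the model predicts rain far more often than it actually occurs in the evaluation data.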
@MieRobot With the above provisions in place, the scenario worked OK for me. I didn't experience the issue you reported. If you want to pursue it further, can you provide additional information?
I'll work with the author of the training experiment to see whether they can add a module to clean the data frame as I've done.
Finally, this is a pretty old issue. Is this still a problem for anyone else on the thread? If I don't hear back in the next week or so, I'll go ahead and close it.
Thanks,
Jimaco
The topic has been updated to add the additional R script module to clean the temp and humidity columns. It's live now. Since I haven't heard back on this thread, I'm going to close it. Please @-comment me here or open another issue if you still have problems. Thanks!
Jimaco
Great tutorial and very easy to follow. I got everything working as per this guide and have a working solution. The only problem is that my prediction is always 0.489944100379944. What am I missing?