Open mohsin127 opened 4 years ago
Thank you for opening this issue.
Without any further knowledge, I would be very careful about using the correlation score when the PPS is 0.37 in one direction but 0 in the other. Assuming the scores are valid and there was no error, this means that there is real predictive power in only one direction and no symmetric (two-way) relation between the columns. What you should do is plot the data and look at the actual relation. Can you share your data and the analysis? Then I can have a look at it.
If you need a single score that describes the direct two-way relation between the columns, then PPS might not be the right score for you: it does not have this direct interpretation and it is not suited to a symmetric, two-way approach.
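To make the difference between a symmetric score and a directional one concrete, here is a minimal sketch using only the Python standard library. It assumes a synthetic relation y = x² that is not taken from the data discussed in this thread. `toy_predictive_score` is a hypothetical, heavily simplified stand-in (binned medians compared against a median baseline on a simple holdout split) and is not how ppscore computes its score.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation; its sign gives the direction of a linear relation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


def toy_predictive_score(feature, target, n_bins=10):
    """Hypothetical, simplified stand-in for a directional predictive score.

    NOT ppscore's implementation. The model predicts the target with the
    median of the training targets that fall in the same feature bin. The
    baseline always predicts the overall training median. The score is
    max(0, 1 - MAE_model / MAE_baseline), evaluated on held-out rows.
    """
    pairs = list(zip(feature, target))
    train, test = pairs[0::2], pairs[1::2]  # simple holdout: even vs. odd rows

    lo = min(f for f, _ in train)
    hi = max(f for f, _ in train)
    width = (hi - lo) / n_bins
    if width == 0:
        return 0.0  # a constant feature cannot predict anything

    def bin_of(value):
        # Equal-width bin index, clamped so unseen values land in an edge bin.
        return min(n_bins - 1, max(0, int((value - lo) / width)))

    overall_median = statistics.median([t for _, t in train])
    targets_by_bin = {}
    for f, t in train:
        targets_by_bin.setdefault(bin_of(f), []).append(t)
    bin_median = {b: statistics.median(ts) for b, ts in targets_by_bin.items()}

    mae_model = statistics.fmean(
        [abs(t - bin_median.get(bin_of(f), overall_median)) for f, t in test]
    )
    mae_naive = statistics.fmean([abs(t - overall_median) for _, t in test])
    if mae_naive == 0:
        return 0.0
    return max(0.0, 1 - mae_model / mae_naive)


# Synthetic example (an assumption for illustration): y depends on x, not linearly.
n = 2000
x = [-2 + 4 * i / (n - 1) for i in range(n)]
y = [v * v for v in x]

r = pearson(x, y)                      # symmetric: pearson(x, y) == pearson(y, x)
score_xy = toy_predictive_score(x, y)  # how well x predicts y
score_yx = toy_predictive_score(y, x)  # how well y predicts x

print(f"pearson={r:.3f}  x->y={score_xy:.3f}  y->x={score_yx:.3f}")
```

In this sketch the directional score differs between x to y and y to x, while the correlation is a single number for the pair. Only the correlation carries a sign, so a directional predictive score on its own cannot say whether a relation is positive or negative.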
Can you maybe state why you need this directly proportional relation? What is the business use case or interpretation?
Hello Sir, thank you for the detailed answer. What I am trying to find is the relation between the number of words in a post and its comments, likes, shares and views. For that I need to know whether they are directly or inversely proportional.
What I understand from your reply is that the ppscore from x to y is 0.37 and from y to x is 0. Does this mean they are directly proportional to each other? Please correct me if I am wrong.
Is it possible that you share some data?
Why do you want to find the relation? What do you want to achieve? What is your overarching target? Do you want to predict the likes, shares, comments, views of posts?
What I want to know is whether the number of likes, comments, views etc. increases or decreases as the number of words in a post increases. Is there any relation between them? Do they vary directly or inversely with each other?
The data is too sensitive to be shared, but it is just numbers, something like views = 2000, number of words in a post = 115. There are around 20,000 records of this type.
If it is possible, please share the (anonymized) data as a CSV, for example by uploading it to Google Drive, so that I can have a look at it. If this is not possible due to confidentiality, you can reach out to 8080labs.com for consulting. Otherwise, I am afraid that I cannot help you.
Hello Florian, You can check the data here. https://docs.google.com/spreadsheets/d/1bvVXJP__eHmiX7KtiPp211Slor6JGQOBt8bSF3UrPRk/edit?usp=sharing
Hi, thank you for sharing your data. I had a quick look at it. When I tried it, the PPS was always 0, which might be caused by the small amount of data. I also plotted it in various ways and tried some binning, and it made sense to me that the score was 0 because there is hardly any pattern.
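For readers who want to try a similar binning check themselves, here is a minimal sketch using only the Python standard library. It is an assumption about one way to do it, not the exact analysis performed above: `median_by_bin` is a hypothetical helper, and the numbers are made up for illustration rather than taken from the shared spreadsheet.

```python
import math
import statistics


def median_by_bin(xs, ys, n_bins):
    """Sort rows by x, split them into roughly equal-sized bins, and
    return (x_min, x_max, row_count, median_y) for each bin."""
    pairs = sorted(zip(xs, ys))
    size = math.ceil(len(pairs) / n_bins)
    summary = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        bin_xs = [x for x, _ in chunk]
        bin_ys = [y for _, y in chunk]
        summary.append((bin_xs[0], bin_xs[-1], len(chunk), statistics.median(bin_ys)))
    return summary


# Made-up numbers for illustration only.
words = [10, 20, 30, 40, 50, 60]
views = [5, 7, 1, 3, 9, 11]

bins = median_by_bin(words, views, n_bins=3)
for x_min, x_max, count, median_views in bins:
    print(f"words {x_min}-{x_max}: n={count}, median views={median_views}")
```

If the median per bin rises or falls steadily from the first bin to the last, that hints at a direction. If it jumps around without a trend, that is consistent with a score of 0.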
Some observations:
Asymmetry:
Disclaimer: I would be careful with the findings because they might not be statistically significant. If those relationships were relevant, the PPS should have been able to find them. Since the PPS did not find them, this indicates that the relationships are not strong enough to hold up under cross-validation. However, on your bigger datasets, the PPS seemed to be bigger than 0, so the patterns might be valid.
Summing up, there is hardly any relation to find here, and it is definitely not what you hoped for (more views with more words, or the inverse). The scenario and the data are so complex that I cannot answer more questions about them for free. If you want, you can try to hire 8080 Labs for consulting, but I am afraid that this is out of scope for discussions in the context of ppscore.
I have a simple question regarding ppscore. When I calculated the correlation between two datasets (columns), the result was -0.248, which means that when one increases the other decreases. But when I calculated the ppscore of the same columns, the result was 0.37 from x to y and 0 from y to x. This clearly indicates that x can predict y with a ppscore of 0.37 and that y cannot predict x.
But what I actually want to know is the relation between the 2 datasets: whether they are directly proportional (positive) or inversely proportional (negative) to each other.
Thank you,