MS20190155 / Measuring-Corporate-Culture-Using-Machine-Learning

Code Repository for MS20190155

Result question #8

Closed JJery-web closed 1 year ago

JJery-web commented 1 year ago

Hello professor.

Thank you very much for your amazing project!

After running score.py, I got results for the five culture values. Your paper defines the variable as the "Weighted frequency count of innovation-related words in the QA section of earnings calls averaged over a three-year window." To reproduce the variables in your paper, do I need to add up the values from a company's four files? For example, suppose company X had four earnings call transcripts (a, b, c, d) in 2013. After running score.py, would the innovation value for company X in 2013 be (innovation_a/length of a)+(innovation_b/length of b)+(innovation_c/length of c)+(innovation_d/length of d)?

I am very puzzled. Hope to get your reply!

maifeng commented 1 year ago

If a firm has 4 transcripts in a year, we took the average (rather than sum) of the values. This is because some firms may have fewer than 4 transcripts per year. Hope that helps.
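The difference between averaging and summing can be sketched as follows. This is only an illustration, not the repository's code: the record layout, the firm identifiers, and the numbers are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-transcript scores, already divided by document length.
# Each tuple: (firm_id, year, length-normalized innovation score).
transcript_scores = [
    ("X", 2013, 0.02),
    ("X", 2013, 0.04),
    ("X", 2013, 0.03),
    ("X", 2013, 0.03),
    ("Y", 2013, 0.05),  # firm Y has only two transcripts this year
    ("Y", 2013, 0.01),
]


def firm_year_average(rows):
    """Average the transcript-level scores within each (firm, year)."""
    grouped = defaultdict(list)
    for firm_id, year, score in rows:
        grouped[(firm_id, year)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


firm_year_scores = firm_year_average(transcript_scores)
```

In this toy data both firms average the same per-transcript score, so they get the same firm-year value. A sum would give firm X twice firm Y's value only because X has more transcripts.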

JJery-web commented 1 year ago

Thank you very much for your reply.

I assumed the values were summed because the values in Table 3 of the paper are much higher than mine. For example, the paper reports a mean of 1.737 for innovation. I computed my value by running score.py from the GitHub repository to get the TF-IDF innovation score and then dividing it by the document length (i.e., innovation/document_length). [1] https://user-images.githubusercontent.com/57023330/218927857-0df7e8ad-b551-444a-b2b1-7bdb10404977.png

From the result of that division, I calculated a mean of 0.03 and a maximum of 0.23, so I do not understand why my numbers differ so much from the statistics in the paper. Part of the reason may be that the text data on GitHub is not the full data set, but since the model expands the dictionary to 500 words per dimension, such a large discrepancy does not seem reasonable. [2] https://user-images.githubusercontent.com/57023330/218927883-6c43246c-fa59-4212-9fd4-33a79cecb540.png

I am very confused. Maybe I have misunderstood something. I would appreciate your reply.

MS20190155 commented 1 year ago

The values are in percent (multiplied by 100). See aggregate_firms.py. Sorry for not being clearer in the paper.
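As a rough sketch of the scaling described here, assuming the factor of 100 is applied to the length-normalized score discussed in this thread: the function name and the example numbers are hypothetical, and aggregate_firms.py may organize the computation differently.

```python
def to_percent(raw_score, document_length):
    """Length-normalize a raw score and express it in percent."""
    if document_length <= 0:
        raise ValueError("document_length must be positive")
    return 100.0 * raw_score / document_length


# Hypothetical transcript: raw innovation score 45.0, document length 1500.
example = to_percent(45.0, 1500)
```

On this scale, the mean of 0.03 reported earlier in the thread would read as 3, the same order of magnitude as the 1.737 in Table 3.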
