current12 / Stat-222-Project

Earnings Calls - Bad Sectors #17

Closed: ijyliu closed this issue 6 months ago

ijyliu commented 6 months ago

Retail, Software, Semiconductors

ijyliu commented 6 months ago

@OwenLin2001

the raw data came from here: https://www.kaggle.com/datasets/v1ctor10/earnings-call-nlp-strategy-v2

on the right-hand side of the page, there is a 'data explorer' where you can browse individual files. I am seeing the rate limit error in some (but not all) files for software and semiconductors. does this agree with what you're finding? do you have some usable files for these sectors?

the original paper only analyzed 5000 calls in three sectors (banks, automobiles, consumer durables), so I would not expect the data outside of those sectors to be reliable. as said earlier, I don't really think it's worth scraping more, and we may just have to work with what we have.

if you have extra time after loading, you can skim readme.md and documentation.pdf for more understanding of the features

OwenLin2001 commented 6 months ago

I will skim through the raw data and see if I can pull more information from it.

ijyliu commented 6 months ago

what I meant was, I already looked at those sectors (software, semiconductors, retail) and the raw data has the rate limit error for some but not all earnings calls. did you get some valid earnings calls from those sectors? if so, I don't think there's anything more we can do short of additional scraping (which we could do later if we have extra time)
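
a minimal sketch of that check, in case it helps: it walks a folder of raw transcript files and separates the ones that contain an error marker from the ones that look usable. the function name `split_by_marker`, the `"rate limit"` marker text, and the `*.txt` file pattern are all assumptions for illustration; the actual error wording and file layout in the Kaggle dataset need to be checked first.

```python
from pathlib import Path


def split_by_marker(root, marker="rate limit", pattern="*.txt"):
    """Split files under `root` into (usable, flagged) lists.

    A file is flagged when its text contains `marker` (case-insensitive).
    The default marker and pattern are placeholders, not the dataset's
    confirmed error text or file extension.
    """
    usable, flagged = [], []
    needle = marker.lower()
    for path in sorted(Path(root).rglob(pattern)):
        # Replace undecodable bytes so one bad file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            flagged.append(path)
        else:
            usable.append(path)
    return usable, flagged
```

calling `split_by_marker("path/to/sector_folder")` once per sector folder and comparing `len(usable)` with `len(flagged)` would show how many calls are actually available in software, semiconductors, and retail.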

current12 commented 6 months ago

Has this issue been fixed? If not, I can retrieve the earnings call transcripts from the financial web API; I just need the companies' names.

OwenLin2001 commented 6 months ago

Let me look into the raw data first; I will send you a DM afterwards, @current12.