current12 / Stat-222-Project

Handle outliers in Altman-Z #46

Closed ijyliu closed 5 months ago

ijyliu commented 6 months ago

check sectors - banks etc.

maybe don't winsorize

ijyliu commented 5 months ago

@OwenLin2001 to investigate sectors

ijyliu commented 5 months ago

are the extreme observations in a particular sector? if so, reconsider winsorizing

OwenLin2001 commented 5 months ago

With the new dataset, All_Data_with_NLP_Features, the issue is much milder. All the Altman Z scores are below 8, with 13 companies above a score of 6. Those 13 include big companies like Google and Chevron.

Sector-wise, IT, Health Care, and Energy seem to be the three sectors with high Altman-Z scores.

I think no further action is needed on the Altman-Z score beyond these observations.

ijyliu commented 5 months ago

This issue is in the financial data cleaning file, not all data. By the time the data is in all data, it has already been winsorized.

ijyliu commented 5 months ago

It's in this notebook

https://github.com/current12/Stat-222-Project/blob/main/Code%2FData%20Loading%20and%20Cleaning%2FTabular%20Financial%2FCombine%20and%20Clean%20Tabular%20Financial%20Statements%20Data.ipynb

ijyliu commented 5 months ago

You will have to find the outliers before they are winsorized and save them. Then join the sector information onto them. You can also join on the fixed quarter date and company in all data NLP to see which of the outliers are relevant.
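A minimal pandas sketch of those steps, for illustration only. The column names (`ticker`, `fixed_quarter_date`, `altman_z`, `sector`), the helper name, and the threshold of 6 are assumptions; the notebook's actual names may differ.

```python
import pandas as pd

def flag_altman_outliers(fin, sectors, nlp, threshold=6.0):
    """Collect pre-winsorization outliers, attach sector, and flag relevance.

    All column names here are hypothetical stand-ins for the project's own.
    """
    # 1. Find the outliers before any winsorizing is applied.
    outliers = fin.loc[fin["altman_z"] > threshold].copy()
    # 2. Join sector information on (assumes one sector row per ticker).
    outliers = outliers.merge(sectors, on="ticker", how="left")
    # 3. Check which outliers also appear in the all data NLP table.
    keys = nlp[["ticker", "fixed_quarter_date"]].drop_duplicates()
    outliers = outliers.merge(
        keys, on=["ticker", "fixed_quarter_date"], how="left", indicator="in_all_data"
    )
    outliers["in_all_data"] = outliers["in_all_data"] == "both"
    return outliers

# Toy data, not project data.
fin = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "fixed_quarter_date": ["2020Q1", "2020Q1", "2020Q2"],
    "altman_z": [9.5, 2.0, 7.1],
})
sectors = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "sector": ["IT", "Energy", "Health Care"],
})
nlp = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "fixed_quarter_date": ["2020Q1", "2020Q1"],
})

result = flag_altman_outliers(fin, sectors, nlp)
print(result)
```

The returned frame can be saved (for example with `to_csv`) before the winsorizing cell runs, so the raw outliers remain available for inspection.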

OwenLin2001 commented 5 months ago

Pre-winsorized data exhibits a similar trend. For Altman Z > 6, the top 4 sectors (after an inner join of the pre-winsorized data with all_data_nlp on tickers) are:

  1. IT - 15
  2. Consumer Discretionary - 7
  3. Health Care - 6
  4. Energy - 6
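For reference, counts like the ones above could be produced roughly as follows. This is a sketch under assumptions: the column names are hypothetical, and each company is counted once, since the thread does not say whether the counts are companies or company-quarters.

```python
import pandas as pd

def high_z_sector_counts(pre_winsorized, all_data_nlp, threshold=6.0):
    """Count companies with Altman Z above `threshold` by sector.

    Column names (`ticker`, `altman_z`, `sector`) are assumed, not confirmed.
    """
    # Inner join on ticker keeps only companies present in both tables.
    ticker_sector = all_data_nlp[["ticker", "sector"]].drop_duplicates()
    merged = pre_winsorized.merge(ticker_sector, on="ticker", how="inner")
    high = merged.loc[merged["altman_z"] > threshold]
    # Count each company once, even if several quarters exceed the threshold.
    return high.drop_duplicates("ticker")["sector"].value_counts()

# Toy data, not project data.
pre = pd.DataFrame({
    "ticker": ["A", "A", "B", "C", "D"],
    "altman_z": [7.0, 8.0, 6.5, 3.0, 9.0],
})
nlp = pd.DataFrame({
    "ticker": ["A", "B", "C"],
    "sector": ["IT", "IT", "Energy"],
})

counts = high_z_sector_counts(pre, nlp)
print(counts)
```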

Among companies that are outliers in the pre-winsorized data but not in all_data_nlp, there isn't a trend. After winsorizing, some of the companies keep a high score (e.g., AAPL at 4.32) and some go down (e.g., MUR at 1.20).

What outcome do you envision from inspecting the Altman Z outliers?

ijyliu commented 5 months ago

It's a little predictable that some tech companies are scoring very high; they probably have near-zero liabilities. The other sectors are just large sectors, so they contribute more companies overall.
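To illustrate the liabilities point, here is a small sketch assuming the classic Altman Z formula for public firms (the exact variant computed in the notebook is not shown in this thread). The market value of equity over total liabilities term grows without bound as liabilities approach zero. The input figures are made up.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Classic Altman Z-score; assumed here, may differ from the project's variant."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities  # explodes when liabilities are tiny
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

# Two hypothetical firms that differ only in total liabilities.
base = dict(working_capital=20.0, retained_earnings=30.0, ebit=10.0,
            market_value_equity=200.0, sales=80.0, total_assets=100.0)
z_typical = altman_z(total_liabilities=100.0, **base)
z_low_debt = altman_z(total_liabilities=10.0, **base)
print(z_typical, z_low_debt)
```

Cutting liabilities by a factor of ten raises only the fourth term, but that alone moves the score from roughly 3 to roughly 14 in this toy case.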

[Image: all_data_fixed_quarter_dates_sector_distribution]

I think I'm good with winsorizing as is, even if maybe we should be doing it a little bit less for IT. The process will still maintain fairly high scores for the outlier companies.
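A quick sketch of why winsorizing keeps outlier companies near the top: values beyond the chosen quantiles are clipped to those quantiles, not dropped, so a clipped company ends up tied for the highest remaining score. The quantile cutoffs below are placeholders; the cutoffs used in the notebook are not stated in this thread.

```python
import pandas as pd

def winsorize(scores, lower_q=0.01, upper_q=0.99):
    """Clip a Series to its own lower and upper quantiles (cutoffs are assumptions)."""
    lo, hi = scores.quantile(lower_q), scores.quantile(upper_q)
    return scores.clip(lower=lo, upper=hi)

# Toy scores, not project data; "F" is the extreme observation.
scores = pd.Series([1.2, 2.5, 3.0, 3.1, 4.3, 25.0],
                   index=["A", "B", "C", "D", "E", "F"])
capped = winsorize(scores, lower_q=0.0, upper_q=0.8)
print(capped)
```

Winsorizing IT less aggressively would amount to calling a helper like this per sector with a higher `upper_q` for IT, for example through a `groupby` on sector.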