NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

Examine Distribution Logs to find out why WV04 is not showing up in the downstream metrics system #336

Closed krisstanton closed 7 months ago

krisstanton commented 8 months ago

It seems that WV04 dataset is not showing up in the downstream metrics system.

On a recent test, other datasets are showing up but WV04 is not.

Try and find out why.

Note: the Earthdata search and download of the granules is working. Also note: the WV04 datasets (both Pan and MSI) are very small (only about 6k granules each).

There are a couple of places to start (on 'checking metrics').

krisstanton commented 7 months ago

Update (DRAFT): Here is the most recent approach to this.

The logs for WV04 are not showing up on Kibana, we need to find out why.

NOTE: I might be able to run a test by downloading a WV04 granule and seeing whether it triggers errors anywhere along the chain (Earthdata, our own Cumulus instance, our own AWS account, the metrics AWS logs, or Kibana).
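
For reference, a minimal sketch of what that test download could look like from a script, assuming a requests-based client with an Earthdata Login bearer token. The host, object path, and token below are placeholders, not the real CSDA endpoints:

# test_wv04_download.py (hypothetical sketch; placeholders throughout)
import requests

EDL_TOKEN = "<earthdata-login-bearer-token>"      # placeholder token
GRANULE_URL = (
    "https://data.csdap.earthdata.nasa.gov/"      # placeholder distribution host
    "protected/WV04/<granule-file-name>"          # placeholder object path
)

# Distribution endpoints typically redirect to S3/CloudFront, so follow redirects
resp = requests.get(
    GRANULE_URL,
    headers={"Authorization": "Bearer " + EDL_TOKEN},
    allow_redirects=True,
    stream=True,
)
resp.raise_for_status()
with open("wv04_test_download", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)
print("HTTP status:", resp.status_code)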

krisstanton commented 7 months ago

Here is some download data to investigate

WV04 downloads - some files and dates to examine

On 12/21/23
WV04_20181220060257_4c3e0592-03b3-487c-8ab0-bcbc5330776e-inv_18DEC20060257-M1BS-059420283020_01_P003-BROWSE.jpg
WV04_20181220060257_4c3e0592-03b3-487c-8ab0-bcbc5330776e-inv_18DEC20060257-M1BS-059420283020_01_P003-BROWSE.webarchive
On 1/14/2024
WV04_20180919192625_b368fcce-f29a-4e9e-a19e-391da8815c0a-inv_18SEP19192625-M1BS-059412339040_01_P006.tar
WV04_20180919192625_b368fcce-f29a-4e9e-a19e-391da8815c0a-inv_18SEP19192625-M1BS-059412339040_01_P006-BROWSE.jpg
On 1/16/2024
WV04_20181221061652_425e4455-75dd-43ff-8cfd-0221291f0262-inv_18DEC21061652-P1BS-059420282200_01_P001
WV04_20181221061652_425e4455-75dd-43ff-8cfd-0221291f0262-inv_18DEC21061652-P1BS-059420282200_01_P001
WV04_20181221094030_4b100a4f-2064-4068-a348-6a205cdc33c7-inv_18DEC21094030-P1BS-059420299020_01_P004
WV04_20181221094030_4b100a4f-2064-4068-a348-6a205cdc33c7-inv_18DEC21094030-P1BS-059420299020_01_P004
WV04_20181222045551_73ba3830-9f58-4b42-9097-a9a1ff451090-inv_18DEC22045551-M1BS-059420282180_01_P001
WV04_20181222045557_73ba3830-9f58-4b42-9097-a9a1ff451090-inv_18DEC22045557-M1BS-059420282180_01_P005
WV04_20181222045557_73ba3830-9f58-4b42-9097-a9a1ff451090-inv_18DEC22045557-P1BS-059420282180_01_P005
krisstanton commented 7 months ago

Update: I ran a new test today where I downloaded WV03 (Pan and MSI) and WV04 (Pan and MSI) granules. I did find logs of that download in the Cumulus PROD S3 buckets in two places:

- CloudFront Distribution logs
- S3 Server Access logs

The date and time range for these files is today between 12:00 pm and 1:00 pm Central (2024_02_27, which is around 18:00 to 19:00 UTC). Kibana is having trouble loading right now, but I will try again shortly.

I have not yet reached out to the metrics team to see whether the logs were replicated. I want to confirm first that Kibana has the logs (or does not have them).
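
For reference, a minimal sketch of listing the matching log objects for that window with boto3. The bucket and prefix come from the console links later in this thread, and the date-stamped key filter assumes the key naming seen in the script output below; it requires read access to the Cumulus PROD account:

# list_access_logs.py (sketch)
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
pages = paginator.paginate(
    Bucket="csda-cumulus-prod-internal-5047",
    # Keys are date-stamped, so an hour of logs can be selected by prefix
    Prefix="cumulus-prod/ems-distribution/s3-server-access-logs/2024-02-27-18",
)
for page in pages:
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])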

krisstanton commented 7 months ago

Granules downloaded for the most recent test:

WV03 Pan:   WV03_20220730235024_104001007A56C200_22JUL30235024-P1BS-506690462010_01_P007
WV03 MSI:   WV03_20220701175647_104001007812D700_22JUL01175647-M1BS-506564177020_01_P006
WV04 MSI:   WV04_20181222045557_73ba3830-9f58-4b42-9097-a9a1ff451090-inv_18DEC22045557-M1BS-059420282180_01_P005
WV04 Pan:   WV04_20181222045557_73ba3830-9f58-4b42-9097-a9a1ff451090-inv_18DEC22045557-P1BS-059420282180_01_P005

Note: the files downloaded for WV03 were the tar files and were only about 1 to 2 megabytes each. The WV04 files were the TIFs and were around 1 gigabyte and 3.7 gigabytes.

krisstanton commented 7 months ago

Also, here are a couple of helper scripts I wrote to combine the log files (since the S3 bucket has a large number of small files for any given small-ish time frame).

Script to combine text files (for S3 record search).
Log file source URL: https://s3.console.aws.amazon.com/s3/buckets/csda-cumulus-prod-internal-5047?region=us-west-2&bucketType=general&prefix=cumulus-prod/ems-distribution/s3-server-access-logs/&showversions=false

# only_combine_text_files.py
#
# python only_combine_text_files.py
#
import os
#
# Settings
input_sub_dir           = "s3_logs__only_2024_02_27"            
output_full_file_path   = "s3_logs__only_2024_02_27_output/output.txt"      # s3_logs__only_2024_02_27_output
#
# Function that does the work!
# S3 logs already are just text files (not gzipped) so all we need to do here is combine them to make them easier to read and search
def combine_text_contents(input_dir, output_file):
    """
    Combines the contents of all text files in the specified directory into a single text file
    Assumptions: 
      (1) the contents of each file are text files (able to be opened as text)
      (2) there are not a large number of files (maybe in the hundreds at most)
      (3) the total size of all the files is not massive (maybe in the single to double digit megabyte range)
    :param input_dir: Directory containing the text log files to combine
    :param output_file: File path for the combined output text file.
    """
    # Ensure the input directory exists
    if not os.path.isdir(input_dir):
        raise ValueError(f"Input directory '{input_dir}' does not exist.")
    #
    # Create/open the output file in write mode 
    with open(output_file, 'w') as outfile:
        # Iterate through all items in the input directory
        for item in os.listdir(input_dir):
            # Construct full path to the item
            item_path = os.path.join(input_dir, item)
            # Only process regular files (S3 access logs are plain text, not gzipped)
            if os.path.isfile(item_path):
                with open(item_path, 'r') as file:
                    print("  Just Opened File: " + str(item_path))
                    # Iterate through lines in the file and write them to the output file
                    for line in file:
                        outfile.write(line)
                        #print("    Just wrote line: " + str(line))
#
# Linear Execution
#
print("")
print("combine_text_contents.py:    STARTED")
print("")
#
combine_text_contents(input_dir=input_sub_dir, output_file=output_full_file_path)
#
print("")
print("combine_text_contents.py:    ENDED")
print("")
#
#
# Expected output Example
#
# ➜  distribution_logs__Feb2024_WV04_task python only_combine_text_files.py 
#
# combine_text_contents.py:     STARTED
#
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-33-40-347AB165DAF820E7.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-25-45-EFC27FBA034E51BA.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-19-57-B06FF7D6409B3368.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-45-56-0D64D93A7F822EE0.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-38-33-1D4998BE84A1D0C1.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-15-12-1FD9BE175FB83CDB.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-48-48-3CD952D57DD621AB.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-31-01-0C24B6D4240EB011.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-16-07-CC3888698125AABE.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-37-47-1FA11F236AC4EF45.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-15-19-253FFE8905AB0BA1.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-21-45-39E69B008D1335F9.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-39-10-62682FCCFA4F8450.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-18-55-6475254D9A56655D.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-47-23-82A5102552D50A7F.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-20-34-47E5D9F7E85F020E.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-16-37-A8E3A3D4DB971A41.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-48-50-D30D01DE35AA5594.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-35-59-CDF005DD73883564.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-15-19-454FC456A9F3C486.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-25-50-D1237EC545F4AD24.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-27-40-2F3E73B331AC280F.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-50-24-AEAF4B5384BE2D81.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-50-26-0D5CBC8F1256882F.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-29-46-D5710C3B9AFA8BCE.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-50-03-A28092BDDA3FD4A6.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-22-55-7927250E79B6CDA2.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-21-14-91A315E24F6EB385.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-15-58-420F811F626C7919.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-14-48-28C4EB219DD42DCB.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-37-55-86CDFBDCD9608160.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-17-08-65E5555CE16A82CC.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-15-46-88EB056AB5B75530.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-15-19-972A1365B37D7367.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-45-04-DFC594CCD6962C0D.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-14-40-5E0FBA60342E9574.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-20-52-1E8CC34726F5CA83.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-53-53-876470040861FD2B.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-35-40-3CEE9AD5ABE9C817.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-41-09-59C2234B254265BD.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-36-17-656E5BC3199FD776.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-19-14-59-AAA1472061321CB5.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-50-00-DA1A7A7FB8A5064D.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-43-55-E0C315D1EA800D90.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-16-44-F0F837AD0885852F.txt
#   Just Opened File: s3_logs__only_2024_02_27/2024-02-27-18-16-17-402F7F6FC1934CBF.txt
#
# combine_text_contents.py:     ENDED
#
# END OF FILE
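
Once the logs are combined, a quick filter like the following can pull out just the lines for the test granules (a sketch; the input path matches the output setting above, and the search strings are the dataset prefixes from the test):

# search_combined_logs.py (sketch)
needles = ("WV03", "WV04")
with open("s3_logs__only_2024_02_27_output/output.txt") as f:
    for line in f:
        # Keep only log lines that mention one of the test datasets
        if any(n in line for n in needles):
            print(line.rstrip())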

Script to extract .gz files AND combine their text content into a single file (for CloudFront Distribution record search).
Log file source URL: https://s3.console.aws.amazon.com/s3/buckets/cloudfront-logs-2971f5e2c46fdf3dcd8a00b39e03196c?region=us-west-2&bucketType=general&prefix=cloudfront/AWSLogs/410469285047/us-west-2/csda_cumulus_api/&showversions=false

# extract_and_combine_util.py
#
# python extract_and_combine_util.py
#
import os
import gzip
#
# Settings
input_sub_dir           = "only_2024_02_27"                     #"only_2023_12_21"
output_full_file_path   = "only_2024_02_27_output/output.txt"   #"only_2023_12_21_single_file/output.txt"
#
# Function that does the work!
def combine_gz_contents(input_dir, output_file):
    """
    Combines the contents of all .gz files in the specified directory into a single text file
    Assumptions: 
      (1) the contents within all gz files are text files
      (2) there are not a large number of files (maybe in the hundreds at most)
      (3) the total size of all the files is not massive (maybe in the single to double digit megabyte range)
    :param input_dir: Directory to search for .gz files
    :param output_file: File path for the combined output text file.
    """
    # Ensure the input directory exists
    if not os.path.isdir(input_dir):
        raise ValueError(f"Input directory '{input_dir}' does not exist.")
    #
    # Create/open the output file in write mode 
    with open(output_file, 'w') as outfile:
        # Iterate through all items in the input directory
        for item in os.listdir(input_dir):
            # Construct full path to the item
            item_path = os.path.join(input_dir, item)
            # Check if the item is a file and has a .gz extension 
            if os.path.isfile(item_path) and item.endswith('.gz'):
                # Open the gzipped file
                with gzip.open(item_path, 'rt') as gzfile:
                    print("  Just Opened File: " + str(item_path))
                    # Iterate through lines in the extracted file and write them to the output file
                    for line in gzfile:
                        outfile.write(line)
                        #print("    Just wrote line: " + str(line))

#
# Linear Execution
#
print("")
print("extract_and_combine_util.py:     STARTED")
print("")
#
combine_gz_contents(input_dir=input_sub_dir, output_file=output_full_file_path)
#
print("")
print("extract_and_combine_util.py:     ENDED")
print("")
#
#
# Expected output Example
# ➜  distribution_logs__Feb2024_WV04_task python extract_and_combine_util.py
#
# extract_and_combine_util.py:      STARTED
#
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-04.788d401a.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-00.4e61311b.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-18.2e03c1ea.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-00.f04b3d3d.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-03.8412c70f.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-20.4e78751e.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-21.3291ad24.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-18.e4303182.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-04.93776bad.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-16.0bac7799.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-03.d7503ff2.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-18.3c6bf126.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-04.e14df8f0.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-00.950e0ec7.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-09.eb516a26.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-18.a4b94728.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-03.86d4008a.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-20.7ef053e8.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-16.5a07673d.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-21.76398b53.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-18.71510d2b.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-02.fb455712.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-23.7c32b083.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-22-00.9b49da20.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-16.734e596b.gz
#   Just Opened File: only_2023_12_21/E3ISFOBC9P0M4H.2023-12-21-23.7b6b93e1.gz
#
# extract_and_combine_util.py:      ENDED
#
#
# END OF FILE
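
CloudFront standard logs are tab-separated with "#Version:" / "#Fields:" header lines, so the combined file can be filtered into labeled records with a sketch like this (the input path matches the output setting above):

# filter_cloudfront_logs.py (sketch)
fields = []
with open("only_2024_02_27_output/output.txt") as f:
    for line in f:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]   # field names follow the "#Fields:" tag
            continue
        # Skip other header lines and anything not about the WV04 test
        if line.startswith("#") or "WV04" not in line:
            continue
        row = dict(zip(fields, line.rstrip("\n").split("\t")))
        print(row.get("date"), row.get("time"), row.get("sc-status"),
              row.get("cs-uri-stem"))
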
krisstanton commented 7 months ago

Final Update: I was able to get into Kibana after trying a few times. There was a lot of latency, but I was able to get the searches to show results for the tests that were run earlier today. Both the WV03 and WV04 granules from the tests showed up in the Kibana search.

Here are the direct search URLs for Kibana:

// WV03
https://metrics.earthdata.nasa.gov/s/metrics-csda/app/discover#/view/069e1fc0-d5bf-11ee-b9c2-6fd0ef87b0c3?_g=(filters:!(),refreshInterval:(pause:!t,value:60000),time:(from:'2024-02-27T18:30:00.000Z',to:'2024-02-27T19:00:00.000Z'))&_a=(columns:!(),filters:!(),grid:(),hideChart:!f,index:'73787066-46ad-49c3-8126-b643c12ab870',interval:auto,query:(language:kuery,query:'*WV03*'),sort:!(!('@timestamp',desc)))

// WV04
https://metrics.earthdata.nasa.gov/s/metrics-csda/app/discover#/view/069e1fc0-d5bf-11ee-b9c2-6fd0ef87b0c3?_g=(filters:!(),refreshInterval:(pause:!t,value:60000),time:(from:'2024-02-27T18:30:00.000Z',to:'2024-02-27T19:00:00.000Z'))&_a=(columns:!(),filters:!(),grid:(),hideChart:!f,index:'73787066-46ad-49c3-8126-b643c12ab870',interval:auto,query:(language:kuery,query:'*WV04*'),sort:!(!('@timestamp',desc)))

krisstanton commented 7 months ago

Here are the two screenshots of the Kibana search results, with captions below them (top image is WV03, bottom image is WV04):

[Image: WV03_MetricsSearch] Kibana search screenshot for WV03

[Image: WV04_MetricsSearch] Kibana search screenshot for WV04