TerraFusion / basicFusion

Terra Basic Fusion Project - University of Illinois

Corrupt_Files #258

Closed shashankbansal6 closed 6 years ago

shashankbansal6 commented 7 years ago

These are the most recent corrupt files.

MISR:
MISR_Corrupt_Check_E_2000.txt
MISR_Corrupt_Check_E_2001.txt
MISR_Corrupt_Check_E_2002.txt
MISR_Corrupt_Check_E_2003.txt
MISR_Corrupt_Check_E_2004.txt
MISR_Corrupt_Check_E_2005.txt
MISR_Corrupt_Check_E_2006.txt
MISR_Corrupt_Check_E_2007.txt

ASTER:
ASTER_Corrupt_Check_2000.txt
ASTER_Corrupt_Check_2004.txt
ASTER_Corrupt_Check_2005.txt
ASTER_Corrupt_Check_2007.txt
ASTER_Corrupt_Check_2008.txt
ASTER_Corrupt_Check_2009.txt
ASTER_Corrupt_Check_2012.txt
ASTER_Corrupt_Check_2013.txt

MODIS (1KM):
/projects/TDataFus/gyzhao/TF/data/MODIS/MOD021KM/2006/048/MOD021KM.A2006048.0800.006.2014221004306.hdf
/projects/TDataFus/gyzhao/TF/data/MODIS/MOD021KM/2012/271/MOD021KM.A2012271.0235.006.2014224195407.hdf

LandonTClipp commented 7 years ago

Is this the final list?

LandonTClipp commented 7 years ago

Also remind me, what did you use to check the MOPITT HDF5 files? You used hdp dumpsds for the HDF4 files, but I don't remember what I told you to use for HDF5 files.

shashankbansal6 commented 7 years ago

There are no corrupt files in MOPITT. I used ncdump for MOPITT. There are only 2 corrupt files in MODIS (1KM). In CERES, there are only 2 types of file in each year for both FM1 and FM2: one ending with .met and one with no extension. Which files do I check in CERES then?

LandonTClipp commented 7 years ago

Thanks Shashank. CERES hdf files do not have any file suffix on them. Do not check the metadata files. Do both FM1 and FM2 files.

LandonTClipp commented 7 years ago

@shashankbansal6

There is an error in the algorithm we have been using for this error check. For instance:

/projects/TDataFus/gyzhao/TF/data/ASTER/ASTER_L1T/ASTT/AST_L1T.003/2004.04.02

Notice how there are many files <1MB in size. Normal, non-corrupt files are >100MB in size. hdp dumpsds does not return a non-zero exit status for some of these files, so we need to determine a way to handle them. Perhaps you can add in a special case for CERES where the file is marked as bad if its size is less than 100MB.

Also, I am considering having you make a slight modification to your script, perhaps add in a string search of the STDOUT of hdp dumpsds for "HDP ERROR." Also run for just ASTER and without the -h flag.

Can you please make this modification, run on all ASTER, and let me know how long it takes? It will no doubt be significantly longer but this is a crucial step so it may be worth the time.
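The three checks discussed above (file size, exit status, and a search of the output for "HDP ERROR") could be combined roughly as follows. This is only a sketch, not the script used in this project: the function names are hypothetical, the 100MB threshold is taken from the figure quoted above and would likely need to differ per instrument, and it assumes `hdp` is on the PATH. The output is scanned line by line so a full dump is never held in memory.

```python
import os
import subprocess

# Assumed values: the threshold comes from the ">100MB" remark in this thread,
# and the marker is the string suggested for the STDOUT search.
MIN_SIZE_BYTES = 100 * 1024 * 1024
ERROR_MARKER = "HDP ERROR"


def classify(size_bytes, returncode, saw_marker, min_size=MIN_SIZE_BYTES):
    """Return the reasons a file looks bad; an empty list means it passed."""
    reasons = []
    if size_bytes < min_size:
        reasons.append("too small")
    if returncode != 0:
        # A crash such as a segmentation fault shows up as a negative code.
        reasons.append("non-zero exit")
    if saw_marker:
        reasons.append("HDP ERROR in output")
    return reasons


def check_hdf4(path, min_size=MIN_SIZE_BYTES):
    """Run `hdp dumpsds` (without -h) on one file and classify the result."""
    size = os.path.getsize(path)
    saw_marker = False
    with subprocess.Popen(
        ["hdp", "dumpsds", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # scan error messages together with stdout
        text=True,
        errors="replace",  # tolerate non-UTF-8 bytes in the dump
    ) as proc:
        for line in proc.stdout:
            if ERROR_MARKER in line:
                saw_marker = True
    # Leaving the `with` block waits for the process, so returncode is set.
    return classify(size, proc.returncode, saw_marker, min_size)
```

Keeping `classify` separate from the subprocess call means the decision logic can be tried on known-bad cases (for example a 94KB file that exits with status 0) without running `hdp` at all.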

LandonTClipp commented 7 years ago

@sandman0o0

Sean, this directory has many ASTER files that are 94KB. Can you confirm they are like this on the original data center as well? I noticed you have been re-downloading some of the files as well. Our list to you may be incomplete.

sandman0o0 commented 7 years ago

The 94K files look like errors... I'm going to run across all of them again, then run another wide find to see if any others got missed.

I have been redownloading what you sent. I'm also finding that there are a lot of files missing on the DAAC side. I'm going to contact the DAAC and ask about this and work on a script to pull their indexes for each day and compare to the original lists they sent us.
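The comparison step of such a script might look like the sketch below. It assumes both listings have already been saved as plain text with one file name per line; `read_listing` and `compare_listings` are hypothetical names, and fetching the daily indexes from the DAAC is not covered here.

```python
def read_listing(path):
    """Read one file name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def compare_listings(expected, actual):
    """Return (missing, unexpected) as sorted lists.

    missing:    names in the original list that the current index lacks
    unexpected: names in the current index that the original list lacks
    """
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)
    unexpected = sorted(actual_set - expected_set)
    return missing, unexpected
```

The `unexpected` list is where reprocessed replacements would tend to appear, since they carry different names from the files originally listed.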

sandman0o0 commented 7 years ago

Also, quick question, all but one of the MISR files listed are the MISBR files that I thought we weren't using? Are we using these?

LandonTClipp commented 7 years ago

We are not using these, this is a mistake.

@sandman0o0 Something bad is happening, look at the directory: /projects/TDataFus/gyzhao/TF/data

This is probably an issue arising from the bad directories given by @shashankbansal6 's ASTER list??

LandonTClipp commented 7 years ago

@shashankbansal6 Please filter the MISBR files from your script so they are not considered.
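A filter like this could be as small as the sketch below. It assumes the unwanted files can be recognized by the string "MISBR" appearing somewhere in the path (directory or file name); that is a guess about the layout, not something confirmed in this thread, and the function names and example paths are hypothetical.

```python
def is_misbr(path, marker="MISBR"):
    """True if the path appears to belong to the MISBR product (assumed rule)."""
    return marker in path


def drop_misbr(paths):
    """Keep only the paths that should go on to the corruption check."""
    return [p for p in paths if not is_misbr(p)]
```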

sandman0o0 commented 7 years ago

Alright, I'll ignore the MISBR files. I redownloaded this file:

/projects/TDataFus/gyzhao/TF/data/MISR/MI1B2E.003/2003.12.15/MISR_AM1_GRP_ELLIPSOID_GM_P175_O021232_BF_F03_0024.hdf

it matches the size on the ftp side, but it matched before as well. Can you check it with your scripts again?

shashankbansal6 commented 7 years ago

I will add the case in my code to filter them out.

I ran the hdp dumpsds -h command on /projects/TDataFus/gyzhao/TF/data/MISR/MI1B2E.003/2003.12.15/MISR_AM1_GRP_ELLIPSOID_GM_P175_O021232_BF_F03_0024.hdf and it returns a segmentation fault. It is possible that the file is corrupt on the server as well.

LandonTClipp commented 7 years ago

@shashankbansal6 Have you made the few changes we discussed to your script?

We need to run everything again. This may take a long time but this is a critical step. This step needs to happen before anything else on my end is done.

LandonTClipp commented 7 years ago

@shashankbansal6 Also be sure to fix the issues with your script's output files being written by multiple processes (thus corrupting some lines of the txt file).
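One way to keep lines from different processes from interleaving is to take an exclusive lock around each append. The sketch below is an assumption about how that could be done, not the fix that was actually applied: `append_line` is a hypothetical helper, `fcntl` is Unix-only, and `flock` behavior on network or parallel filesystems should be verified before relying on it. Writing one output file per process and concatenating them afterwards is a simpler alternative.

```python
import fcntl
import os


def append_line(path, line):
    """Append one complete line to a shared file under an exclusive lock."""
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds it
        while data:
            written = os.write(fd, data)  # unbuffered, so nothing lingers unflushed
            data = data[written:]
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Each worker would call `append_line(report_path, corrupt_file_path)` instead of writing through a shared buffered file object.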

sandman0o0 commented 7 years ago

The MODIS 1KM files have been re-downloaded. Please check those 2 again when you have a chance.

shashankbansal6 commented 7 years ago

I have checked both the files and they seem to be good now.

LandonTClipp commented 7 years ago

@shashankbansal6 See the following histograms:

[Attached histogram images: mod, mop, ast, cer, mis_agp, mis_gmp, mis_grp, mis_hrll]

sandman0o0 commented 7 years ago

All 94K files have been redownloaded and/or replaced with their reprocessed counterparts. The attached file, 20171010.ASTER_reprocessed.downloaded.txt, lists all the files that are reprocessed versions. There are 10 files missing on the DAAC side with no reprocessed version. I will send this list to the DAAC to figure out what's up.

LandonTClipp commented 7 years ago

New script ran on ROGER (e3554cd). The script returned no corrupt files.

sandman0o0 commented 7 years ago

Sorry for the delay on this... this upload failed the first time. Here's the full listing of reprocessed files sans the 94k files that were originally pulled down: ASTER.reprocessed_full.sans95k.txt