Closed: shashankbansal6 closed this issue 6 years ago
Is this the final list?
Also remind me, what did you use to check the MOPITT HDF5 files? You used hdp dumpsds for HDF4 files, but I don't remember what I told you to use for HDF5 files.
There are no corrupt files in MOPITT; I used ncdump to check them. There are only 2 corrupt files in MODIS (1KM). In CERES, each year has only 2 types of file in both FM1 and FM2: one ending in .met and one with no extension. Which files should I check in CERES?
Thanks Shashank. CERES hdf files do not have any file suffix on them. Do not check the metadata files. Do both FM1 and FM2 files.
@shashankbansal6
There is an error in the algorithm we have been using for this error check. For instance:
/projects/TDataFus/gyzhao/TF/data/ASTER/ASTER_L1T/ASTT/AST_L1T.003/2004.04.02
Notice how there are many files <1MB in size. Normal, non-corrupt files are >100MB in size. hdp dumpsds does not return a non-zero value for some of these files, so we need to determine a way to handle them. Perhaps you can add in a special case for CERES where the file is marked as bad if its size is less than 100MB.
Also, I am considering having you make a slight modification to your script, perhaps add in a string search of the STDOUT of hdp dumpsds for "HDP ERROR." Also run for just ASTER and without the -h flag.
Can you please make this modification, run on all ASTER, and let me know how long it takes? It will no doubt be significantly longer but this is a crucial step so it may be worth the time.
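A minimal sketch of the modified check described above, assuming Python. `check_file` and its parameters are hypothetical names, not the existing script. It runs `hdp dumpsds` without `-h`, reads the whole output looking for the `HDP ERROR` string, treats any non-zero exit status as a failure, and optionally flags files below a size threshold.

```python
import os
import subprocess

def check_file(path, min_bytes=None, cmd=("hdp", "dumpsds"), marker=b"HDP ERROR"):
    """Return a short reason string if the file looks bad, else None.

    All names here are illustrative assumptions, not the real script.
    """
    # Optional size test: catches truncated files that the dump tool accepts.
    if min_bytes is not None and os.path.getsize(path) < min_bytes:
        return "too small"

    # Full dump (no -h), so the data is read and not only the headers.
    proc = subprocess.Popen(list(cmd) + [path],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    found_marker = False
    # Stream the output line by line as bytes; a full dump can be very large.
    for line in proc.stdout:
        if marker in line:
            found_marker = True
    proc.stdout.close()
    code = proc.wait()

    # A crash such as a segmentation fault shows up as a negative exit status.
    if code != 0:
        return "exit status %d" % code
    if found_marker:
        return "HDP ERROR in output"
    return None
```

Reading the output as a stream avoids holding a whole dump in memory, which matters because a full dump is much slower and larger than a header-only one.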
@sandman0o0
Sean, this directory has many ASTER files that are 94KB. Can you confirm they are like this on the original data center as well? I noticed you have been re-downloading some of the files as well. Our list to you may be incomplete.
The 94K files look like errors... I'm going to run across all of them again, then run another wide find to see if any others got missed.
I have been redownloading what you sent. I'm also finding that there are a lot of files missing on the daac side. I'm going to contact the daac and ask about this and work on a script to pull their indexes for each day and compare to the original lists they sent us.
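The comparison step could look roughly like the sketch below, assuming Python. How the DAAC indexes are fetched is not shown, and `compare_listings` is a hypothetical name; it only takes two collections of file names for a given day and reports the differences.

```python
def compare_listings(original, current):
    """Compare the originally supplied file list with a freshly pulled index.

    Both arguments are iterables of file names (an assumption for illustration).
    """
    original, current = set(original), set(current)
    return {
        # In the original list but no longer present in the index.
        "missing": sorted(original - current),
        # In the index but not in the original list (e.g. reprocessed versions).
        "new": sorted(current - original),
    }
```

For example, `compare_listings(["a.hdf", "b.hdf"], ["b.hdf", "c.hdf"])` reports `a.hdf` as missing and `c.hdf` as new.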
Also, quick question, all but one of the MISR files listed are the MISBR files that I thought we weren't using? Are we using these?
We are not using these, this is a mistake.
@sandman0o0 Something bad is happening, look at the directory: /projects/TDataFus/gyzhao/TF/data
This is probably an issue arising from the bad directories given by @shashankbansal6's ASTER list?
@shashankbansal6 Please filter the MISBR files from your script so they are not considered.
Alright, I'll ignore the MISBR files. I redownloaded this file:
/projects/TDataFus/gyzhao/TF/data/MISR/MI1B2E.003/2003.12.15/MISR_AM1_GRP_ELLIPSOID_GM_P175_O021232_BF_F03_0024.hdf
It matches the size on the FTP side, but it matched before as well. Can you check it with your scripts again?
I will add the case in my code to filter them out.
I ran the hdp dumpsds -h command on /projects/TDataFus/gyzhao/TF/data/MISR/MI1B2E.003/2003.12.15/MISR_AM1_GRP_ELLIPSOID_GM_P175_O021232_BF_F03_0024.hdf and it returns a segmentation fault. It is possible that the file is corrupt on the server as well.
@shashankbansal6 Have you made the few changes we discussed to your script?
We need to run everything again. This may take a long time but this is a critical step. This step needs to happen before anything else on my end is done.
@shashankbansal6 Also be sure to fix the issues with your script's output files being written by multiple processes (thus corrupting some lines of the txt file).
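One way to avoid interleaved lines is to let the worker processes only return results and have a single parent process do all the writing. The sketch below assumes Python's `multiprocessing`; `check_one`, `write_report`, and `run_parallel` are hypothetical names, and the check inside `check_one` is a placeholder, not the real corruption test.

```python
import multiprocessing

def check_one(path):
    """Hypothetical worker: return (path, reason); reason is None if the file is fine."""
    # Placeholder check so the sketch runs anywhere; swap in the real test here.
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return path, None
    except OSError as err:
        return path, str(err)

def write_report(results, out_path):
    """Write one complete line per bad file; only the calling process writes."""
    bad = 0
    with open(out_path, "w") as out:
        for path, reason in results:
            if reason is not None:
                out.write("%s\t%s\n" % (path, reason))
                bad += 1
    return bad

def run_parallel(paths, out_path, workers=4):
    """Check files in parallel; workers never touch the output file."""
    with multiprocessing.Pool(workers) as pool:
        # Results are consumed (and written) by the parent as they arrive.
        return write_report(pool.imap_unordered(check_one, paths), out_path)
```

Because there is exactly one writer, no locking is needed and each line in the text file is written whole.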
The MODIS 1KM files have been re-downloaded. Please check those 2 again when you have a chance.
I have checked both the files and they seem to be good now.
All 94K files have been redownloaded and/or replaced with their reprocessed counterparts. The attached file lists all the files that are reprocessed versions: 20171010.ASTER_reprocessed.downloaded.txt. There are 10 files missing on the DAAC side with no reprocessed version. I will send this list to the DAAC to figure out what's up.
New script ran on ROGER (e3554cd). The script returned no corrupt files.
Sorry for the delay on this... this upload failed the first time. Here's the full listing of reprocessed files sans the 94k files that were originally pulled down: ASTER.reprocessed_full.sans95k.txt
These are the most recent corrupt files.
MISR:
MISR_Corrupt_Check_E_2000.txt
MISR_Corrupt_Check_E_2001.txt
MISR_Corrupt_Check_E_2002.txt
MISR_Corrupt_Check_E_2003.txt
MISR_Corrupt_Check_E_2004.txt
MISR_Corrupt_Check_E_2005.txt
MISR_Corrupt_Check_E_2006.txt
MISR_Corrupt_Check_E_2007.txt
ASTER:
ASTER_Corrupt_Check_2000.txt
ASTER_Corrupt_Check_2004.txt
ASTER_Corrupt_Check_2005.txt
ASTER_Corrupt_Check_2007.txt
ASTER_Corrupt_Check_2008.txt
ASTER_Corrupt_Check_2009.txt
ASTER_Corrupt_Check_2012.txt
ASTER_Corrupt_Check_2013.txt
MODIS (1KM):
/projects/TDataFus/gyzhao/TF/data/MODIS/MOD021KM/2006/048/MOD021KM.A2006048.0800.006.2014221004306.hdf
/projects/TDataFus/gyzhao/TF/data/MODIS/MOD021KM/2012/271/MOD021KM.A2012271.0235.006.2014224195407.hdf