Closed mhidas closed 6 years ago
There are 935 files in /mnt/ebs/incoming/AOCRN/any, uploaded since yesterday afternoon which have also not triggered the incoming handler.
I suspect none of the pipelines are being triggered. The last entry in /mnt/ebs/log/data-services/process.log.1 is Apr 28 16:12.
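For reference, a count like the one above can be reproduced with something along these lines. This is only a sketch: it assumes GNU find, the function name `count_files_since` is made up for illustration, and the timestamp in the example is a placeholder rather than the actual upload time.

```shell
# Count regular files under a directory modified after a given timestamp.
# Assumes GNU find (-newermt, -printf); the function name is illustrative.
count_files_since() {
    local dir="$1"
    local since="$2"
    # Print one dot per match and count bytes, so unusual filenames
    # (spaces, newlines) cannot skew the count.
    find "$dir" -type f -newermt "$since" -printf '.' | wc -c
}

# Example (timestamp is a placeholder):
# count_files_since /mnt/ebs/incoming/AOCRN/any '2016-04-28 16:12'
```

Comparing that count against the timestamp of the last line in process.log.1 gives a rough idea of how long the handler has been idle.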
Looks like the incoming directories were cleared out (i.e. files handled) yesterday, and the pipelines are running OK now.
@anguss00 Any ideas what went wrong? Could it be related to https://github.com/aodn/internal-discussions/issues/223 ?
I cleared up space at 7:33 am on the main partition (/), not on /mnt/ebs/.
root@10-aws-syd:~$ less /var/log/apt/history.log
Start-Date: 2016-05-02 07:33:19
Commandline: apt-get autoremove
It looks like the processing was immediately kicked off,
root@10-aws-syd:~$ head /mnt/ebs/log/data-services/process.log
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_DM: Handling lftp file '/mnt/ebs/tmp/tmp.ric1YeQk97/aatams_sattag_dm_lftp.20160502-000044.log'
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_DM: Handling '0' deletions
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_DM: Handling '0' additions
May 2 07:33:19 10-aws-syd SOOP_XBT_NRT: Handling rsync file '/mnt/ebs/tmp/tmp.Q8qy1e2dme/IMOS_SOOP-XBT_NRT_fileList.csv'
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_DM: Successfully handled all AATAMS_SATTAG_DM files!
May 2 07:33:19 10-aws-syd SOOP_XBT_NRT: Handling '10' additions
May 2 07:33:19 10-aws-syd SOOP_XBT_NRT: Bulk indexing/unindexing files from '/mnt/ebs/tmp/tmp.VL9NCyi142'
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_NRT: Handling rsync file '/mnt/ebs/tmp/tmp.kI8NnSoLDN/manifest'
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_NRT: Handling '0' additions
May 2 07:33:19 10-aws-syd AATAMS_SATTAG_NRT: Successfully handled all aatams nrt files!
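Since processing resumed at the same second the space was freed, the stall looks tied to free space on /. A check along these lines could flag that earlier. It is a rough sketch only: the function name `fs_used_percent` and the 90% threshold in the example are assumptions, not something already in place on this box.

```shell
# Report how full the filesystem holding a path is, as an integer percentage.
# "df -P" forces the POSIX output format so the data line is never wrapped.
# The function name is illustrative.
fs_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: warn when / is above a threshold (90 is an arbitrary placeholder).
# [ "$(fs_used_percent /)" -lt 90 ] || echo "root partition nearly full" >&2
```

Run from cron, the example line would only produce output (and hence mail) when the threshold is crossed.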
It looks like there are a lot of files in /tmp that should really be in /mnt/ebs/tmp
root@10-aws-syd:~$ ls -lht $( find /tmp/ -type f ) | grep \.nc
-rw-r--r-- 1 root root 73K May 2 11:00 /tmp/sync_archive.log
-rw-rw-r-- 1 projectofficer projectofficer 3.7M May 1 20:14 /tmp/tmpay1Fdo.pid6817.ncwa.tmp
-rw-rw-r-- 1 projectofficer projectofficer 6.3M May 1 20:14 /tmp/tmpRDEzox/IMOS_SOOP-TRV_B_20160323T140000Z_VNCF_FV01_END-20160407T210727Z_C-20160501T101326Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 7.3M May 1 20:13 /tmp/tmpbnQkHe/IMOS_SOOP-TRV_B_20160302T140000Z_VNCF_FV01_END-20160320T230800Z_C-20160501T101210Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 3.4M May 1 20:11 /tmp/tmpM9bbwv/IMOS_SOOP-TRV_B_20160126T140000Z_VNCF_FV01_END-20160203T031335Z_C-20160501T101119Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 7.8M May 1 20:11 /tmp/tmpYkV8P8/IMOS_SOOP-TRV_B_20151129T140000Z_VNCF_FV01_END-20151217T020450Z_C-20160501T101005Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 3.9M May 1 20:10 /tmp/tmpvl3oNp/IMOS_SOOP-TRV_B_20151118T140000Z_VNCF_FV01_END-20151127T034052Z_C-20160501T100928Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 4.4M May 1 20:09 /tmp/tmpcZbVVZ/IMOS_SOOP-TRV_B_20151104T140000Z_VNCF_FV01_END-20151116T003500Z_C-20160501T100847Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 2.3M May 1 20:08 /tmp/tmpXijlc1/IMOS_SOOP-TRV_B_20151023T140000Z_VNCF_FV01_END-20151029T053942Z_C-20160501T100825Z.nc
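Side note on the listing above: /tmp/sync_archive.log shows up because the unquoted `\.nc` reaches grep as `.nc`, where `.` matches any character (so "ync" in "sync" matches). A stricter variant, sketched here with a made-up function name `list_nc_files`, filters on the file name inside find instead. Note that it also drops the `.ncwa.tmp` file, which may or may not be what you want.

```shell
# List files with a literal ".nc" suffix under a directory, newest first.
# The function name is illustrative. Assumes GNU find/xargs.
list_nc_files() {
    # -print0 / xargs -0 keep paths with spaces intact;
    # -r skips running ls at all when nothing matches.
    # With a very large number of files xargs may run ls more than once,
    # in which case the sort order only holds within each batch.
    find "$1" -type f -name '*.nc' -print0 | xargs -0 -r ls -lht
}

# Example:
# list_nc_files /tmp
```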
@mhidas list of current issues,
clean up files in /mnt/ebs/tmp
-rw-rw-r-- 1 projectofficer projectofficer 5.9M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/CUR/IMOS_ANMN-NSW_AETVZ_20131031T190000Z_PH100_FV01_PH100-1309-Workhorse-ADCP-109.5_END-20131130T024959Z_C-20150804T053718Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 16M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/BGC/IMOS_ANMN-NSW_BCDKOSTUZ_20130827T002954Z_PH100_FV01_PH100-1309-WQM-15_END-20131130T073040Z_C-20150804T051823Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 16M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/BGC/IMOS_ANMN-NSW_BCDKOSTUZ_20130605T061454Z_PH100_FV01_PH100-1306-WQM-15_END-20130829T044537Z_C-20150827T065647Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 17M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/BGC/IMOS_ANMN-NSW_BCDKOSTUZ_20130305T025904Z_PH100_FV01_PH100-1303-WQM-15_END-20130607T041500Z_C-20150804T045342Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 16M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/BGC/IMOS_ANMN-NSW_BCDKOSTUZ_20120903T035944Z_PH100_FV01_PH100-1209-WQM-15_END-20121129T103036Z_C-20150804T044440Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 19M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/BGC/IMOS_ANMN-NSW_BCDKOSTUZ_20121128T042933Z_PH100_FV01_PH100-1212-WQM-15_END-20130306T233035Z_C-20150804T044917Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 16M Sep 1 2015 /mnt/ebs/tmp/ANMN_NSW_TBiq8/PH100/BGC/IMOS_ANMN-NSW_BCDKOSTUZ_20111209T001036Z_PH100_FV01_PH100-1112-WQM-15_END-20120302T061136Z_C-20150804T042557Z.nc
root@10-aws-syd:~$ uptime
11:32:30 up 14 days, 2:59, 5 users, load average: 1.74, 2.46, 2.16
fix any pipeline processes that incorrectly use /tmp. Looks like a lot of them are SOOP TRV
root@10-aws-syd:~$ ls -lht $( find /tmp/ -type f ) | grep \.nc | less
-rw-r--r-- 1 root root 73K May 2 11:00 /tmp/sync_archive.log
-rw-rw-r-- 1 projectofficer projectofficer 3.7M May 1 20:14 /tmp/tmpay1Fdo.pid6817.ncwa.tmp
-rw-rw-r-- 1 projectofficer projectofficer 6.3M May 1 20:14 /tmp/tmpRDEzox/IMOS_SOOP-TRV_B_20160323T140000Z_VNCF_FV01_END-20160407T210727Z_C-20160501T101326Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 7.3M May 1 20:13 /tmp/tmpbnQkHe/IMOS_SOOP-TRV_B_20160302T140000Z_VNCF_FV01_END-20160320T230800Z_C-20160501T101210Z.nc
-rw-rw-r-- 1 projectofficer projectofficer 3.4M May 1 20:11 /tmp/tmpM9bbwv/IMOS_SOOP-
Note that on Debian-based systems like Ubuntu, it appears that /tmp is not cleaned up by cron but at reboot. See http://serverfault.com/questions/377348/when-does-tmp-get-cleared
Note that this was a serious issue. Files were not processed over the weekend (although none were lost).
No need to fix anfog_dm; I have to deal with the files in error_dir. However, there is an issue with a file sitting in the anfog_rt incoming dir.
Cleaned old dirs out of /tmp to free up more space
mv $( find /tmp/ -ctime +2 -type d ) /mnt/ebs/tmp/from_root_tmp/
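The one-liner above splits on whitespace in paths and also lists nested directories that have already moved along with their parent. A more defensive variant might look like the sketch below. The function name `move_old_dirs` is hypothetical, `mv -t` is a GNU coreutils option, and the age test is passed through as extra find arguments so the same helper works with `-ctime +2` or any other test.

```shell
# Move top-level directories matching the given find tests from src to dest.
# Usage: move_old_dirs SRC DEST [find tests...]
# The function name is illustrative. Assumes GNU find and GNU mv.
move_old_dirs() {
    local src="$1"
    local dest="$2"
    shift 2
    mkdir -p "$dest"
    # -mindepth/-maxdepth 1 selects only top-level entries, so a nested
    # directory is never moved separately from its parent.
    # "-exec ... {} +" passes paths as arguments, so spaces are preserved,
    # and mv is not run at all when nothing matches.
    find "$src" -mindepth 1 -maxdepth 1 -type d "$@" -exec mv -t "$dest" {} +
}

# Example, equivalent in intent to the command above:
# move_old_dirs /tmp /mnt/ebs/tmp/from_root_tmp -ctime +2
```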
@julian1 Please make /mnt/ebs/tmp/from_root_tmp/ read/writeable to project officers so that we can remove files once we've figured out why they were in /tmp (and made sure they don't end up there again). At the moment even just running du to see how much data is in there results in a bunch of "Permission denied" messages.
@mhidas I've changed the owner to projectofficer. Let me know if that's not enough.
:+1: Thanks @julian1
@julian1, @mhidas - are you sure that's a good idea? Would read-only access be sufficient? Pipeline processes shouldn't be writing to tmp, right?
@pblain, /mnt/ebs/tmp/from_root_tmp/ is a directory that contains files that were purged out of /tmp that should not have been there, and which were stale and lost to the system.
I gave project officers ownership of this subdir only - to enable them to manually evaluate and process, and then to remove that directory.
No pipeline jobs should be using the /tmp directory. Unfortunately all talend jobs appear to.
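One possible way to steer well-behaved tools away from /tmp is to set TMPDIR for the pipeline process, as in the sketch below (the wrapper name `with_tmpdir` is made up). mktemp and Python's tempfile module honour TMPDIR. Java does not: it reads the `java.io.tmpdir` system property, so Java-based jobs such as Talend would need `-Djava.io.tmpdir=...` on the JVM command line instead. How the Talend jobs here are launched isn't covered in this thread, so treat that part as an assumption to verify.

```shell
# Run a command with its temporary directory redirected to a given location.
# Usage: with_tmpdir TMP_ROOT COMMAND [ARGS...]
# The function name is illustrative.
with_tmpdir() {
    local tmp_root="$1"
    shift
    mkdir -p "$tmp_root"
    # Set TMPDIR in the environment of this one command only.
    TMPDIR="$tmp_root" "$@"
}

# Example: the scratch directory is created under /mnt/ebs/tmp, not /tmp.
# with_tmpdir /mnt/ebs/tmp mktemp -d
```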
@julian1 - makes sense. Thanks!
Summary for DOD
@mhidas Is this bug still relevant? It's on our board to do this iteration :)
Uh, this is over 2 years old. I don't quite understand why it is on the board. This is the reason why we moved to pipeline 2.
There are 112 files in /mnt/ebs/incoming/ANMN/QLD, uploaded between 16:30 and 18:30 last night, which have not triggered the incoming handler.