didierearith opened this issue 9 years ago.
Thanks Didier.
We hit this bug ourselves last week in the development code: the overlap cleaner identified the second tile as redundant, which for other ingesters implies tile removal, and that removal was incorrectly running during WOfS ingestion. The WOfS ingester should be runnable with read-only access to its inputs (which is how we're running it), so any file modification is a serious bug.
Try updating to the latest version of the develop branch and retesting.
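To make the distinction in the reply concrete, here is a minimal sketch, assuming a hypothetical `TileRecord` type and hypothetical `find_redundant` / `plan_cleanup` helpers (none of these are AGDC names). It separates *detecting* redundant tiles from *deleting* their files, so an ingester that only reads pretiled inputs, such as WOfS, can drop stale database records without touching the files. This is an illustration of the idea, not the fix that landed in the develop branch.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TileRecord:
    # Hypothetical tile metadata; the real AGDC tile schema is not shown here.
    path: str
    footprint: Tuple[int, int]  # (x, y) tile cell index
    acquired: datetime          # acquisition time of the source scene
    ingested: datetime          # time the tile record was stored


def find_redundant(tiles: List[TileRecord]) -> List[TileRecord]:
    """Return tiles superseded by a newer record for the same cell and acquisition."""
    groups: Dict[tuple, List[TileRecord]] = defaultdict(list)
    for tile in tiles:
        groups[(tile.footprint, tile.acquired)].append(tile)
    redundant: List[TileRecord] = []
    for group in groups.values():
        # Keep the most recently ingested tile; the rest are redundant.
        group.sort(key=lambda t: t.ingested, reverse=True)
        redundant.extend(group[1:])
    return redundant


def plan_cleanup(tiles: List[TileRecord], owns_tile_files: bool) -> List[Tuple[str, str]]:
    """List cleanup actions; file deletion is planned only if the ingester wrote the files."""
    actions: List[Tuple[str, str]] = []
    for tile in find_redundant(tiles):
        actions.append(("drop_record", tile.path))
        if owns_tile_files:
            # Skipped for read-only, pretiled inputs such as WOfS tiles.
            actions.append(("delete_file", tile.path))
    return actions
```

Returning a plan instead of deleting inside the detection loop keeps the destructive step behind one explicit flag, which is easy to audit for ingesters that must run with read-only inputs.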
Hi AGDC Team,
While testing WOfS ingestion, I found an issue.
I downloaded some WOfS files from http://dapds00.nci.org.au/thredds/catalog/fk4/wofs/current/extents into a directory on my machine, then ran the ingest command for the first time, e.g. `agdc/ingest/wofs.py --source /home/adminprod/data1/rs0/tiles/wofs/`
The data files were ingested successfully.
Next, I wanted to test re-ingesting data that already exists in the Data Cube (the source files have been updated and I want to update my Data Cube).
To do this, I changed the modification date of the source files with the Linux `touch` command.
The datetime of the data files is now later than the datetime of the dataset in the database.
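The test above relies on the ingester treating a source file as updated when its modification time is later than the dataset's datetime in the database. A minimal sketch of that check, assuming the comparison is made on file mtime (the exact rule AGDC uses is not shown in this thread), with hypothetical helpers `touch`, `file_mtime` and `needs_reingest`:

```python
import os
import tempfile
from datetime import datetime, timezone


def touch(path, when=None):
    # Like `touch path`: set atime and mtime to `when` (a Unix timestamp), or to now.
    os.utime(path, None if when is None else (when, when))


def file_mtime(path):
    # File modification time as a timezone-aware UTC datetime.
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def needs_reingest(path, db_dataset_time):
    """True when the source file is strictly newer than the stored dataset datetime."""
    return file_mtime(path) > db_dataset_time
```

With this rule, an untouched file (mtime equal to the stored datetime) is skipped, which matches the report that the failure only appears after `touch`.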
I then ran `agdc/ingest/wofs.py --source /home/adminprod/data1/rs0/tiles/wofs/` again and got the following exception:
```
2015-08-04 11:56:02,123 agdc.ingest.tile_contents INFO Tile already in place: '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER115-035_2011-01-10T01-59-19.155557.tif'
2015-08-04 11:56:02,217 agdc.ingest._core INFO Ingestion complete for dataset '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER115-035_2011-01-10T01-59-19.155557.tif' in 0:00:00.197192.
Traceback (most recent call last):
  File "/home/adminprod/agdc-develop/agdc/ingest/wofs.py", line 97, in <module>
    agdc.ingest.run_ingest(WofsIngester)
  File "/home/adminprod/agdc-develop/agdc/ingest/_core.py", line 586, in run_ingest
    ingester.ingest(ingester.args.source_dir)
  File "/home/adminprod/agdc-develop/agdc/ingest/_core.py", line 186, in ingest
    self.ingest_individual_dataset(dataset_path)
  File "/home/adminprod/agdc-develop/agdc/ingest/_core.py", line 207, in ingest_individual_dataset
    self.tile(dataset_record, dataset)
  File "/home/adminprod/agdc-develop/agdc/ingest/pretiled.py", line 312, in tile
    dataset_record.store_tiles([tile_contents])
  File "/home/adminprod/agdc-develop/agdc/ingest/dataset_record.py", line 238, in store_tiles
    return [self.create_tile_record(tile_contents) for tile_contents in tile_list]
  File "/home/adminprod/agdc-develop/agdc/ingest/dataset_record.py", line 320, in create_tile_record
    size_mb=tile_contents.get_output_size_mb(),
  File "/home/adminprod/agdc-develop/agdc/ingest/tile_contents.py", line 174, in get_output_size_mb
    return get_file_size_mb(path)
  File "/home/adminprod/agdc-develop/agdc/cube_util.py", line 109, in get_file_size_mb
    return os.path.getsize(path) // (1024 * 1024)
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER115-035_2011-02-27T01-59-34.560472.tif'
2015-08-04 11:56:02,352 agdc.ingest._core ERROR Unexpected error during path '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER115-035_2011-02-27T01-59-34.560472.tif'
```
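The final frames show the failure mode: `get_file_size_mb` calls `os.path.getsize` on a tile path that has already been deleted. The minimal reproduction below reuses the expression from `cube_util.py` line 109 in the traceback; the `os.remove` stands in for the deletion the reporter traced to `collection.py`. It is written for Python 3, where the error is raised as `FileNotFoundError` (a subclass of `OSError` carrying `errno.ENOENT`, i.e. "Errno 2" in the Python 2.7 trace). The `size_after_removal` helper is hypothetical.

```python
import errno
import os
import tempfile


def get_file_size_mb(path):
    # Same expression as agdc/cube_util.py line 109 in the traceback.
    return os.path.getsize(path) // (1024 * 1024)


def size_after_removal():
    """Measure a 2 MB file, delete it, then try to measure it again."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "tile.tif")
        with open(path, "wb") as f:
            f.write(b"\0" * (2 * 1024 * 1024))
        before = get_file_size_mb(path)
        os.remove(path)  # stands in for the os.remove reached during the second ingest
        try:
            get_file_size_mb(path)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            return before, exc.errno
        raise RuntimeError("size lookup unexpectedly succeeded")
```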
After some investigation, I think the issue is that the data file is removed by an `os.remove` call in the `__commit` function of the `collection.py` module.
To be able to ingest the updated source files again, I commented out the `os.remove` call in `__commit`.
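Commenting out `os.remove` disables cleanup for every ingester. Because the log reports "Tile already in place", the pretiled WOfS tile path appears to be the source file itself, so one narrower guard is to refuse deletion of any file inside a read-only input directory. This is a minimal sketch under that assumption; `remove_redundant_tile`, `is_within` and `protected_dirs` are hypothetical names, not AGDC code and not the fix applied upstream.

```python
import os
import tempfile


def is_within(path, directory):
    """True if `path` resolves to a location inside `directory` (symlinks resolved)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


def remove_redundant_tile(path, protected_dirs):
    """Delete a redundant tile file unless it lives in a protected input directory.

    Returns True if the file was deleted, False if it was kept.
    """
    if any(is_within(path, d) for d in protected_dirs):
        return False  # never modify read-only inputs such as pretiled WOfS files
    os.remove(path)
    return True
```

Using `realpath` plus `commonpath` compares whole path components, so a sibling such as `/data/wofs2` is not mistaken for being inside `/data/wofs`.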
Note: if the source files have not been updated (i.e. the date of the source file equals the date of the dataset in the database), the issue does not occur.
Note: if I run the ingestion again, the issue does not always occur on the same file: sometimes on the first file, sometimes on the nth file.