coecms / ARCCSSive

ARCCSS Data Access Tools
Apache License 2.0

Inconsistent search results #12

Closed DamienIrving closed 8 years ago

DamienIrving commented 8 years ago

Sorry to be a pest with another issue, but here is an example of inconsistent search results that I've come across.

I know that this file is on the system:

/g/data/ua6/drstree/CMIP5/GCM/NOAA-GFDL/GFDL-CM3/historicalMisc/mon/ocean/thetao/r1i1p1/thetao_Omon_GFDL-CM3_historicalMisc_r1i1p1_186001-186412.nc

and when I search for it at the command line

$ python examples/search_cmip5.py -v thetao -e historicalMisc -t Omon --model GFDL-CM3

or within Python

from ARCCSSive import CMIP5
cmip5 = CMIP5.connect()
cmip5.outputs(experiment = 'historicalMisc', variable = 'thetao', mip = 'Omon', model = 'GFDL-CM3')

the following file is correctly identified:

/g/data/ua6/unofficial-ESG-replica/tmp/tree/esgdata.gfdl.noaa.gov/thredds/fileServer/gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historicalMisc/mon/ocean/Omon/r1i1p1/v20110601/thetao/thetao_Omon_GFDL-CM3_historicalMisc_r1i1p1_186001-186412.nc

In contrast, I know that this file is on the system:

/g/data/ua6/drstree/CMIP5/GCM/CCCMA/CanESM2/historicalMisc/mon/ocean/thetao/r1i1p2/thetao_Omon_CanESM2_historicalMisc_r1i1p2_185001-186012.nc

but neither the command-line search nor the Python search can find it.

Version Information

import ARCCSSive
ARCCSSive.debug.info
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-cbb8f46ae7fc> in <module>()
----> 1 ARCCSSive.debug.info

AttributeError: 'module' object has no attribute 'debug'

paolap commented 8 years ago

Hi Damien,

This is not an error. If you resolve the link, you'll see that the real path is in the NCI_replica_tmp directory:

/g/data/ua6/NCI_replica_tmp/ua6_sync/sync/LLNL/css02-cmip5/data/cmip5/output1/CCCma/CanESM2/historicalMisc/mon/ocean/Omon/r1i1p2/thetao/1/thetao_Omon_CanESM2_historicalMisc_r1i1p2_185001-186012.nc
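For anyone who wants to check this themselves, here is a minimal sketch of resolving a link from Python using only the standard library. The helper names `resolve_link` and `same_file` are hypothetical and are not part of ARCCSSive.

```python
import os


def resolve_link(path):
    """Return the real location behind a (possibly symlinked) path."""
    # os.path.realpath follows every symlink in the path, including
    # links on intermediate directories.
    return os.path.realpath(path)


def same_file(path_a, path_b):
    """Return True if both paths resolve to the same real location."""
    return resolve_link(path_a) == resolve_link(path_b)
```

Calling `resolve_link` on the drstree path from the original report should, on the NCI filesystem, return the NCI_replica_tmp path shown above. On a machine where the path does not exist, `os.path.realpath` simply returns the normalised input.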

Everything currently in the database comes from the unofficial replica, not necessarily from drstree. That is because, as far as I was told, some links on drstree are broken (they may well be fixed by now), and drstree was going to change structure; we would have rebuilt the database during that upgrade. Tim ran out of time, so that never happened.

I didn't even know they had added some (or all?) of the bulk replica to drstree. I have a real issue with that, since a lot of it is duplicates, and it's hard to tell which because the versioning is inconsistent. For example, LLNL has replaced the model publisher's version with "1", as you can see in the path above. Then, when he builds the drstree, Tim estimates a version if he can't find version information in the file. So up to now I have kept the bulk transfer out of the database, to avoid doubling its size for no good reason. I was also waiting on NCI to solve this.

Just this morning I started looking into ways of adding this data after a proper check, assigning the right version wherever possible, because potentially downloading data that is already available on raijin is a real issue. I'll be testing adding the data to new_cmip5_replica_test2.db, so I've made a copy of the database you've used up to now, cmip5_replica.db, which will be the official version for the new release until I'm sure the bulk data has been added properly with no duplicates.

So the results aren't "inconsistent"; rather, the dataset organisation is.
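To illustrate the duplicate and versioning problem described above, here is a rough sketch that groups file paths by file name and looks for a publisher version in each path. It is not how the ARCCSSive database is actually built. The function names are hypothetical, and the `vYYYYMMDD` directory pattern is an assumption based on the GFDL path earlier in this thread.

```python
import os
import re
from collections import defaultdict

# Assumption: a publisher version appears as a path component
# such as "v20110601".
VERSION_RE = re.compile(r"^v\d{8}$")


def find_version(path):
    """Return the first path component shaped like vYYYYMMDD, else None."""
    for part in path.split("/"):
        if VERSION_RE.match(part):
            return part
    return None


def possible_duplicates(paths):
    """Group paths by file name; keep names found at more than one path."""
    by_name = defaultdict(set)
    for path in paths:
        by_name[os.path.basename(path)].add(path)
    return {name: sorted(found)
            for name, found in by_name.items() if len(found) > 1}
```

With the paths quoted in this thread, `find_version` returns `'v20110601'` for the GFDL-CM3 path under unofficial-ESG-replica and `None` for the CanESM2 path under NCI_replica_tmp, where the version directory is just `1`. Symlinks would need to be resolved first, otherwise a drstree link and its target would be reported as duplicates of each other.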

DamienIrving commented 8 years ago

@paolap Ah, cool. Thanks for explaining. When I come to put together my final ensemble for analysis (which won't be for weeks/months) I might touch base with you to make sure I've located everything that is available on NCI (and elsewhere).

paolap commented 8 years ago

Hopefully I'll have them added properly before I go away at the end of the month, so sometime next week. Maybe Claire has already identified the duplicates, which would make it easier. I'll let you know when this is done.