USGS-CMG / usgs-cmg-portal


Create one step process for converting usgs time series data to CF-1.6 #32

Closed rsignell-usgs closed 9 years ago

rsignell-usgs commented 9 years ago

@kwilcox had been working with the files from http://geoport.whoi.edu/thredds/catalog/usgs/data2/emontgomery/stellwagen/CF-1.6/catalog.html for his portal work, but those files were generated from the "raw" files in http://geoport.whoi.edu/thredds/catalog/usgs/data2/emontgomery/stellwagen/Data/catalog.html using this script: http://nbviewer.ipython.org/bc431264273be627a4f5

After discussion today, we decided a better approach would be for @kwilcox to process the files directly from the "raw" directory and simply take some of the metadata and other code as needed from the above script. This way @emontgomery-usgs will be able to add new files to the geoport:/data2/emontgomery/stellwagen/Data directory and have them show up in the portal.

And we will have a one step process to convert "raw" data to CF-1.6 compliant data instead of a two-step process.

emontgomery-usgs commented 9 years ago

I like this solution. Fewer steps is good.

Ellyn Montgomery, Oceanographer and Data Manager U.S. Geological Survey Woods Hole Coastal and Marine Science Center 384 Woods Hole Rd., Woods Hole, MA, 02543-1598 (508) 457-2356

rsignell-usgs commented 9 years ago

After chatting with @emontgomery-usgs today, we realized that we should have a procedure for Ellyn to update the experiment metadata as well when she adds new netCDF files to the directory. Could she just update a CSV file in that directory also, or is there a better solution?
How could she trigger a refresh that would allow added datasets to appear in the portal? (issue https://github.com/axiomalaska/usgs-cmg-portal/issues/6)

kwilcox commented 9 years ago

@rsignell-usgs Why is a CSV file needed? Can the data just be added to the individual NetCDF files?

rsignell-usgs commented 9 years ago

I'll defer to @emontgomery-usgs. Ellyn?

emontgomery-usgs commented 9 years ago

Our process has been to have one line of CSV text as input to augment the metadata for a whole experiment's files as part of the NcML conversion process. This was originally a separate file, but was incorporated into the IPython notebook.

If we split it back out to a shared file, I'll be able to modify it prior to generating the CF-1.6 for new data.

The extra metadata added at this stage changes our UDDC score from ~20 to ~40, so we need to keep it in the process somewhere.

kwilcox commented 9 years ago

@rsignell-usgs @emontgomery-usgs Some of the files in the "Data" directory are either corrupt or are not structured correctly to be served through OPeNDAP. I've seen this before when an HDF5 file has some sort of internal link loop (for example: a variable references another variable which references back to the original variable).

You can test by trying to download this particular dataset using ncks:

ncks http://geoport.whoi.edu/thredds/dodsC/usgs/data2/emontgomery/stellwagen/Data/BW2011/9031ysi-a.nc downloaded.nc

That file is only 330Kb but takes me about an hour to download. There are many others (the download process didn't get very far, but many of the YSI files seem to have the problem).

I believe you can solve this on the server by passing each file through NCO: ncks -O in.nc out.nc. Hopefully ncks can clean the problems up on the rewrite.
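The rewrite suggested above could be scripted over a whole experiment directory. What follows is only a sketch: the function names and the `*.nc` / `*.cdf` patterns are assumptions, and the only command it issues is the `ncks -O in.nc out.nc` form quoted in the comment.

```python
import shutil
import subprocess
from pathlib import Path


def build_ncks_commands(src_dir, dst_dir, patterns=("*.nc", "*.cdf")):
    """Return one `ncks -O in out` command per matching file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    commands = []
    for pattern in patterns:
        for in_file in sorted(src.glob(pattern)):
            out_file = dst / in_file.name
            commands.append(["ncks", "-O", str(in_file), str(out_file)])
    return commands


def rewrite_all(src_dir, dst_dir):
    """Run the commands; requires NCO's ncks to be installed and on PATH."""
    if shutil.which("ncks") is None:
        raise RuntimeError("ncks (NCO) not found on PATH")
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_ncks_commands(src_dir, dst_dir):
        # check=True stops at the first file ncks cannot rewrite
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes it easy to inspect what would be executed before touching any files.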

emontgomery-usgs commented 9 years ago

Can you access it from here? http://stellwagen.er.usgs.gov/opendap/BW2011/9031ysi-a.nc.html

The thredds OPeNDAP view makes it look like the file is OK. I was able to display some ascii data from it, so wonder what ncks doesn't like.

kwilcox commented 9 years ago

Access is still slow from the link you posted above.

emontgomery-usgs commented 9 years ago

Ideas, Rich? This one is an example with a cf_time variable. Could this be confusing OPeNDAP?

rsignell-usgs commented 9 years ago

@kwilcox , I think rather than downloading these files via OPeNDAP, you should use the HTTP links. For example:

wget http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/ARGO_MERCHANT/1211-A1H.cdf

You should be able to modify your thredds catalog crawler to grab all the data via HTTP, no?
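For a crawler that already holds OPeNDAP URLs, switching to the fileServer links can be a plain string substitution. This is a minimal sketch: the function name is hypothetical, and it assumes the server exposes both services under the `/thredds/dodsC/` and `/thredds/fileServer/` paths seen in the URLs in this thread.

```python
def opendap_to_fileserver(url):
    """Swap the THREDDS OPeNDAP service path for the file-server path."""
    marker = "/thredds/dodsC/"
    if marker not in url:
        raise ValueError("not a THREDDS OPeNDAP URL: " + url)
    # Replace only the first occurrence so the dataset path is left untouched
    return url.replace(marker, "/thredds/fileServer/", 1)


if __name__ == "__main__":
    print(opendap_to_fileserver(
        "http://geoport.whoi.edu/thredds/dodsC/usgs/data2/emontgomery/"
        "stellwagen/Data/BW2011/9031ysi-a.nc"
    ))
```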

I downloaded the above problem file from our server to the IOOS testbed server in LA in 2 seconds:

wget http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/BW2011/9031ysi-a.nc

emontgomery-usgs commented 9 years ago

FYI, I am in the process of copying over two new experiments. Directory names will be CHANDELEUR_13 and DAUPHIN.

An updated list of experiments with which to populate the UDDC attributes in the CF-1.6 files is attached. The commas delineate the fields as originally used; the format could be changed if there's a better way to provide the information in your system. The last 2 lines refer to experiments that are not yet available, but should be published within a month or so. Updating this list and then running the CF conversion process on newly published experiment data is part of the larger data maintenance that needs to be possible.

PS: I tried 3 files of various types from different experiments in the ncks command you sent, and all were very slow to download (on the order of 2 minutes). Hopefully wget will perform better.

ARGO_MERCHANT,B. Butman,Argo Merchant Experiment,A moored array deployed after the ARGO MERCHANT ran aground on Nantucket Shoals designed to help understand the fate of the spilled oil.
BUZZ_BAY,B. Butman,Currents and Sediment Transport in Buzzards Bay,Investigation of the near-bottom circulation in Buzzards Bay and consequent transport of fine-grained sediments that may be contaminated with PCBs from inner New Bedford Harbor.
CAMP,B. Butman,California Area Monitoring Program (CAMP),A four-year multi-disciplinary field and laboratory study to investigate the sediment transport regime in the vicinity of production drilling rigs in the Santa Barbara Basin
CAPE_COD_BAY,B. Butman,Currents and Sediment Transport in Cape Cod Bay,A pilot study to determine the effect of winter storms on sediment movement at two potential dredge spoil disposal areas.
CC_MISC,B. Butman,Transport studies - Nauset Inlet,Part of a collaborative study of sediment movement in Nauset Inlet.
DEEP_REEF,J. Lacey,Gulf of Mexico - Pinnacles,Pressure data from the Gulf of Mexico
DWDS_106,B. Butman,Sediment Transport at Deep Water Dump Site 106,Near-bottom current measurements to understand the fate and transport of sludge from the New York Metropolitan region discharged at the sea surface.
ECOHAB_II,R. Signell,Ecology of Harmful Algal Blooms (ECOHAB-II),A field program to continue investigating the transport and fate of toxic dinoflagellate blooms in the western Gulf of Maine.
ECOHAB_I,R. Signell,Ecology of Harmful Algal Blooms (ECOHAB-I),A field program to study the transport and fate of toxic dinoflagellate blooms in the western Gulf of Maine.
EUROSTRATAFORM,C. Sherwood,EuroSTRATAFORM,The EuroSTRATAFORM Po and Apennine Sediment Transport and Accumulation (PASTA) experiment was an international study of sediment-transport processes and formation of geological strata in the Adriatic Sea.
FARALLONES,M. Noble,Farallons,Program to measure the currents and circulation on the continental slope off San Francisco CA and thus infer the transport of dredged materialat the newly-established deep-water disposal site.
GB_SED,B. Butman,Georges Bank Current and Sediment Transport Studies,A series of studies to assess environmental hazards to petroleum development in the Georges Bank and New England Shelf region
GLOBEC_GB,R. Schlitz,GLOBEC Georges Bank Program,A moored array program to investigate the circulation and mixing of plankton on Georges Bank.
GLOBEC_GSC,R. Schlitz,GLOBEC Great South Channel Circulation Experiment,A moored array program to investigate the recirculation of water and plankton around Georges Bank
GULF_MAINE,B. Butman,Deep Circulation in the Gulf of Maine,A two-year field study to investigate the deep flow between the major basins in the Gulf of Maine and the effects on the distribution of suspended sediments.
HUDSON_SVALLEY,B. Butman,Circulation and Sediment Transport in the Hudson Shelf Valley,Field experiments have been carried out to understand the transport of sediments and associated contaminants in the Hudson Shelf Valley offshore of New York.
KARIN_RIDGE,M. Noble,Karin Ridge Experiment,Current measurements collected at 2 sites in Karin Ridge Seamount.
LYDONIA_C,B. Butman,Lydonia Canyon Dynamics Experiment,A major field experiment to determine the importance of submarine canyons in sediment transport along and across the continental margin.
MAB_SED,B. Butman,Sediment Transport Observations in the Middle Atlantic Bight,A series of studies to assess environmental hazards to petroleum development in the Middle Atlantic Bight.
MAMALA_BAY,D. Cacchione,Mamala bay Experiment,Current measurements collected at 350-450 meters in Mamala Bay, near Waikiki Beach.
MBAY_CIRC,R. Signell,Massachusetts Bay Circulation Experiment,Current measurements collected at 6 sites in Massachusetts Bay throughout the year to map the tidal wind and density driven currents.
MBAY_IWAVE,B. Butman,Massachusetts Bay Internal Wave Experiment,A 1-month 4-element moored array experiment to measure the currents associated with large-amplitude internal waves generated by tidal flow across Stellwagen Bank.
MBAY_LTB,B. Butman,Long-term observations in Massachusetts Bay; Site B-Scituate,Measurements of currents and other oceanographic properties were made to assess the impact of sewage discharge from the proposed outfall site.
MBAY_LT,B. Butman,Long-term observations in Massachusetts Bay; Site A-Boston Harbor,Measurements of currents and other oceanographic properties were made to assess the impact of sewage discharge from the proposed outfall site.
MBAY_STELL,R. Signell,Monitoring on Stellwagen Bank,A year-long series of current measurements on the eastern flank of Stellwagen Bank to document the currents at the mouth of Massachusetts Bay driven by the Maine Coastal current.
MBAY_WEST,B. Butman,Currents and Sediment Transport in Western Massachusetts Bay,A pilot winter-time experiment to investigate circulation and sediment transport. Designed to provide information to aid in citing the new ocean outfall for the Boston sewer system.
MOBILE_BAY,B. Butman,Mobile Bay Study,Measure currents and transport out of Mobile Bay.
MONTEREY_BAY,M. Noble,Monterey Bay National Marine Sanctuary Program, Part of a large multi-disciplinary experiment to characterize the geologic environment and to generate a sediment budget.
MONTEREY_CAN,M. Noble,Monterey Canyon Experiment, A program to determine the mechanisms that govern the circulation within and the transport of sediment and water through Monterey Submarine Canyon.
MYRTLEBEACH,J. Warner,Myrtle Beach Experiment SC,Measurements collected as part of a larger study to understand the physical processes that control the transport of sediments in Long Bay South Carolina.
NE_SLOPE,B. Butman,Currents on the New England Continental Slope,A study designed to describe the currents and to investigate the transport of sediment from the shelf to the slope.
OCEANOG_C,B. Butman,Oceanographer Canyon Dynamics Experiment,A field experiment to determine the importance of submarine canyons in sediment transport along and across the continental margin.
ORANGE_COUNTY,M. Noble,Orange County Sanitation District Studies,Observations to monitor coastal ocean process that transport suspended material and associated comtaminants across the shelf
PONCHARTRAIN,R. Signell,Lake Ponchartrain Project,A series of moored array studies to investigate the circulation and particle transport in Lake Pontchartrain.
PV_SHELF04,M. Noble,Palos Verdes Shelf 2004,Additional observations to estimate the quantity and direction of sediment erosion and transport on the shelf near the White Point ocean outfalls.
PV_SHELF07,M. Noble,Palos Verdes Shelf 2007,Follow-up observations to evaluate how often coastal ocean processes move the DDT contaminated sediments near the White Point ocean outfalls.
PV_SHELF,M. Noble,Palos Verdes Shelf Study,Initial observations of currents and circulation near the White Point ocean outfalls determine how often coastal ocean processes move the DDT contaminated sediments in this region.
SAB_SED,B. Butman,Sediment Transport Observations in the Southern Atlantic Bight,A series of studies to assess environmental hazards to petroleum development in the South Atlantic Bight.
SOUTHERN_CAL,M. Noble,Southern California Project,A series of moorings were deployed to understand how coastal ocean processes that move sediments change with location on the shelf.
STRESS,B. Butman,Sediment Transport on Shelves and Slopes (STRESS),Experiment on the California continental margin to investigate storm-driven sediment transport.
WRIGHTSVILLE,R. Thieler,Wrightsville Beach Study, Measurements of bottom currents and waves to investigate the flow field and sediment transport in a rippled scour depression offshore of Wrightsville Beach, NC.
DIAMONDSHOALS,J. Warner,Cape Hatteras- Diamond Shoals,This experiment was designed to investigate the ocean circulation and sediment transport dynamics at Diamond Shoals NC.
CHANDELEUR,C. Sherwood,Chandeleur Islands Oceanographic Measurements,A program to measure waves water levels and currents near the Chandeleur Islands Louisiana and adjacent berm construction site.
WFAL,N. Ganju,West Falmouth Harbor Fluxes,Oceanographic and water-quality observations made at six locations in West Falmouth Harbor and Buzzards Bay.
BW2011,N. Ganju, Blackwater 2011, Oceanographic and Water-Quality Measurements made at several sites in 2 watersheds in Blackwater National Wildlife Refuge.
MVCO_11,C. Sherwood, OASIS MVCO 2011, Near-seabed Oceanographic Observations made as part of the 2011 OASIS Project at the MVCO.
hurrIrene_bb,B. Butman, Observations in Buzzards Bay during and after a Hurricane, Oceanographic data collected in Buzzards Bay, MA during Hurricane Irene August 2011.
FI12,J. Warner,Fire Island NY - Offshore, Oceangraphic and meterological observations were made at 7 sites on and around the sand ridges offshore of Fire Island NY in winter 2012 to study coastal processes.
BARNEGAT,N. Ganju,Light attenuation and sediment resuspension in Barnegat Bay New Jersey, Light attenuation is a critical parameter governing the ecological function of shallow estuaries. near-bottom and mid-water observations of currents, pressure, chlorophyll, and fDOM were collected at three pairs of sites sequentially at different locations in the estuary to characterize the conditions.
CHANDELEUR_13,C. Sherwood,BIER Water-Level and Wave Measurements- Chandeleur Island Louisiana , Relatively inexpensive pressure sensors were deployed in a rapidly evolving barrier island system to directly measure inundation levels during seasonal and storm events.
DAUPHIN,C. Sherwood, BIER Water-Level Measurements- Dauphin Island Alabama, Relatively inexpensive pressure sensors were deployed in a rapidly evolving barrier island system to directly measure inundation levels during seasonal and storm events.
FI14,J. Warner,Fire Island NY - Nearshore, Oceangraphic and meterological observations were made at 9 sites on and around the sand ridges offshore of Fire Island NY in winter 2014 to study coastal processes.
RCNWR,N. Ganju, Rachel Carson NWR 2013, Oceanographic and water-quality observations made at 3 sites in the wetlands of the Rachel Carson National Wildlife Refuge.
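
Several of the summaries in the experiment list contain commas themselves (MAMALA_BAY, for example), so a naive split on every comma would break them. A sketch of one way to read an entry, assuming the first three fields never contain a comma and borrowing the field names from the metadata dictionary that collect.py logs later in this thread; the function name is hypothetical:

```python
FIELDS = ("project_name", "contributor_name", "project_title", "project_summary")


def parse_experiment_line(line):
    """Split one experiment entry into four fields, keeping commas in the summary."""
    # maxsplit=3 leaves everything after the third comma in the summary field
    parts = [part.strip() for part in line.split(",", 3)]
    if len(parts) != len(FIELDS):
        raise ValueError("expected 4 comma-separated fields: " + line)
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    entry = ("MAMALA_BAY,D. Cacchione,Mamala bay Experiment,Current measurements "
             "collected at 350-450 meters in Mamala Bay, near Waikiki Beach.")
    print(parse_experiment_line(entry)["project_summary"])
```

If a title ever needs a comma, quoting the fields and reading them with the standard `csv` module would be the more robust choice.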

kwilcox commented 9 years ago

@rsignell-usgs Using the HTTP endpoint will not pick up any changes you make to the files using NcML going forward. If you are OK with that I can do it, but it sounds like some metadata changes need to happen server side.

rsignell-usgs commented 9 years ago

@kwilcox, those files have no NcML additions. They are just netCDF files dropped into that directory and picked up by datasetScan.

kwilcox commented 9 years ago

https://github.com/axiomalaska/usgs-cmg-portal/tree/master/woods_hole_obs_data

rsignell-usgs commented 9 years ago

We finally ran this and we were somewhat surprised by the result. I tried running:

 python collect.py --download --output=./tmp --projects EUROSTRATAFORM

and this turned these 4 datasets:

7011adc-a.nc  7021mc-a.nc  7031adc-a.nc  7032mc-a.nc

into 21 directories each with 1 file:

./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_702:sea_water_practical_salinity/S_40_2002-11-08T12:50Z_TO_2003-02-16T13:02Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_water_temperature/Tx_1211_2002-11-08T08:47Z_TO_2003-02-16T15:47Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_surface_height/hght_18_2002-11-08T08:47Z_TO_2003-02-16T15:47Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:upward_sea_water_velocity/w_1204_2002-11-08T08:47Z_TO_2003-02-16T15:47Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_702:sea_water_pressure/P_1_2002-11-08T12:50Z_TO_2003-02-16T13:02Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_702:air_temperature/T_20_2002-11-08T12:50Z_TO_2003-02-16T13:02Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_702:sea_water_sigma_theta/STH_71_2002-11-08T12:50Z_TO_2003-02-16T13:02Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_water_electrical_conductivity/C_51_2002-11-08T08:37Z_TO_2003-02-16T14:36Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:direction_of_sea_water_velocity/dir_310_2002-11-08T08:47Z_TO_2003-02-16T15:47Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:air_temperature/T_20_2002-11-08T08:37Z_TO_2003-02-16T14:36Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_701:sea_water_pressure/P_4_2002-11-08T13:17Z_TO_2003-02-14T03:32Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_water_speed/spd_300_2002-11-08T08:47Z_TO_2003-02-16T15:47Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_701:sea_water_speed/spd_300_2002-11-08T13:17Z_TO_2003-02-14T03:32Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_701:sea_water_temperature/Tx_1211_2002-11-08T13:17Z_TO_2003-02-14T03:32Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_701:upward_sea_water_velocity/w_1204_2002-11-08T13:17Z_TO_2003-02-14T03:32Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_701:direction_of_sea_water_velocity/dir_310_2002-11-08T13:17Z_TO_2003-02-14T03:32Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_water_practical_salinity/S_40_2002-11-08T08:37Z_TO_2003-02-16T14:36Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_702:sea_water_electrical_conductivity/C_51_2002-11-08T12:50Z_TO_2003-02-16T13:02Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_water_sigma_theta/STH_71_2002-11-08T08:37Z_TO_2003-02-16T14:36Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_701:sea_surface_height/hght_18_2002-11-08T13:17Z_TO_2003-02-14T03:32Z.nc
./urn:ioos:sensor:gov.usgs.cmgp:eurostrataform_703:sea_water_pressure/P_4_2002-11-08T08:47Z_TO_2003-02-16T15:47Z.nc

I gather this is a form convenient for the Axiom stack, but how hard would it be to use this code to write CF-1.6 versions of our existing files?
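
To see how the 21 outputs relate to the original files, the paths listed above can be taken apart. This sketch infers the layout purely from the listing (a URN directory whose last two components are read here as station and CF standard name, then `<variable>_<start>_TO_<end>.nc`); the key names and function names are assumptions, not something collect.py defines.

```python
import re
from collections import defaultdict

FILE_RE = re.compile(
    r"^(?P<variable>.+)_(?P<start>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z)"
    r"_TO_(?P<end>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z)\.nc$"
)


def parse_output_path(path):
    """Split one output path into station, standard name, variable and time range."""
    if path.startswith("./"):
        path = path[2:]
    directory, _, filename = path.rpartition("/")
    urn_parts = directory.split(":")
    match = FILE_RE.match(filename)
    if len(urn_parts) < 6 or match is None:
        raise ValueError("unexpected output path: " + path)
    info = {"station": urn_parts[-2], "standard_name": urn_parts[-1]}
    info.update(match.groupdict())
    return info


def group_by_station(paths):
    """Map each station id to the list of variables written for it."""
    groups = defaultdict(list)
    for path in paths:
        info = parse_output_path(path)
        groups[info["station"]].append(info["variable"])
    return dict(groups)
```

Grouping by station shows which per-variable files would have to be merged back together for a one-to-one conversion.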

rsignell-usgs commented 9 years ago

@kwilcox , would it be hard to use your work here to create Python code that does a one-to-one conversion of our netCDF files into CF-1.6 compliant netCDF files?

kwilcox commented 9 years ago

@rsignell-usgs @emontgomery-usgs I've made the changes so you can generate the CF-1.6 files. See the README for an additional command line argument, "cf16". You should be able to do this now:

python collect.py --download --projects EUROSTRATAFORM --output=./tmp cf16

rsignell-usgs commented 9 years ago

It first failed because epic2py was not installed.
After epic2py installation, I'm getting:

rsignell@gam:/usgs/data2/emontgomery/stellwagen/usgs-cmg-portal/woods_hole_obs_data$ python collect.py --download --projects EUROSTRATAFORM --output=./tmp cf16
2015-01-20 13:52:32 gam root[28344] INFO {'EUROSTRATAFORM': {'catalog_xml': 'http://geoport.whoi.edu/thredds/catalog/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/catalog.xml', 'project_summary': 'The EuroSTRATAFORM Po and Apennine Sediment Transport and Accumulation (PASTA) experiment was an international study of sediment-transport processes and formation of geological strata in the Adriatic Sea.', 'project_name': 'EUROSTRATAFORM', 'project_title': 'EuroSTRATAFORM', 'contributor_name': 'C. Sherwood'}}
2015-01-20 13:52:32 gam thredds_crawler[28344] INFO Crawling: http://geoport.whoi.edu/thredds/catalog/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/catalog.xml
2015-01-20 13:52:33 gam root[28344] INFO Found 4 datasets in EUROSTRATAFORM!
2015-01-20 13:52:33 gam root[28344] INFO Found 4 TOTAL datasets!
2015-01-20 13:52:33 gam root[28344] INFO Downloading http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/7011adc-a.nc
Traceback (most recent call last):
  File "collect.py", line 858, in <module>
    main(args.output, args.download, args.format, projects, os.path.realpath(args.csv_metadata_file))
  File "collect.py", line 406, in main
    downloaded_files = download(download_folder, project_metadata)
  File "collect.py", line 232, in download
    logger.info("{!s} saved ({!s}/{!s})".format(num + 1, len(total_datasets)))
IndexError: tuple index out of range

kwilcox commented 9 years ago

ugh. I committed a few lines I shouldn't have... one sec.
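
The IndexError in the traceback comes from a format string with three placeholders that receives only two arguments. The actual fix is not shown in this thread; the sketch below simply supplies the file name as the missing first argument, which matches the "7011adc-a.nc saved (1/4)" lines in the later log output. The function name is hypothetical.

```python
TEMPLATE = "{!s} saved ({!s}/{!s})"


def saved_message(filename, index, total):
    """Build the progress message; index is zero-based, as with enumerate()."""
    return TEMPLATE.format(filename, index + 1, total)


if __name__ == "__main__":
    try:
        # Two arguments for three placeholders reproduces the reported failure
        TEMPLATE.format(1, 4)
    except IndexError as err:
        print("IndexError:", err)
    print(saved_message("7011adc-a.nc", 0, 4))
```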

kwilcox commented 9 years ago

@rsignell-usgs pull and give it another shot

rsignell-usgs commented 9 years ago

@kwilcox , now working! Awesome! Just one more little thing: it seems a bit silly to download the data since we are running this script on the machine where the data is local. I was thinking of just omitting the --download option, and writing a script that softlinks the input directory to ./download but perhaps it would be easy to allow this option in the code?

If this is hard, don't bother. It's not that big a deal downloading the data even if it's local.

kwilcox commented 9 years ago

@rsignell-usgs Just pushed a new command line option --folder that allows you to specify the folder to download files into. If you use this option and omit the --download option, it will scan the folder for the files to translate. See README.md for more details!
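
The behavior described above can be sketched with `argparse`. This is a hypothetical stand-in, not collect.py's real parser: the option names come from the commands in this thread, while the defaults, help text and the `plan` helper are assumptions. With `argparse`, `--folder=/path` and `--folder /path` are parsed identically.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the collect.py command line."""
    parser = argparse.ArgumentParser(description="Sketch of the collect.py options")
    parser.add_argument("--download", action="store_true",
                        help="fetch files from THREDDS before translating")
    parser.add_argument("--folder", default="./download",  # assumed default
                        help="folder to download into, or to scan if --download is omitted")
    parser.add_argument("--projects", help="project name to process")
    parser.add_argument("--output", default="./output",  # assumed default
                        help="where translated files are written")
    parser.add_argument("format", nargs="?", help='output format, e.g. "cf16"')
    return parser


def plan(args):
    """Describe what a run would do with the parsed options."""
    action = "download into" if args.download else "scan"
    return "{} {}".format(action, args.folder)


if __name__ == "__main__":
    parsed = build_parser().parse_args(
        ["--folder=/tmp/euro", "--projects=EUROSTRATAFORM", "--output=./tmp2", "cf16"]
    )
    print(plan(parsed))
```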

rsignell-usgs commented 9 years ago

Seems like I have the syntax a bit wrong? Does it matter if there are = in the specification of input parameters?

$ ls /usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM
7011adc-a.nc  7021mc-a.nc  7031adc-a.nc  7032mc-a.nc

$ python collect.py --folder=/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM --projects=EUROSTRATAFORM --output=./tmp2 cf16

rsignell@gam:/usgs/data2/emontgomery/stellwagen/usgs-cmg-portal/woods_hole_obs_data$ python collect.py --folder=/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM --projects=EUROSTRATAFORM --output=./tmp2 cf16
Traceback (most recent call last):
  File "collect.py", line 858, in <module>
    main(args.output, args.folder, args.download, args.format, projects, os.path.realpath(args.csv_metadata_file))
  File "collect.py", line 422, in main
    project_name, _ = tmpnc.id.split("/")
  File "netCDF4.pyx", line 1939, in netCDF4.Dataset.__getattr__ (netCDF4.c:24474)
  File "netCDF4.pyx", line 1884, in netCDF4.Dataset.getncattr (netCDF4.c:23689)
  File "netCDF4.pyx", line 937, in netCDF4._get_att (netCDF4.c:14866)
AttributeError: NetCDF: Attribute not found
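
The traceback shows collect.py reading a global `id` attribute that the files in this folder do not carry. One possible guard, offered only as a sketch, is to fall back to the folder name when the attribute is absent. The function name and the "PROJECT/file" shape of `id` are assumptions inferred from the `split("/")` call, and a plain namespace object stands in for the netCDF4 dataset.

```python
import os
from types import SimpleNamespace


def project_name_for(dataset, file_path):
    """Return the project name from the dataset's id, else from the parent folder."""
    # getattr with a default absorbs the AttributeError raised for a missing attribute
    dataset_id = getattr(dataset, "id", None)
    if dataset_id and "/" in dataset_id:
        return dataset_id.split("/")[0]
    return os.path.basename(os.path.dirname(os.path.abspath(file_path)))


if __name__ == "__main__":
    raw_file = SimpleNamespace()  # stand-in for a dataset without an id attribute
    print(project_name_for(
        raw_file,
        "/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/7011adc-a.nc",
    ))
```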

kwilcox commented 9 years ago

I don't have a way to test this on your machine; all I have access to are the files through THREDDS. It is working as intended for me:

$ python collect.py --download --folder=/tmp/euro --projects=EUROSTRATAFORM --output=./tmp2 cf16
2015-01-21 11:42:57 terrapin thredds_crawler[20285] INFO Crawling: http://geoport.whoi.edu/thredds/catalog/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/catalog.xml
2015-01-21 11:42:58 terrapin root[20285] INFO Found 4 datasets in EUROSTRATAFORM!
2015-01-21 11:42:58 terrapin root[20285] INFO Found 4 TOTAL datasets!
2015-01-21 11:42:58 terrapin root[20285] INFO Downloading http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/7011adc-a.nc
2015-01-21 11:42:59 terrapin root[20285] INFO 7011adc-a.nc saved (1/4)
2015-01-21 11:42:59 terrapin root[20285] INFO Downloading http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/7021mc-a.nc
2015-01-21 11:43:01 terrapin root[20285] INFO 7021mc-a.nc saved (2/4)
2015-01-21 11:43:01 terrapin root[20285] INFO Downloading http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/7031adc-a.nc
2015-01-21 11:43:03 terrapin root[20285] INFO 7031adc-a.nc saved (3/4)
2015-01-21 11:43:03 terrapin root[20285] INFO Downloading http://geoport.whoi.edu/thredds/fileServer/usgs/data2/emontgomery/stellwagen/Data/EUROSTRATAFORM/7032mc-a.nc
2015-01-21 11:43:04 terrapin root[20285] INFO 7032mc-a.nc saved (4/4)
2015-01-21 11:43:04 terrapin root[20285] INFO FILE: /tmp/euro/7011adc-a.nc
2015-01-21 11:43:04 terrapin root[20285] INFO Copying u_1205 into u_1205
2015-01-21 11:43:04 terrapin root[20285] INFO Copying v_1206 into v_1206
2015-01-21 11:43:04 terrapin root[20285] INFO Copying w_1204 into w_1204
2015-01-21 11:43:04 terrapin root[20285] INFO Copying Werr_1201 into Werr_1201
2015-01-21 11:43:04 terrapin root[20285] INFO Copying AGC_1202 into AGC_1202
2015-01-21 11:43:04 terrapin root[20285] INFO Copying PGd_1203 into PGd_1203
2015-01-21 11:43:04 terrapin root[20285] INFO Copying hght_18 into hght_18
2015-01-21 11:43:04 terrapin root[20285] INFO Copying Tx_1211 into Tx_1211
2015-01-21 11:43:04 terrapin root[20285] INFO Copying P_4 into P_4
2015-01-21 11:43:04 terrapin root[20285] INFO Copying time
2015-01-21 11:43:06 terrapin root[20285] INFO FILE: /tmp/euro/7021mc-a.nc
2015-01-21 11:43:06 terrapin root[20285] INFO Copying T_20 into T_20
2015-01-21 11:43:06 terrapin root[20285] INFO Copying C_51 into C_51
2015-01-21 11:43:06 terrapin root[20285] INFO Copying P_1 into P_1
2015-01-21 11:43:06 terrapin root[20285] INFO Copying S_40 into S_40
2015-01-21 11:43:06 terrapin root[20285] INFO Copying STH_71 into STH_71
2015-01-21 11:43:06 terrapin root[20285] INFO Copying time
2015-01-21 11:43:06 terrapin root[20285] INFO FILE: /tmp/euro/7031adc-a.nc
2015-01-21 11:43:06 terrapin root[20285] INFO Copying u_1205 into u_1205
2015-01-21 11:43:06 terrapin root[20285] INFO Copying v_1206 into v_1206
2015-01-21 11:43:06 terrapin root[20285] INFO Copying w_1204 into w_1204
2015-01-21 11:43:06 terrapin root[20285] INFO Copying Werr_1201 into Werr_1201
2015-01-21 11:43:07 terrapin root[20285] INFO Copying AGC_1202 into AGC_1202
2015-01-21 11:43:07 terrapin root[20285] INFO Copying PGd_1203 into PGd_1203
2015-01-21 11:43:07 terrapin root[20285] INFO Copying hght_18 into hght_18
2015-01-21 11:43:07 terrapin root[20285] INFO Copying Tx_1211 into Tx_1211
2015-01-21 11:43:07 terrapin root[20285] INFO Copying P_4 into P_4
2015-01-21 11:43:07 terrapin root[20285] INFO Copying time
2015-01-21 11:43:08 terrapin root[20285] INFO FILE: /tmp/euro/7032mc-a.nc
2015-01-21 11:43:08 terrapin root[20285] INFO Copying T_20 into T_20
2015-01-21 11:43:08 terrapin root[20285] INFO Copying C_51 into C_51
2015-01-21 11:43:08 terrapin root[20285] INFO Copying S_40 into S_40
2015-01-21 11:43:08 terrapin root[20285] INFO Copying STH_71 into STH_71
2015-01-21 11:43:08 terrapin root[20285] INFO Copying time

$ python collect.py --folder=/tmp/euro --projects=EUROSTRATAFORM --output=./tmp2 cf16
2015-01-21 11:43:13 terrapin root[20328] INFO FILE: /tmp/euro/7032mc-a.nc
2015-01-21 11:43:13 terrapin root[20328] INFO Copying T_20 into T_20
2015-01-21 11:43:13 terrapin root[20328] INFO Copying C_51 into C_51
2015-01-21 11:43:13 terrapin root[20328] INFO Copying S_40 into S_40
2015-01-21 11:43:13 terrapin root[20328] INFO Copying STH_71 into STH_71
2015-01-21 11:43:14 terrapin root[20328] INFO Copying time
2015-01-21 11:43:14 terrapin root[20328] INFO FILE: /tmp/euro/7011adc-a.nc
2015-01-21 11:43:14 terrapin root[20328] INFO Copying u_1205 into u_1205
2015-01-21 11:43:14 terrapin root[20328] INFO Copying v_1206 into v_1206
2015-01-21 11:43:14 terrapin root[20328] INFO Copying w_1204 into w_1204
2015-01-21 11:43:14 terrapin root[20328] INFO Copying Werr_1201 into Werr_1201
2015-01-21 11:43:14 terrapin root[20328] INFO Copying AGC_1202 into AGC_1202
2015-01-21 11:43:14 terrapin root[20328] INFO Copying PGd_1203 into PGd_1203
2015-01-21 11:43:14 terrapin root[20328] INFO Copying hght_18 into hght_18
2015-01-21 11:43:14 terrapin root[20328] INFO Copying Tx_1211 into Tx_1211
2015-01-21 11:43:14 terrapin root[20328] INFO Copying P_4 into P_4
2015-01-21 11:43:14 terrapin root[20328] INFO Copying time
2015-01-21 11:43:15 terrapin root[20328] INFO FILE: /tmp/euro/7021mc-a.nc
2015-01-21 11:43:15 terrapin root[20328] INFO Copying T_20 into T_20
2015-01-21 11:43:15 terrapin root[20328] INFO Copying C_51 into C_51
2015-01-21 11:43:15 terrapin root[20328] INFO Copying P_1 into P_1
2015-01-21 11:43:15 terrapin root[20328] INFO Copying S_40 into S_40
2015-01-21 11:43:15 terrapin root[20328] INFO Copying STH_71 into STH_71
2015-01-21 11:43:15 terrapin root[20328] INFO Copying time
2015-01-21 11:43:16 terrapin root[20328] INFO FILE: /tmp/euro/7031adc-a.nc
2015-01-21 11:43:16 terrapin root[20328] INFO Copying u_1205 into u_1205
2015-01-21 11:43:16 terrapin root[20328] INFO Copying v_1206 into v_1206
2015-01-21 11:43:16 terrapin root[20328] INFO Copying w_1204 into w_1204
2015-01-21 11:43:16 terrapin root[20328] INFO Copying Werr_1201 into Werr_1201
2015-01-21 11:43:16 terrapin root[20328] INFO Copying AGC_1202 into AGC_1202
2015-01-21 11:43:16 terrapin root[20328] INFO Copying PGd_1203 into PGd_1203
2015-01-21 11:43:16 terrapin root[20328] INFO Copying hght_18 into hght_18
2015-01-21 11:43:16 terrapin root[20328] INFO Copying Tx_1211 into Tx_1211
2015-01-21 11:43:16 terrapin root[20328] INFO Copying P_4 into P_4
2015-01-21 11:43:16 terrapin root[20328] INFO Copying time

rsignell-usgs commented 9 years ago

I gchatted with @kwilcox yesterday, and we figured out that to use local files instead of downloading, he needs to identify an experiment with a set of files, so we left it as is for the time being. The download doesn't take long compared to the conversion anyway. So I wrote a small bash script to convert our files:

cd /usgs/data2/emontgomery/stellwagen/usgs-cmg-portal/woods_hole_obs_data
more do_convert.sh
#!/bin/bash
for proj in /usgs/data2/emontgomery/stellwagen/Data/*
do
   project=`echo $proj | sed -e 's;/usgs/data2/emontgomery/stellwagen/Data/;;'`
   echo $project
   python collect.py --download --projects $project \
        --output=/usgs/data2/emontgomery/stellwagen/CF-1.6 cf16
done

daf commented 9 years ago

Rich, use basename for this kind of substitution. (and its cousin, dirname if needed)

... this episode of bikeshedding with Dave brought to you by the letter P and the number 9.

rsignell-usgs commented 9 years ago

@daf: okay you popped your head up on this. What's your better version of this script?

rsignell-usgs commented 9 years ago

@daf, is this it?:

#!/bin/bash
for proj in /usgs/data2/emontgomery/stellwagen/Data/*
do
   project=$(basename "$proj")
   echo "$project"
   python collect.py --download --projects "$project" \
        --output=/usgs/data2/emontgomery/stellwagen/CF-1.6 cf16
done

daf commented 9 years ago

That's what I meant, it's more succinct. (I realize I'm being entirely unhelpful here, hence the joke about bikeshedding)
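For anyone following along, here is a minimal sketch comparing the two ways of getting the project name out of the full path. The function names and the `data_dir` variable are made up for illustration; only `sed`, `basename`, and `dirname` are real tools, and the paths do not need to exist for this to run.

```shell
#!/bin/bash
# Hypothetical helper names, for illustration only.
data_dir=/usgs/data2/emontgomery/stellwagen/Data

# Original approach: strip the known directory prefix with sed.
project_via_sed() {
    echo "$1" | sed -e "s;${data_dir}/;;"
}

# Suggested approach: basename returns the last path component,
# so the prefix never has to be spelled out a second time.
project_via_basename() {
    basename "$1"
}

proj="$data_dir/EUROSTRATAFORM"
project_via_sed "$proj"        # prints EUROSTRATAFORM
project_via_basename "$proj"   # prints EUROSTRATAFORM
dirname "$proj"                # prints /usgs/data2/emontgomery/stellwagen/Data
```

Both forms give the same result for this directory layout. The basename version keeps working if the data directory moves, since it does not depend on the prefix.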

kwilcox commented 9 years ago

Can you just leave off the --projects option and do them all?

emontgomery-usgs commented 9 years ago

folks-

I've been through this thread and thought that I should be able to modify and then run Rich's do_convert.sh, and that it would download the data (FI14 in this case), convert the files, and put the output in the CF-1.6 directory.

I have /home/usgs/anaconda/bin in my path, but get a missing epic2cf error when I try to run the script.

$ sh do_convert.sh FI14
Traceback (most recent call last):
  File "collect.py", line 13, in <module>
    import epic2cf
ImportError: No module named epic2cf

What might I be missing in my path?

Thanks! Ellyn

On Fri, Jan 23, 2015 at 1:17 PM, Kyle Wilcox notifications@github.com wrote:

Can you just leave off the --projects option and do them all?


Ellyn Montgomery, Oceanographer and Data Manager U.S. Geological Survey Woods Hole Coastal and Marine Science Center 384 Woods Hole Rd., Woods Hole, MA, 02543-1598 (508) 457-2356