climate-mirror / datasets

For tracking data mirroring progress

World Ocean Database (a.k.a. World Ocean Atlas?) #323

Open JeremiahCurtis opened 7 years ago

JeremiahCurtis commented 7 years ago

ftp://ftp.nodc.noaa.gov/pub/ contains a bunch of directories whose names begin with WOD and WOA (e.g., WOA01.OLD, WOD98, etc.)

Not sure whether the ftp://ftp.nodc.noaa.gov/pub/woa/ folder replicates the data contained in those directories.

No tickets/issues here on GitHub appear to include these data at all.

MORE INFO: https://www.nodc.noaa.gov/OC5/indprod.html (as far as I can tell, there might be more data here, especially under "International Ocean Atlas and Information Series", that is almost certainly not on the FTP site)

Provided by OCL: https://www.nodc.noaa.gov/access/oceanclimate.html

"The Ocean Climate Laboratory Team (OCL) is a team within the National Centers for Environmental Information (NCEI). The primary objectives of the OCL are to:

Improve the quality of the NCEI's oceanographic data archives by using the data to perform scientific analyses
Develop improved ocean climatologies for annual, seasonal, and monthly compositing periods
Investigate interannual-to-decadal ocean climate variability using historical oceanographic data
Build scientifically, quality-controlled global oceanographic databases
Facilitate international exchange of oceanographic data

The OCL includes World Data Center for Oceanography, Silver Spring (WDC). Operated under the auspices of the U.S. Academy of Sciences, WDC is one of the U.S. discipline subcenters within the World Data Center system. There are two other World Data Centers for Oceanography: World Data Center for Oceanography in Obninsk (Russia), and World Data Center for Oceanography in Tianjin (People's Republic of China).

The OCL directs the international Global Oceanographic Data Archaeology and Rescue (GODAR) Project. Initiated by the NODC and WDC, this project was subsequently endorsed by the Intergovernmental Oceanographic Commission. The GODAR Project has resulted in an increase of over eight million historical ocean temperature casts, 200,000 plankton casts, as well as many other data.

All data are made available internationally without restriction consistent with the data policy of World Data System."

nonspecialist commented 7 years ago

ftp://ftp.nodc.noaa.gov/pub/WO[AD]* are symlinks to ftp://ftp.nodc.noaa.gov/pub/woa/... so the data is not replicated, and only the pub/woa directories need to be grabbed from that part of the tree.

There's a lot of stuff under pub/data.nodc, almost all of which symlinks back to files under /nodc/archive, but the directory structure under /nodc/archive is slightly different, including different case in file and directory names. This seems to imply that both parts will be needed: /pub/data.nodc for the structure and /nodc/archive for the actual content.
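If both parts are mirrored, the link structure could be rebuilt locally afterwards. Below is a rough Python sketch of that idea, not something that has been run against this server. The function name `recreate_links`, the `mirror_root` layout, and the `(link_path, target)` manifest format are all made up for illustration; it assumes the symlink targets have already been harvested from the FTP listings and that the local filesystem supports symlinks.

```python
import os
import posixpath


def recreate_links(mirror_root, links):
    """Recreate remote symlinks under a local mirror directory.

    `links` is an iterable of (link_path, target) pairs (hypothetical
    manifest format). `link_path` is an absolute remote path; `target`
    is the link target as reported by the server, absolute or relative.
    Returns the list of local symlinks that were created.
    """
    created = []
    for link_path, target in links:
        # Relative targets are resolved against the link's remote directory;
        # absolute targets are kept as they are.
        abs_target = posixpath.normpath(
            posixpath.join(posixpath.dirname(link_path), target)
        )
        local_link = os.path.join(mirror_root, link_path.lstrip("/"))
        local_target = os.path.join(mirror_root, abs_target.lstrip("/"))
        link_dir = os.path.dirname(local_link)
        os.makedirs(link_dir, exist_ok=True)
        # Use a relative target so the mirror can be moved without breaking.
        rel_target = os.path.relpath(local_target, link_dir)
        if not os.path.lexists(local_link):
            os.symlink(rel_target, local_link)
            created.append(local_link)
    return created
```

With this layout, /nodc/archive holds the real files and /pub/data.nodc becomes a tree of relative symlinks pointing into it, so the content is stored only once.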

I'm crawling bit by bit to estimate the size ... could take a while.
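For anyone who wants to repeat the size estimate, here is a minimal Python sketch of such a crawl. It is not the script actually used for the numbers in this comment. It assumes the server answers `LIST` with Unix-style `ls -l` lines (an assumption, not verified here); `parse_list_line` and `estimate_size` are illustrative names. Symlinks are not followed, so linked content is not counted twice.

```python
import re

# Assumed Unix-style listing lines, for example:
#   -rw-r--r--   1 ftp ftp  1048576 Jan 12  2017 file.nc
#   lrwxrwxrwx   1 ftp ftp        9 Jan 12  2017 WOD98 -> woa/WOD98
LIST_RE = re.compile(
    r"^(?P<type>[-dl])\S+\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<name>.+)$"
)


def parse_list_line(line):
    """Return (kind, name, size, target), or None for unrecognised lines."""
    m = LIST_RE.match(line)
    if m is None:
        return None
    kind, name, size = m["type"], m["name"], int(m["size"])
    target = None
    if kind == "l" and " -> " in name:
        name, target = name.split(" -> ", 1)
    return kind, name, size, target


def estimate_size(ftp, path):
    """Sum regular-file sizes (bytes) under `path`, skipping symlinks.

    `ftp` is expected to behave like a connected ftplib.FTP object.
    """
    lines = []
    ftp.retrlines(f"LIST {path}", lines.append)
    total = 0
    for line in lines:
        entry = parse_list_line(line)
        if entry is None:
            continue
        kind, name, size, _target = entry
        if name in (".", ".."):
            continue
        if kind == "d":
            # Recurse into real subdirectories only.
            total += estimate_size(ftp, f"{path.rstrip('/')}/{name}")
        elif kind == "-":
            total += size
    return total
```

To try it, connect with `ftplib.FTP("ftp.nodc.noaa.gov")`, call `login()` for anonymous access, and pass the connection and a path such as `/pub/woa` to `estimate_size`. Files reachable under several names through hard links would be counted once per name.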

A high-level overview of relevant parts looks like:

pub
  +-- NCEP          0 KB
  +-- data.nodc
    +-- Aquarius    3 GB
    +-- DeepwaterHorizon 2 GB
    +-- GCOS        3 GB
    +-- SMOS        10 GB
    +-- arcgis      0 KB
    +-- argo        140 GB
    +-- coaps       0 KB
    +-- coris       84 GB
    +-- cortad      0 KB
    +-- crw         75 MB
    +-- geoportal   0 KB
    +-- ghrsst      89 GB
    +-- glider      0 KB
    +-- goc         3 GB
    +-- gtspp       400 GB
    +-- iode        1 GB
    +-- ioos        0 KB
    +-- jason2      70 GB
    +-- jason3      830 GB
    +-- jogata      36 MB
    +-- kod         0 KB
    +-- las         0 KB
    +-- livnehmodel 0 KB
    +-- lsa         0 KB
    +-- ncei        8 MB
    +-- ncep        0 KB
    +-- ndbc        2 MB
    +-- nmsp        5 GB
    +-- nodc        319 GB
    +-- ocs         0 KB
    +-- oer         1.5 TB
    +-- opendap     0 KB
    +-- pathfinder  1 TB
    +-- sar-winds   0 KB
    +-- sohcs       2 MB
    +-- threads     0 KB
    +-- w3c         0 KB
    +-- woce        12 GB
    +-- wod         0 KB
    +-- xsl         8 MB
  +-- dist          0 KB
  +-- f291
  +-- incoming      25 GB
  +-- nmsp          0 KB
  +-- outgoing      24 GB
  +-- pathfinder    92 GB
  +-- utils         0 KB
  +-- woa
    +-- ACCINFO     705 MB
    +-- ANOMALY     0 KB
    +-- AZOV2006    171 MB
    +-- AZOV2008    146 MB
    +-- BARPLANK    8 MB
    +-- CRINFO
    +-- CZCS        2 MB
    +-- DATA_ANALYSIS 121 GB
    +-- HISTORY     1 GB
    +-- OKHOTSK2001 1 MB
    +-- PACIFIC09   3 GB
    +-- PROGRAMS    1 MB
    +-- PUBLICATIONS 620 MB
    +-- REGCLIM     95 GB
    +-- WH_SEA      11 MB
    +-- WOA01
    +-- WOA01F
    +-- WOA05
    +-- WOA05F
    +-- WOA05nc
    +-- WOA09
    +-- WOA09F
    +-- WOA13
    +-- WOA13F
    +-- WOA13Fv2
    +-- WOA94
    +-- WOA98
    +-- WOD
    +-- WOD01
    +-- WOD05
    +-- WOD09
    +-- WOD13
    +-- WOD98
    +-- checksums
nodc
  +-- archive
    +-- arc0001
    ...
    +-- arc0105
  +-- data
    +-- WAVE_0        0 KB
    +-- jason2-xgdr   7 TB
    +-- migration     0 KB
    +-- oc1.argo
    +-- oc1.gtspp

gabefair commented 7 years ago

@nickrsan any suggestions on how to approach this? The server has lots of hardlinks to files in other places.

gabefair commented 7 years ago

Probably our best bet is to break this one up into smaller tickets. See #337 for inspiration on how I did it for other supersets.