OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0

Cannot access CMEMS data #861

John-Luick closed this issue 2 years ago

John-Luick commented 2 years ago

Can someone please try accessing the global-analysis-forecast-phy-001-024-hourly-merged-uv dataset and tell me whether they also get an error when the reader tries to set up an XML file? I cannot get an answer from CMEMS support, but it would help to at least know whether it is just me.

Would also be good to know if this happens frequently with this data.

Thanks

knutfrode commented 2 years ago

The old specialized CMEMS reader, which downloaded XML metadata files, was an ad-hoc solution, since the CMEMS datasets were not available through OPeNDAP. But now they are, and (with a recent OpenDrift version) you can use the normal (generic) netCDF reader with the respective URLs.

I see now that the global-analysis-forecast-phy-001-024-hourly-merged-uv dataset seems to have been retired recently: https://resources.marine.copernicus.eu/product-detail/GLOBAL_ANALYSIS_FORECAST_PHY_001_024/NOTIFICATIONS. However, it seems we can use this URL instead for hourly global ocean currents: https://nrt.cmems-du.eu/thredds/dodsC/global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh

These URLs are password protected, so you must put your CMEMS username and password in a .netrc file: http://www.mavetju.org/unix/netrc.php

John-Luick commented 2 years ago

Thanks for the prompt reply. I am using 1.4.2. I will try as you suggest. Hmmm, maybe it was not retired, only renamed. As I was writing that, I got a message from Elena at CMEMS. She said "the dataset global-analysis-forecast-phy-001-024-hourly-merged-uv is now called cmems_mod_glo_phy_anfc_merged-uv_PT1H-i". (As of mid-December, which is weird, because I was using the old name as recently as yesterday. Maybe they allowed it as an alias until last night.)
Probably they will require the netrc now too (I was using the old text file version) so it is good you reminded me. I will play with this tomorrow (it is early evening here in Adelaide) and let you know what I find. Also I will see if there is a more recent OpenDrift than 1.4.2, and if so, will work out how to update. I wonder if that odd warning message it always gives will be gone. Thanks again. John

knutfrode commented 2 years ago

Ok, yes, it seems it was just renamed, with a period during which both names were allowed. Then this can be used instead: https://nrt.cmems-du.eu/thredds/dodsC/cmems_mod_glo_phy_anfc_merged-uv_PT1H-i (global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh seems to go further back in time, to 1 January 2019, but does not include Stokes drift).

The current version of OpenDrift is 1.8.3, and you can update as described here: https://opendrift.github.io/install.html Basically, from main OpenDrift-folder:

John-Luick commented 2 years ago

I wasted more time than I care to admit on the netrc file thing. Finally did some research and learned that in Windows (which I am using), it has to be called "_netrc" not ".netrc"! But I could not find a consistent protocol for where it should be put on Windows, and OpenDrift could not find it, no matter where I put it. So I gave up on that, and tried various iterations of the cmems reader commands. Finally I found that this seems to work for the -ssh data:

om.add_readers_from_list(['{"reader": "reader_cmems", "dataset": "global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh", "cmems_user": "jluick", "cmems_password": ""}', 'https://pae-paha.pacioos.hawaii.edu/thredds/dodsC/ncep_global/NCEP_Global_Atmospheric_Model_best.ncd'])

However, I like the "merged" dataset, because tides are essential to me. Simply substituting the new name into that JSON string (I guess it's called a JSON string) resulted in zero velocities, so I added the variable mapping, which still resulted in zero ocean model velocities (only wind drift). This JSON string looked like:

om.add_readers_from_list(['{"reader": "reader_cmems", "dataset": "cmems_mod_glo_phy_anfc_merged-uv_PT1H-i", "variable_mapping"= ("utotal": "x_sea_water_velocity","vtotal": "y_sea_water_velocity"), "cmems_user": "jluick", "cmems_password": ""}', 'https://pae-paha.pacioos.hawaii.edu/thredds/dodsC/ncep_global/NCEP_Global_Atmospheric_Model_best.ncd'])

Maybe I got the mapping syntax wrong? Anyway, I tried some of the other approaches to setting up and adding readers, but nothing was successful, so I decided that I had reached the point of diminishing returns and am punting it back to the experts. Thanking you in advance. This has been a frustrating day (so close, but no solution), but now I can go enjoy the great outdoors. PS I am still using 1.4.2, not wanting to further complicate this, but will upgrade once I get this sorted.

knutfrode commented 2 years ago

The _netrc should reside in your home folder, according to this thread: https://superuser.com/questions/620143/is-there-a-windows-solution-for-a-program-that-relies-on-netrc. I would recommend trying to get that working, as the old solution based on motuclient is very ad hoc and unstable. I am surprised it has worked as well as it has. I even removed reader_cmems in the commit yesterday, so apparently you have not updated OpenDrift?

Anyway, if you are using old OpenDrift and the cmems reader, I believe your mapping should use curly braces instead of parentheses (and, since this is JSON, a colon instead of "="): "standard_name_mapping": {"utotal": "x_sea_water_velocity", "vtotal": "y_sea_water_velocity"}
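Hand-writing JSON inside a Python string makes it easy to slip in parentheses or "=" where JSON needs braces and colons. The following is a minimal sketch that builds the reader string from a Python dict with `json.dumps`, so the syntax is always valid JSON. It assumes the old `reader_cmems` accepts a JSON object with the keys used in this thread. `build_cmems_reader_spec` is a hypothetical helper. The mapping key name is also an assumption: this thread uses both `standard_name_mapping` and `variable_mapping`, and neither has been checked against the reader source here.

```python
import json


def build_cmems_reader_spec(dataset, user, password, mapping,
                            mapping_key="standard_name_mapping"):
    """Return a JSON string describing a reader_cmems reader (hypothetical helper).

    Key names mirror those used in this thread; mapping_key is an assumption,
    since both standard_name_mapping and variable_mapping appear above.
    """
    spec = {
        "reader": "reader_cmems",
        "dataset": dataset,
        mapping_key: mapping,  # dataset variable name -> CF standard name
        "cmems_user": user,
        "cmems_password": password,
    }
    # json.dumps always emits valid JSON: braces, double quotes and colons.
    return json.dumps(spec)


spec = build_cmems_reader_spec(
    dataset="cmems_mod_glo_phy_anfc_merged-uv_PT1H-i",
    user="jluick",
    password="",
    mapping={"utotal": "x_sea_water_velocity", "vtotal": "y_sea_water_velocity"},
)
print(spec)
```

With the old OpenDrift, the resulting string would then be passed as before, e.g. `om.add_readers_from_list([spec, wind_url])`, where `wind_url` stands for the NCEP URL used above.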

John-Luick commented 2 years ago

Yes, you are right, I haven't updated yet (I think you missed my "PS"). I may as well update next and start over with a clean slate using the _netrc approach. I doubt I will have time until Monday, but hopefully I will sort it out, and if so I will explain how it works here in a few days. The superuser.com link you provided is the same one I was looking at. It is five or ten years old, and none of that worked on my Windows machine with the latest OS. That is what I was referring to by "more time than I care to admit" (trying to figure out what my Windows calls "HOME").

knutfrode commented 2 years ago

Yes, such things may change over time. You could also try the regular netCDF reader, putting your username and password in the URL: https://<username>:<password>@nrt.cmems-du.eu/thredds/dodsC/global-analysis-forecast-wav-001-027 This does not work for me, but seems to work for some people.
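A "malformed" URL error can occur when the password contains characters that have a special meaning in URLs, such as "@", ":" or "/". The following is a minimal sketch that builds the credential URL with both parts percent-encoded using `urllib.parse.quote`. `url_with_credentials` is a hypothetical helper. It has not been confirmed that the netCDF/OPeNDAP client behind OpenDrift's reader decodes percent-encoded credentials, so treat this as something to try rather than a known fix.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def url_with_credentials(url, username, password):
    """Insert username:password into an http(s) URL, percent-encoding both.

    Percent-encoding keeps characters such as '@', ':' or '/' in the
    password from being read as URL delimiters.
    """
    parts = urlsplit(url)
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(url_with_credentials(
    "https://nrt.cmems-du.eu/thredds/dodsC/global-analysis-forecast-wav-001-027",
    "myuser", "p@ss:word"))
```

Note that the resulting string contains your password in plain text, so avoid pasting it into logs or issue comments.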

John-Luick commented 2 years ago

I am pretty much out of ideas for reading CMEMS data online in OpenDrift. I updated to the latest OpenDrift, but still cannot read CMEMS data. It gives an authorization error despite my putting a _netrc file (and a .netrc file) where it is supposedly meant to be. The suggested approach of putting the username and password in the URL for the regular netCDF reader causes a "malformed" error. I could run OpenDrift under WSL (Windows Subsystem for Linux), but that is something of a last resort, as it inevitably complicates things. For now I will download the CMEMS data outside of OpenDrift and read it in OpenDrift (but that means losing part of the power and original attraction of OpenDrift). If anyone has any other ideas, I am happy to hear them.

The authorization error appears to arise in the netcdf reader, but I couldn't figure out (in the debugger) what the path+filename was that python was looking for (for the info in .netrc). That would have been helpful.
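To see where Python itself would look, you can print the candidate paths. Per the Python documentation, the standard `netrc` module, when called without an argument, reads only `.netrc` in the home directory as resolved by `os.path.expanduser`. On Windows, Python 3.8 and later resolve `~` from `USERPROFILE` rather than `HOME`. The following is a minimal sketch: `netrc_candidates` is a hypothetical helper, and `_netrc` is listed only because this thread found conflicting advice about the Windows file name. The netCDF library used by the reader does its own lookup, which may differ from Python's.

```python
import os


def netrc_candidates():
    """List possible netrc files in the home directory and whether each exists.

    Python's netrc module reads only ~/.netrc by default; _netrc is included
    because the thread found conflicting advice about the Windows file name.
    """
    home = os.path.expanduser("~")  # USERPROFILE on Windows (Python 3.8+)
    names = [".netrc", "_netrc"]
    return [(path, os.path.isfile(path))
            for path in (os.path.join(home, name) for name in names)]


for path, exists in netrc_candidates():
    print(f"{path}: {'found' if exists else 'missing'}")
```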

The funny thing was, the first time I installed opendrift, a year or so ago, it all worked beautifully in minutes. Now after using it for months, I've spent literally days unsuccessfully trying to get it to run.

By the way, after updating, I had to reload (pip install) geojson and cmocean. No biggie, but strange. Just thought I'd mention.

knutfrode commented 2 years ago

Debugging _netrc through OpenDrift might be a bit cumbersome, so you could try the following lines instead (netrc is part of the Python standard library, so nothing extra needs to be installed):

>>> from netrc import netrc
>>> n = netrc()
>>> n.hosts

In my case, this prints all machines in my .netrc along with corresponding usernames and passwords. Maybe in your case it would print some useful error message, e.g. if your _netrc file does not have the right permissions.
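A narrower check is to look up the exact host of the dataset URL. The `machine` entry must match the host name (e.g. nrt.cmems-du.eu), not a short label. The following is a minimal sketch using the standard `netrc.authenticators` method. `credentials_for_url` is a hypothetical helper, and a match here only shows that Python can read the entry, not that the netCDF library reads the same file.

```python
from netrc import netrc
from urllib.parse import urlsplit


def credentials_for_url(url, netrc_path=None):
    """Return (login, password) for the URL's host from a netrc file, or None.

    The netrc 'machine' entry has to match the URL's host name exactly,
    e.g. nrt.cmems-du.eu. With netrc_path=None, Python reads ~/.netrc.
    """
    host = urlsplit(url).hostname
    entry = netrc(netrc_path).authenticators(host)
    if entry is None:  # no matching machine (and no 'default' entry)
        return None
    login, _account, password = entry
    return login, password
```

For example, `credentials_for_url("https://nrt.cmems-du.eu/thredds/dodsC/cmems_mod_glo_phy_anfc_merged-uv_PT1H-i")` should return your username and password if the entry is found, and `None` if the machine name does not match.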

Btw, after your new installation of OpenDrift, you should also run pytest to make sure your local installation is not corrupted. If you needed to install cmocean separately, it indicates that you forgot to update the environment after git pull, as described above.

John-Luick commented 2 years ago

Thanks for that. With that, when I do print(n) (in python) it correctly displays the machine, username, and password, so I do not know why the netcdf reader still throws an authorization error. But my system for downloading from CMEMS manually, then reading the data from file from within opendrift, is working well, and I will stick with that unless someone comes up with a better idea. Anyway, when doing multiple tests with the same data, it comes to nearly the same thing. Just an extra step.

Something else has come up: when I seed with a small radius (e.g. a 10 metre seed radius), there appears to be a round-off error, so that the seeds are distributed discretely at the four corners of a square. Any chance of adding a couple of decimal places? Even if I set seed_radius=0, the seeds all start from discrete patches about 1 km apart. In this case I am modelling turtle neonates that are released together at the same time, and I am only looking at their travel near a coral atoll about 10 km across, so the start point is fairly critical to whether they wash up on one of the other islands.

knutfrode commented 2 years ago

Yes, netrc seems to work, so it is then strange that it does not work with python-netCDF. The following lines work well for me, but not for you?

>>> from opendrift.readers.reader_netCDF_CF_generic import Reader
>>> r = Reader('https://nrt.cmems-du.eu/thredds/dodsC/cmems_mod_glo_phy_anfc_merged-uv_PT1H-i')

Regarding the seeding, I guess this is the same issue as discussed here: https://github.com/OpenDrift/opendrift/discussions/854 Thus I assume you are seeding over time, which means that the actual seeding locations are not shown for all elements, but rather their recorded position at their first output time step. So the seeding is actually correct, it is just not visualized properly. In the near future this will be fixed (by storing the actual seeding locations, not only their position at the output time steps).
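The effect described here can be illustrated with a toy model. An element seeded at time t is first written out at the next output time t_out ≥ t. By then it has already drifted for t_out - t seconds, so its first recorded position lies downstream of the true seed point. The following is a minimal sketch that assumes a uniform, constant current. `first_recorded_offsets` is a hypothetical helper for illustration and is not OpenDrift code.

```python
import math


def first_recorded_offsets(seed_times, output_step, velocity):
    """Toy model: distance between seed point and first recorded position.

    Elements seeded at time t (seconds) are only written at multiples of
    output_step, so the first recorded position lies velocity * (t_out - t)
    metres downstream of the true seed point. Assumes a uniform, constant
    current 'velocity' in m/s; an illustration, not OpenDrift code.
    """
    offsets = []
    for t in seed_times:
        t_out = math.ceil(t / output_step) * output_step  # first output time >= t
        offsets.append(velocity * (t_out - t))
    return offsets


# Five elements seeded at the same point over one hour, hourly output, 0.5 m/s current.
seed_times = [0, 900, 1800, 2700, 3600]
print(first_recorded_offsets(seed_times, output_step=3600, velocity=0.5))
```

In this toy case, elements seeded from the same point appear up to 1.35 km apart at the first output, even though none of them started there.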

John-Luick commented 2 years ago

I think you are right about the seeds, thanks.

For the reader, using your suggestion, I got an authorization error. Updated/simplified: the add_readers_from_list approach had the same authorization problem.

knutfrode commented 2 years ago

Note that I removed reader_cmems from OpenDrift on 17 February, as it will probably not work anymore now that the CMEMS datasets have changed their names, and probably also some related metadata. However, you can download the old reader from here in case you would like to try updating it to get it working again (with the ad-hoc feature of providing a dictionary including variable_mapping): https://github.com/OpenDrift/opendrift/blob/ca3bd25e46540fc4269bd70fdf071468e8f1155e/opendrift/readers/reader_cmems.py

Regarding the _netrc, you could also double-check that the username and password in the netrc file are correct, as even a single extra whitespace after a string, or a stray newline, may be enough to cause problems. The entry should look like this:

machine nrt.cmems-du.eu
    login <your_username>
    password <your_password>

John-Luick commented 2 years ago

Thank you for that. I feel the solution is close!

I found that OpenDrift was looking for .netrc after all (not _netrc), and it was looking in C:\Users\User (which shows up as "John Luick" under Desktop on my PC). By the way, until I saw your last post, I was calling the machine "cmems", not "nrt.cmems-du.eu" (the instructions on the OpenDrift website are a bit misleading, I think). But even after all that, I still got an authorization error. I tried various things with the .netrc file (changing whitespace, etc.), but no joy there.
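When a file edited by hand on Windows still fails, it may help to check for invisible formatting rather than changing whitespace by trial and error. The following is a minimal sketch that reports a UTF-8 byte-order mark, CRLF line endings and trailing spaces or tabs. `lint_netrc` is a hypothetical helper. Whether any of these actually breaks authentication depends on the library that parses the file, which is an assumption here; the function only reports them.

```python
def lint_netrc(path):
    """Report formatting issues that hand-edited netrc files can pick up.

    Flags a UTF-8 byte-order mark, Windows (CRLF) line endings and trailing
    spaces or tabs. Whether these break authentication depends on the
    library that parses the file; this function only reports them.
    """
    with open(path, "rb") as f:
        raw = f.read()
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte-order mark")
    if b"\r\n" in raw:
        problems.append("file uses CRLF (Windows) line endings")
    for number, line in enumerate(raw.splitlines(), start=1):
        if line != line.rstrip(b" \t"):
            problems.append(f"line {number}: trailing whitespace")
    return problems
```

Usage: `for problem in lint_netrc(os.path.expanduser("~/.netrc")): print(problem)` (after `import os`). An empty result means none of these issues were found.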

I downloaded reader_cmems, and it works for dataset='global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh', but not for dataset='cmems_mod_glo_phy_anfc_merged-uv_PT1H-i' or dataset='https://nrt.cmems-du.eu/thredds/dodsC/cmems_mod_glo_phy_anfc_merged-uv_PT1H-i'. It says the data is not available, but I know it is available, because I can download it at an anaconda command prompt. Can you please try it in OpenDrift and see if reader_cmems works for you for the merged data? If it works for you, I will post the code block.