Ok... I understand that the performance will drop gradually.
I understand the point behind seeding particles over a longer duration because of temporal variation in currents and wind conditions. Is it advisable to seed them over maybe 2-3 years? If I seed over the entire 5 years, the particles that are seeded late will not drift far enough to show whether there is any accumulation. But since the overall number of particles is quite large, perhaps this won't make much difference? Please suggest the best alternative; otherwise I'll just seed over the entire period.
Yes, seeding over e.g. the first 2 years might be a good idea. Then all particles will be followed for at least 3 years, and you have the possibility to investigate to which degree the end location depends on the seeding time. I would also consider doubling the number of particles (to 200.000), allowing better statistics when subdividing them afterwards. As mentioned above, it is now possible to import/analyse a subset of the particles to avoid memory problems:
>>> o = opendrift.open('huge_file.nc', elements=np.arange(0, 10000)) # to read only the first 10.000 elements
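For instance, if memory becomes an issue later, the elements can be analysed in chunks (a sketch; the file name and chunk size are only examples):

import numpy as np
import opendrift

first_half = opendrift.open('huge_file.nc', elements=np.arange(0, 100000))        # elements 0-99999
second_half = opendrift.open('huge_file.nc', elements=np.arange(100000, 200000))  # elements 100000-199999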
Alright, I'll double the particles, seed them over 2 years and track them for 5 years from 2009. Yes, the new feature will be quite useful for importing a subset of particles. Hopefully the simulation completes on the first try; I'll keep you updated. Thanks so much for all your guidance, I appreciate your help. Have a good weekend!
Good luck, and have a good weekend!
Hi, I started the simulation on the 18th of this month, and as of today (after about 2 weeks), about 64000 particles out of 200000 are yet to be seeded and 23741 steps out of 87648 have been completed. It seems that, with the number of particles still scheduled for release, the simulation might take another 4 weeks or more to complete.
Does this sound normal? Can we expedite the process by changing some configuration, using online servers, or some other alternative? Please let me know your opinion.
Yes, this is of course a very long time, but it sounds normal. If you have a look at the last few time steps of your log file, you will see what takes most of this time. I assume the bottleneck is simply obtaining ocean currents from the SVIM model through thredds. There is simply a large amount of data to be downloaded, as this is a 3D simulation over a very large area. Having the data available locally in-house would roughly double the speed, but it is a long simulation in any case.
The time needed should be roughly proportional to the length of the simulation, so one measure would be to reduce that.
Another measure would be to read only the upper few layers of the ocean model, though sacrificing some accuracy. I believe (as mentioned above) that truncating at 3 m would be ok for this purpose:
o.set_config('drift:truncate_ocean_model_below_m', 3)
You might even use just the surface values, by truncating at z=0.
You could also double the calculation time step from 30 minutes to 1 hour, or even 2-3 hours, which should be ok at this scale and will also provide some speedup. Reducing the number of particles would also have some effect, but not much, so I would keep the 200.000.
If you paste the last two time steps of your log, I can help analyze it for bottlenecks.
Thank you for suggesting alternatives. I am attaching a text file, Simulation.txt, where I've pasted the log from the last few time steps. It seems like interpolation is taking most of the time. If possible, I would like to keep the length of the simulation the same.
I guess a huge amount of data would need to be downloaded and merged into a single file for a 5-year simulation if we use the local disk for reading the data. Is this a viable option?
I'm not sure if my understanding of truncating the model below a certain depth is correct. If we truncate the model below 3 m, will particles below 3 m no longer be tracked, or will the model apply the data from the upper 3 m to the entire water column and still track particles deeper than 3 m? I need to track particles below 3 m as well. However, I'm ok with sacrificing some accuracy if it is possible to track particles down to the depths where the majority of them are found.
Doubling the time step, or making it 2-3 hours, seems like the best alternative for speeding up the process. However, I wonder whether I should stop the current simulation and start again with a larger time step, along with perhaps truncating the model, or continue with this simulation. The log file has grown to almost 950 MB, it takes some time to open, and the overall system has become quite slow to respond :) I wonder if it is advisable to keep this simulation running for the next 4 weeks or more and expect it to complete successfully.
Your logs show that the last time steps have taken nearly 10 minutes each, meaning that it would take a full year to complete the simulation(!) So it must have gone much faster in the beginning.
I see that the full column is downloaded (28 layers), meaning an enormous amount of data at each time step. But the interpolation and other processes are also taking a very long time here, perhaps because disk swapping is used due to lack of memory(?)
So I would stop this simulation. But truncating e.g. at 3m should then make a big impact. This means that ocean currents at 3m depth are interpolated towards the seafloor, so you will get drift in any case. For this purpose it is not as bad as it sounds, as the vertical shear of horizontal currents is by far largest near the surface. I think this is necessary to be able to complete this simulation. Downloading the SVIM locally is not an alternative, as it is simply too much data (tens of terabytes).
I would also use 3h calculation time step, as also the computations seem to take some time here. Comparable simulations are much faster on my computer (Linux), but I am not familiar with running such simulations on Windows, and do not know any Windows-specific performance tips.
Over the last couple of days I suspected something unusual, as the time steps were progressing very slowly. I have 16 GB of RAM in the system; I'm not quite sure what exactly disk swapping is, do I need to do something to address this?
I understand the concept behind truncating the model below a certain depth. In this case, since we are reading the actual current data, the drift in the upper 3 m will also include the drift due to wind and Stokes drift in addition to the usual current, right (?). If this is the case and we truncate the model below 3 m, will the drift due to windage and Stokes drift then also get applied over the entire water column (?), even if we use these configurations:
o.set_config('drift:vertical_mixing', False)
o.set_config('drift:stokes_drift', False)
It's ok if this is the case, I just need to understand how exactly particles are getting drifted.
I'm using the script below, please see if it looks ok. Does seeding particles over different depths create any issue? If it does, I'll just seed them normally.
I would like to give it one more try on Windows, and if it doesn't work I'll go for Linux.
from datetime import datetime, timedelta
import numpy as np
from opendrift.models.oceandrift import OceanDrift
o = OceanDrift(loglevel=0, logfile='simulation8.log')
o.list_configspec() # to see available configuration options
# Skip vertical turbulent mixing, Stokes drift, truncate below 3m and no stranding
o.set_config('drift:vertical_mixing', False)
o.set_config('drift:stokes_drift', False)
o.set_config('general:coastline_action', 'previous')
o.set_config('drift:truncate_ocean_model_below_m', 3)
# SVIM currents, 1960 - 30 Sept 2019
o.add_readers_from_list([
    # 'https://thredds.met.no/thredds/dodsC/sea/norkyst800m/1h/aggregate_be'
    'https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg'
])
# Seeding particles randomly at different depths and over 2 years
z = -np.random.rand(200000)*50
seed_time = [datetime(2009, 1, 1, 0), datetime(2011, 1, 1, 0)]
end_time = datetime(2014, 1, 1, 0)
o.seed_elements(lon=2.4, lat=59.2, z=z, radius=0, number=200000,
                time=seed_time)
o.run(end_time=end_time, outfile='5years.nc',
      time_step=timedelta(hours=3), time_step_output=timedelta(days=7))
I edited your comment with 3 triple back-quotes before and after code samples, for better layout.
16 GB RAM should be fine, so still a mystery why your computer became that slow. But most likely it will perform better with smaller amounts of data.
This example illustrates the vertical drift profile with a uniform background current (applied throughout the whole column, but shown only down to 5m), plus Stokes drift (the "bending" towards surface), plus the wind drift (the point "flying away" near the surface).
But for your case with neutral particles, they will not spend enough time very close to the surface (upper meter) for Stokes and windage to be very important, so it is ok to neglect this (as is configured in your example). There is already the indirect effect of the wind on the model current, as this is forced by an atmospheric model.
But I realise there is one more way to reduce the time: you don't need to read from SVIM the variables you are not using. To achieve this, you may add the reader in this way:
from opendrift.readers import reader_ROMS_native
SVIM = reader_ROMS_native.Reader('https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
o.add_reader(SVIM, variables = ['x_sea_water_velocity', 'y_sea_water_velocity', 'sea_floor_depth_below_sea_level'])
Then only the 3 mentioned variables will be read from SVIM, and the remaining ones (which you anyway don't use) will be set to default constant values.
Thank you for suggesting one more alternative and editing the comments.
I understand the suggested example. The polymers are usually released at around 15-20 m, and if we read the upper 3 m data and apply it uniformly across the water column, are we compromising a lot on accuracy? I'm just thinking that water masses between 20-100 m depth might be moving quite differently from the upper 3 m.
Also, in the suggested alternative, do we need to include 'upward_sea_water_velocity' as one of the variables? I thought that 'upward_sea_water_velocity' contributes to the 3D component, is this correct?
Yes, you are right, upward_sea_water_velocity should also be included. Thus we save only ocean_vertical_diffusivity, but that is still a 25% reduction, as depth is a 2D variable.
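So the reader call from above would then become (a sketch; o is the OceanDrift object from your script):

from opendrift.readers import reader_ROMS_native

SVIM = reader_ROMS_native.Reader(
    'https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
o.add_reader(SVIM, variables=['x_sea_water_velocity', 'y_sea_water_velocity',
                              'upward_sea_water_velocity',
                              'sea_floor_depth_below_sea_level'])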
Yes, water between 20-100 m will flow differently from 0-3 m, but gradients are normally largest close to the surface. So truncation is not ideal, but I believe the results would not change dramatically from a full 3D simulation. Full 3D over 5 years seems not feasible, but you might do a comparison of truncation vs no truncation over a shorter interval (e.g. 3 months) to investigate how large the differences are; see the sketch below. I would assume you get the same qualitative picture. I have done basic tests like this before and found only moderate differences, but one should be careful about concluding that truncation is always ok.
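A minimal sketch of how such a comparison could be set up (the seeding position, particle number and duration below are only placeholders, adjust to your own script):

from datetime import datetime, timedelta
import numpy as np
from opendrift.models.oceandrift import OceanDrift
from opendrift.readers import reader_ROMS_native

for label, truncate in [('trunc3m', True), ('full3d', False)]:
    o = OceanDrift(loglevel=20)
    o.set_config('drift:vertical_mixing', False)
    o.set_config('drift:stokes_drift', False)
    o.set_config('general:coastline_action', 'previous')
    if truncate:
        o.set_config('drift:truncate_ocean_model_below_m', 3)
    svim = reader_ROMS_native.Reader(
        'https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
    o.add_reader(svim)
    o.seed_elements(lon=2.4, lat=59.2, z=-np.random.rand(5000)*50,
                    number=5000, time=datetime(2009, 1, 1))
    o.run(duration=timedelta(days=90), time_step=timedelta(hours=3),
          time_step_output=timedelta(days=1), outfile='compare_%s.nc' % label)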
Ok. I'll run the simulation by truncating the model below 3 m and by including only the variables that are needed.
I'll do the comparison for truncation / no truncation for 3 months and see the difference, thank you for the suggestion.
Also, from the output of a full 3D simulation of 1000 particles over 2 years, it can be seen that the majority of the particles do not go below 200 m depth (?). So if truncating below 3 m gives faster results, I could also try extending the truncation depth to 100 m or 200 m (?) to strike a balance between accuracy and the time needed to run the simulation.
To better see the vertical distribution, you may add e.g. vmin=-200 to the animation method.
Also you may try o.plot_property('z'), or o.plot_vertical_distribution() for a vertical histogram. The latter has an interactive slider at the bottom.
Thank you for suggesting plots for better visualization.
When I used the command o.plot_vertical_distribution(), the plot gets cut off below 100 m depth. Do I need to include something more in the command to go beyond 100 m?
When I try to use o.animation(color='z', buffer=.1, vmin=-200), the following plot gets generated (I'm not able to save the gif, but I'll try to figure that out). The depth doesn't go beyond 1 m for some reason. Am I making some error in these commands or the one above?
For o.plot_property('z') the following plot is generated, which looks ok I think.
plot_vertical_distribution() is hardcoded for above 100 m, but might anyway not be suitable here. You might try instead o.plot_vertical_distribution_new(), where you may provide maxdepth.
For the animation, it seems necessary to also provide vmax=0.
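For example (the maxdepth value below is only an example, and I do not remember offhand whether it should be positive or negative, so check the docstring):

o.plot_vertical_distribution_new(maxdepth=250)  # example value; sign convention as per the docstring
o.animation(color='z', buffer=.1, vmin=-200, vmax=0)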
I tried both methods and it works well now. For the animation we do indeed need to provide vmax=0.
Yesterday I started the 5-year simulation, truncating the model below 3 m. Please see the attached log Test.txt for the last few time steps. It seems like this simulation is working quite ok, as it is already more than half way. However, in the terminal there was the following error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\mevo\.conda\envs\opendrift\lib\multiprocessing\spawn.py", line 107, in spawn_main
    new_handle = reduction.duplicate(pipe_handle,
  File "C:\Users\mevo\.conda\envs\opendrift\lib\multiprocessing\reduction.py", line 79, in duplicate
    return _winapi.DuplicateHandle(
OSError: [WinError 6] The handle is invalid
curl error details:
curl error details:
The same error popped up in previous simulation as well. However it seems like the current simulation is running quite ok (?).
The log seems fine. For ROMS native readers it takes time to convert between x,y and lon,lat, which is done with polynomial interpolation. Parallel processing (multiprocessing) is implemented to speed this up, but if it fails, as your error indicates (perhaps a Windows issue), OpenDrift reverts to using a single processor.
Ok. Thanks so much. When I plotted the results for the 2-year simulation of 1000 particles, the majority of particles were at less than 100 m depth. I was thinking of trying the 5-year simulation with the model truncated below 100 m. It may take a few days more than the current one, but would perhaps be more accurate.
Hi, it seems I'm running out of luck! The simulation was on the verge of completing, Test1.txt (13206 steps out of 14608 were completed). For some reason the internet connection was lost and restored, and now the model has stopped downloading the data. Is there something that can be done to restart the model from the point where it stopped? Otherwise, can I at least use the output file to see the results up to step 13206?
Also, around 68000 particles were out of the coverage area, which seems significant (please see the attached text file). Is there any other reader that can be added to increase the coverage area?
I've also started simulations on the two spare PCs that I have (model truncation below 100 m). Those are still running :)
Yes, that was a pity. It is not possible to continue an aborted simulation. However, output is written to file every 100 output time steps (not calculation time steps), and it should be possible to use the file anyway (presuming that it fits in computer memory):
>>> import opendrift
>>> o = opendrift.open(<file.nc>)
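And then, for example (file name taken from your script above):

>>> o = opendrift.open('5years.nc')  # output of the aborted run
>>> o.plot()  # plots whatever was written to file before the abort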
I am not sure what it means that 68000 particles are out of coverage, but perhaps this is related to the failing data transfer? Did you get similar messages in the time steps before this?
It would be very interesting to compare simulations truncated at 3 m and 100 m, respectively. So it would be good if those could be made with otherwise identical configuration.
Ok. I'll try to open the file and see if I can plot the results up to the point where the simulation was aborted.
It seems the 68000 particles going out of the coverage area are not related to the failing data transfer; rather, over the course of the simulation, particles kept going outside the coverage area. From the log file I can see that at different time steps more and more particles left the coverage area, eventually adding up to 68000, which is more than 30% of the particles launched.
I'll keep you updated when I get the results from the other simulations, so we can see the comparison.
Hi, both simulations completed successfully. For the simulation with model truncation below 3 m, the particles do not travel to or stay in the Kara Sea as much as they do for the simulation with truncation below 100 m. Also, the particles look more scattered with truncation below 100 m. It took about an hour to plot the results :)
For truncation below 3 m around 82000 particles, and for truncation below 100 m around 77000 particles, end up out of the coverage area. These particles leave the coverage area gradually over time. I wonder if there is any other dataset that covers a larger area than the SVIM dataset, or some other alternative for dealing with particles going out of the coverage area?
The first link below is for model truncation below 3 m and the second for 100 m. I can't share .mp4 files here, so I'm sharing OneDrive links; please see if it is possible to access the files.
Nice animations! There are somewhat larger differences between the 3m- and 100m-truncated runs than I expected. Thus I suggest using the 100m simulation for your analyses.
You might find some global datasets covering the actual period on http://tds.hycom.org, or you could download data to local disk from http://www.cmems.eu (but that would require many terabytes for 3D data over several years). These have coarser resolution (~10-12 km vs 4 km for SVIM), so you should then use SVIM for the inner domain.
However, I don't think it would change the overall picture very much. And there would be gradients and discontinuities on the boundaries between models, so it is anyway not a perfect solution.
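If you try it, the setup could look roughly like the sketch below (the global-model URL is a placeholder, and I assume the reader added first is preferred where the domains overlap):

from opendrift.readers import reader_ROMS_native, reader_netCDF_CF_generic

svim = reader_ROMS_native.Reader(
    'https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
global_model = reader_netCDF_CF_generic.Reader('<hycom_or_cmems_url>')  # placeholder URL
o.add_reader([svim, global_model])  # SVIM first, so it is used inside its own domain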
Ok. I'll use truncation below 100 m for the analyses. Thank you for suggesting the HYCOM and CMEMS datasets. If the overall picture doesn't change a lot, I'll stick to using the SVIM dataset. In any case, when the water masses travel further north they will cool and sink into deep water formation (well below 100 m).
Now I've got a reasonable understanding of how the OpenDrift model operates. However, I would like to know how exactly the interpolation calculations in time and space are done using the forcing data from the servers. I was going through this code, however, I am still new to Python and it is difficult to understand exactly how the calculations are done by reading the code. Is there some other material that I can go through to understand the interpolation step? I saw the link to the user manual in the OpenDrift paper, however, I can't find the manual there. Am I missing something here? Please let me know.
The interpolation may not be well documented, but in short: an Euler propagation scheme is used for the advection. A Runge-Kutta scheme may also be used (o.set_config('drift:scheme', 'runge-kutta4')), however it has been shown that the difference is very small, given that linear interpolation is used in the horizontal.
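For illustration only (this is not OpenDrift's internal code), the difference between a single Euler step and a 4th-order Runge-Kutta step for horizontal advection is:

# Illustration only. 'current' is assumed to be a function (lon, lat, t) -> (u, v),
# with velocities already converted to degrees per second for simplicity.
def euler_step(lon, lat, t, dt, current):
    u, v = current(lon, lat, t)
    return lon + u*dt, lat + v*dt

def rk4_step(lon, lat, t, dt, current):
    # Sample the current field at intermediate positions/times and combine
    u1, v1 = current(lon, lat, t)
    u2, v2 = current(lon + 0.5*dt*u1, lat + 0.5*dt*v1, t + 0.5*dt)
    u3, v3 = current(lon + 0.5*dt*u2, lat + 0.5*dt*v2, t + 0.5*dt)
    u4, v4 = current(lon + dt*u3, lat + dt*v3, t + dt)
    return (lon + dt*(u1 + 2*u2 + 2*u3 + u4)/6,
            lat + dt*(v1 + 2*v2 + 2*v3 + v4)/6)

With linearly interpolated forcing fields, the higher-order scheme gains little in practice.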
Ok, thank you for explaining the interpolation steps and the information about the Runge-Kutta scheme. I'll go through the suggested article to understand the interpolation in further detail.
I find the OpenDrift model quite interesting, and I am therefore also going to study other subclasses of the model, starting with OpenOil.
Really appreciate all your suggestions and guidance until now, thank you so much.
Hi, referring to one of the previous answers (please see below) about what exactly the density plot shows: even if we saturate particles within the boxes (by changing vmax), do the dimensions of the boxes remain the same, here 50*50 km (?). Also, what is the vertical dimension of these boxes?
The color is simply the number of elements within the given boxes (here 50*50 km). If you want to see more details, you may provide vmax=30, at the cost of saturating the boxes with more particles. Files up to a few GB should be fine to import into memory, but plotting/animation will become gradually slower (in the end unbearably slow). For this scale, I would recommend using at least 100.000 elements.
The size of the boxes is given by density_pixelsize_m, and all depths are included.
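For reference, the call discussed here was along these lines (argument names as used earlier in this thread; check the animation docstring of your version for the exact signature):

o.animation(density=True, density_pixelsize_m=50000, vmax=30)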
Ok. The density_pixelsize_m used was 50000, how does this translate to 50*50 km? I'm sorry, but I did not get the vertical dimension; does it change with depth? Maybe it is quite basic, but I am not able to figure it out.
50000 m is 50 km, and the boxes are always square. All elements from the surface to the seafloor are counted within a given box.
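To illustrate the principle (plain numpy, not OpenDrift internals): the count is a 2D histogram of horizontal positions, so z never enters and all depths fall into the same box.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 500_000, 10000)   # synthetic particle x positions in metres
y = rng.uniform(0, 500_000, 10000)   # synthetic particle y positions in metres

pixelsize = 50_000                   # 50 km
bins = np.arange(0, 500_000 + pixelsize, pixelsize)
counts, _, _ = np.histogram2d(x, y, bins=[bins, bins])
print(counts.shape)                  # (10, 10) boxes of 50 km x 50 km each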
Ok, got it. thanks so much.
Hi, similar to the 5-year simulation above, I am now trying to run a simulation for 10 years. I check the log file daily and got this error, Error.txt, today. The model was running ok until step 15013, but some error occurred at step 15014. After this error there is no interpolation in the log, and the number of missing elements also suddenly disappeared compared to the previous step. The error now appears at every step, with no interpolation showing up. Is it advisable to continue the simulation with this error?
Hi @knutfrode , can you please look into the error?
The error should indicate that data is missing for the given times. The simulation can continue, but you will only get fallback values (0 wind, 0 currents).
You could try to start a new simulation at that specific time, to see if the same thing happens again. If so, there seems to be a hole in the SVIM dataset.
ok. I understand. Thank you for the suggestion, I'll try to start the simulation at that time and see how it goes.
Hi, assuming there is a hole in the dataset, I changed the time scale and shifted the start from 2000 to 1990 in another simulation. For this simulation, the same error started appearing from step 19190 (26 July 1996), please see the file Error file.txt. Following your suggestion, a shorter simulation was started from 25/7/1996 (the time when the error starts appearing). To my surprise, this simulation completed without any error about not being able to find a netCDF file. Please see test.log.
Can you please see what the error could be in this case? Thanks.
Hi @knutfrode , can you please comment on what could be the issue in the above case and whether there is some solution for it? Thanks.
When there are holes in the dataset, the simulation will stop, unless you have added other backup readers or provided a fallback value for the given parameter. For OceanDrift, default fallback values of 0 are given for the currents, so the simulation will not stop, but elements will not be moved until you have passed the hole. If you start the simulation "after" the hole, there should be no problem, until you eventually reach the next hole.
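If you want to set the fallback values explicitly, this can be done via the configuration. A sketch, assuming a recent OpenDrift version (older versions used an o.fallback_values dictionary instead); check o.list_configspec() for the exact keys in your version:

# Assumed config keys for fallback values; verify with o.list_configspec()
o.set_config('environment:fallback:x_sea_water_velocity', 0)
o.set_config('environment:fallback:y_sea_water_velocity', 0)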
Yes, I understand this. However, as you suggested earlier, I started a simulation at the time where the log starts showing a hole in the dataset, just to confirm whether there really is a hole. I did this cross-check by running another small simulation at the time when the longer simulation starts reporting the hole, and surprisingly the shorter simulation worked quite well. Does that not suggest that there is perhaps no hole (unless I am missing something)? Please see the log that I have shared above.
Yes, if you start the new simulation before/during the hole, and not afterwards, one would expect it to fail as well. However, it may be that the "hole" in this case does not mean that data is actually missing, but that the thredds server was unavailable for some seconds. This may happen from time to time, and the risk is proportional to the length of the simulation. However, if you are using a fallback value of 0, it may for some applications not be critical that objects are not moving during a few hours of a simulation spanning several years.
I see... I think that is the exact issue. The thredds server becomes unavailable at some point during long simulations; I noticed that for the 10-year and 15-year simulations I started. So there is actually no hole, but the thredds server is unavailable for some time, and it seems there is no solution to this. I can confirm that when I plotted the 10-year simulation, the particles were frozen (because of the fallback velocity of 0) for the period the server was unavailable.
The purpose of running this long-term simulation (10-15 years) is to see how long it takes for all the water in the North/Norwegian Sea to leave this area (volume flux; we can perhaps draw some conclusion from the particles going out of the coverage area (?)) and for fresh water to come in from the south, so that we can conclude that all the water in the North/Norwegian Sea will be polymer-free after so-and-so many years. Do you have some suggestions for approaching this?
Hi @knutfrode , I am sharing the result of the 10-year simulation through this link https://liveuis-my.sharepoint.com/:v:/g/personal/2915803_uis_no/EanqiCbZt4hIpCaFA-IoWjcB3T4vY2sC8podCfC1YcqqFA?e=pzzXQ4. Please see if it is possible to open it. The issue is that the particles stop moving for about 3 years, which seems like a long time for the server to be unavailable.
I can see the movie. The simulation uses very little time per time step when there is no current data to read, so the server might have been unavailable for just a few minutes. This can be seen from the timestamps of the corresponding log lines.
Anyway, these results can of course only be used until the data are unavailable. So if you need a longer simulation, you would need to re-run it. However, a continuous uptime of the thredds-server can unfortunately not be guaranteed.
Ok.. I used o.run(end_time=end_time, outfile='10years.nc', time_step=timedelta(hours=3), time_step_output=timedelta(days=15)) to keep the output file small, but perhaps I can go a little lower on the output time step. I'll re-run with a shorter output time step and hope for the best. Thanks :)
Hi @knutfrode , I am trying to run a new simulation with the SVIM dataset, however the error below pops up when trying to read the dataset.
SVIM = reader_ROMS_native.Reader('https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\users\mevo\opendrift\opendrift\readers\reader_ROMS_native.py", line 196, in __init__
    self.times = [datetime.utcfromtimestamp((OT -
  File "c:\users\mevo\opendrift\opendrift\readers\reader_ROMS_native.py", line 196, in <listcomp>
    self.times = [datetime.utcfromtimestamp((OT -
OSError: [Errno 22] Invalid argument
Can you please see what is the issue? Thanks
I cannot reproduce this problem, as the following works fine (though takes some time):
>>> from opendrift.reader.reader_ROMS_native import Reader
>>> r = Reader('https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
Can you try once more? If it does not work, you could try to update OpenDrift and its dependencies with
$ git pull
$ conda env update -f environment.yml
I did try again after updating OpenDrift and the dependencies (on two computers) and the exact same error appears on both systems. I just edited the error above, because the model showed a few more steps before showing the same error again.
>>> from opendrift.readers import reader_ROMS_native
>>> SVIM = reader_ROMS_native.Reader('https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
15:53:34 INFO: Opening dataset: https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg
15:53:34 INFO: Opening file with Dataset
15:53:34 WARNING: Vtransform not found, using 1
15:53:34 INFO: Read GLS parameters from file.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mevo\opendrift\opendrift\readers\reader_ROMS_native.py", line 196, in __init__
    self.times = [datetime.utcfromtimestamp((OT -
  File "C:\Users\mevo\opendrift\opendrift\readers\reader_ROMS_native.py", line 196, in <listcomp>
    self.times = [datetime.utcfromtimestamp((OT -
OSError: [Errno 22] Invalid argument
I was using the above command previously. But the command you suggested doesn't work either. Something is strange.
>>> from opendrift.reader.reader_ROMS_native import Reader
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'opendrift.reader'
Hi @knutfrode , I tried again today but the same error persists; could this be a Windows issue? This command doesn't work either and gives the error mentioned in the above comment.
>>> from opendrift.reader.reader_ROMS_native import Reader
>>> r = Reader('https://thredds.met.no/thredds/dodsC/nansen-legacy-ocean/svim_daily_agg')
Were you able to read the SVIM dataset on Windows with the latest OpenDrift version (1.4.0) when you tried to reproduce the error? Could you please suggest a way forward? Thanks.
I just removed and re-installed the model, but I am still getting the same error when trying to read the dataset. Please help at your convenience :)
Hi @knutfrode , the code stops running at line 196 in reader_ROMS_native.py, which is related to datetime. It seems like this is a Windows issue; I found these discussions on the Python website: This and this. It seems like something similar is going on with the error that I'm getting.
Can you see some way to get around the error?
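If I read those threads correctly, datetime.utcfromtimestamp() fails on Windows for timestamps before 1970, and since the SVIM aggregate starts in 1960, negative timestamps would be passed in. A quick check of this assumption (plain Python, not OpenDrift code):

from datetime import datetime, timedelta

# My assumption: datetime.utcfromtimestamp(-315619200) raises
# OSError: [Errno 22] Invalid argument on Windows, because the timestamp is
# negative (before the Unix epoch). Going via a timedelta from the epoch
# works on all platforms:
print(datetime(1970, 1, 1) + timedelta(seconds=-315619200))  # 1960-01-01 00:00:00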
Hello Team, first of all, congratulations and thanks so much for developing easy-to-use open-source trajectory modelling software. I'm new to Python and hope I can use this software for my PhD research here at the University of Stavanger in Norway. I've installed OpenDrift and am trying to run the tutorials and examples. However, when running the tutorials, I get an error when trying to add a reader from a local file, and also when trying to import the landmask. Can someone please guide me on this issue? I'm pasting the errors below. Thanks in advance!
-------------Error while trying to add reader from a local file--------------
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mevo\.conda\envs\opendrift\lib\site-packages\opendrift\readers\reader_netCDF_CF_generic.py", line 144, in __init__
    raise ValueError(e)
ValueError: [Errno 2] No such file or directory: b'norkyst800_16Nov2015.nc'
-------------------- Error when importing landmask--------------------