DFO-Ocean-Navigator / Ocean-Data-Map-Project

The Ocean Navigator is an online tool used to help visualise scientific research data. A user's guide is available at https://dfo-ocean-navigator.github.io/Ocean-Navigator-Manual/ and the tool is live at http://navigator.oceansdata.ca
GNU General Public License v3.0

Tile race condition #139

Closed Jeffreydaw closed 5 years ago

Jeffreydaw commented 6 years ago

The tiling problem from issue #66 is back. It is not appearing in GIOPS forecast; instead it is in GIOPS daily (where bug #66 was thought to be when first logged).

Bug #66 was an nginx caching problem. However, I have been able to verify that the problem in GIOPS daily is not an nginx caching problem. I checked the Python cache and the images there matched the tiles being displayed in the navigator; just some of them were wrong. The files were also all created at around the same time, sometimes in the same minute. I then deleted the tile in the nginx cache and it came back the same as it was (bad). I followed that up by deleting one of the tiles in the Python cache and the nginx cache at the same time; when I did this, the tile filled in correctly and was regenerated in the Python cache.

This leads me to believe that this is a race condition, either in Python's multithreading or in the uWSGI Emperor. I'm not sure which, and more testing is needed to be sure.

selection_078

This is present in version v2.1.3.

Jeffreydaw commented 6 years ago

This problem won't go away.

Today I rolled out v2.3.2 and after doing so I dumped the nginx cache. Now I'm seeing the tiling problem below, for GIOPS Daily for the 4th of April 2018.

screenshot from 2018-04-04 16-48-59

Before dumping the cache I was also seeing the problem.

screenshot from 2018-04-04 16-59-11

Maybe a Python cache problem?

Jeffreydaw commented 6 years ago

As can be seen here, even when the day moved to the 6th of April the problem is still present (though something with the tiles has changed). It is important to note that this problem does not happen when run locally (tested in dev mode).

image

Note: The problem is present on the 4th and the 6th but does not seem to be present on the 5th.

Jeffreydaw commented 6 years ago

The 6th of April 2018 corresponds to the time stamp 758, but the cache has:

... 754/ 755/ 756/ 757/ 758/

Jeffreydaw commented 6 years ago

I was able to reproduce the problem on my computer by copying the python cache from the server to my computer.

This was the result for the 4th of April 2018, day index 758:

image

Clearly, it is the same as the 4th in the post above. When this was done the files in the cache were not overwritten, so it seems clear that these are the correct files. Also, I stitched the images that were in folders in the cache directory together (dragged and dropped into LibreOffice) and this was the result.

image

The last thing I tried was viewing the same tile as displayed by the navigator alongside the one in the file. They were the same.

I believe it is safe to say that, unlike issue #66, this is not an nginx caching problem. I think it can even be said that the source of the problem is not the Python cache (though it stores the problem after it comes up). This issue is looking more and more like a race condition, as has been suspected for some time. The problem now is to find out whether this is caused by Python's multithreading or by the uWSGI Emperor.

Jeffreydaw commented 6 years ago

It was also thought that there could be some issue with mkstemp. But it looks like the files that get saved are deleted or moved right away, so there should only be one mkstemp file in the folder at a time (this has only been tested in dev mode on my computer).
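For reference, here is a minimal sketch of the write-to-temp-then-rename pattern that mkstemp is usually paired with. The function name `write_tile_atomically` and the example cache path are assumptions for illustration only; this is not the navigator's actual code.

```python
import os
import tempfile


def write_tile_atomically(path: str, data: bytes) -> None:
    """Write data to path so that readers never see a half-written file.

    Hypothetical helper: the name and the cache layout are assumptions.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace swaps the file in as a single step (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that this only guards against partially written files. It would not help if the tile content is already wrong before it is written, which is what the matching cache and navigator images seem to suggest.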

Jeffreydaw commented 6 years ago

This problem has now also been seen in the Arctic projections.

image

Jeffreydaw commented 6 years ago

This has been moved to the icebox because the remaining debugging methods have been reduced to a few options that will take some work to set up and may not provide results.

Jeffreydaw commented 6 years ago

Deleting the cache resolved the problem as usual, though it is still present on April 27th.

Jeffreydaw commented 6 years ago

I have tried running the uWSGI server on my computer using Gunicorn. I loaded about 60 days in the navigator trying to force a race condition. I tried: loading multiple pages at once; loading a page and interrupting it; loading the same day in many tabs at the same time; trying across different browsers at the same time (Opera and Chrome); using the arrows and the calendar to select the date; and using the API to load one tile and then loading the navigator.
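The manual attempts above could probably be scripted. Below is a rough sketch that calls a tile fetcher many times in parallel and counts how many distinct responses come back. The helper name `hammer_tile` is made up, and the fetcher is passed in as a callable so no tile URL is assumed here.

```python
import hashlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def hammer_tile(fetch: Callable[[], bytes], attempts: int = 32, workers: int = 8) -> Counter:
    """Call fetch concurrently and count distinct response bodies by SHA-256.

    Hypothetical helper: more than one key in the result means the same
    request produced different images.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch) for _ in range(attempts)]
        digests = [hashlib.sha256(f.result()).hexdigest() for f in futures]
    return Counter(digests)
```

In practice `fetch` could be something like `lambda: urllib.request.urlopen(tile_url).read()`, with `tile_url` copied from the browser's network tab. It would need to be run against an empty cache, since once a tile is cached every request returns the same stored file.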

Jeffreydaw commented 6 years ago

I finally got the tiling problem on localhost. This happened the day after the local test with the uWSGI server running locally. The dates that have the problem are May 2nd and 3rd.

image

image

Jeffreydaw commented 6 years ago

screenshot from 2018-07-04 14-08-54

The tiling problem returned. It was noticed that there were requests in the log file for future time indexes. This is a problem as it creates the bad files that are cached.
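One possible guard against this, sketched below, is to refuse to render (and therefore to cache) a tile whose time index is not in the dataset. The function name `render_if_available` and its parameters are hypothetical; I am not claiming this is how the navigator handles it.

```python
from typing import Callable, Optional, Sequence


def render_if_available(
    time_index: int,
    available: Sequence[int],
    render: Callable[[int], bytes],
) -> Optional[bytes]:
    """Render a tile only when time_index exists in the dataset.

    Returns None for unknown indexes so the caller can answer with an error
    (for example HTTP 404) and skip writing anything to the cache.
    """
    if time_index not in available:
        return None
    return render(time_index)
```

For example, with indexes 0 to 758 available, a request for index 759 would return `None` instead of producing a tile that then gets stored.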

Jeffreydaw commented 6 years ago

I thought this problem was gone due to a change @NoahGallant-MUN made... maybe not :(

screenshot from 2018-10-17 14-39-27

Jeffreydaw commented 6 years ago

I think the return of this problem was linked to the problem with the misplaced file from issue #353. I deleted the cache and it is working again.

I am moving this issue to "Done" as I think Noah's fix works.

Jeffreydaw commented 5 years ago

Closing because I think this is mostly solved; it may have to be reopened at a later time.