OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0

Very slow performance with more than 30000 particles #625

Closed Boorhin closed 3 years ago

Boorhin commented 3 years ago

I have tested the Telemac Selafin reader and noticed a significant drop in performance with more than 30,000 particles. Compared with the other particle tracker I know, that is a very small number (I have run up to 10⁶ particles there without much trouble). The amount of RAM used, even with an output file, is very large (40 GB for 30,000 particles).

I am wondering what the benefit is of always converting the projection system when you already have a metric projection with metric velocities. I would think that reprojection is one of the main culprits. Is that for Coriolis? You could avoid millions of reprojection operations by using an approximation. The fact that each time step of the ocean model calls the reader is also an issue, as it is over-sampled most of the time; I understood that you were working on this. Finally, the reader is parallelised, but it is not clear whether the model is. I have used OceanDrift as the model, with coastline action, previously.

josefilho77 commented 3 years ago

Hi Boorhin. That's great you're finally using the Telemac Selafin reader. Could you kindly share the script you used for converting the unstructured .slf data to NetCDF (OpenDrift-compatible)? I managed to use a Telemac 3D file, but only by converting it to a regular grid, which is giving me some issues with strange particle behaviour.

Best

gauteh commented 3 years ago

I think the best way to do that is to use a profiler, maybe py-spy, on an example to figure out where the time is spent. It is also very useful to make small tests and write some benchmarks (e.g. looking up 10k particles). You can have a look at how these benchmarks are done for FVCOM: https://github.com/OpenDrift/opendrift/blob/master/tests/readers/test_fvcom.py#L52 and then compare the performance to how well FVCOM does it. Benchmarks are run with pytest --benchmark-enable; you select specific tests with -k or by specifying the test filename. Once those are established, it makes sense to look at which other parts should be prioritised for optimisation.
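
For illustration, a benchmark for the Selafin reader in the same spirit could look roughly like this (a sketch only: the test file name, the reader constructor arguments and the variable names are assumptions, modelled on the FVCOM test linked above):

    import numpy as np
    from datetime import timedelta
    from opendrift.readers import reader_telemac_selafin

    def test_get_variables_10k(benchmark):
        # hypothetical test file; replace with a real Selafin dataset
        reader = reader_telemac_selafin.Reader(filename='tests/test_data/estuary.slf')
        # 10k random positions inside the reader domain
        x = np.random.uniform(reader.xmin, reader.xmax, 10000)
        y = np.random.uniform(reader.ymin, reader.ymax, 10000)
        z = np.zeros(10000)
        benchmark(reader.get_variables,
                  ['x_sea_water_velocity', 'y_sea_water_velocity'],
                  reader.start_time + timedelta(minutes=30), x, y, z)

Running pytest --benchmark-enable -k test_get_variables_10k then reports timing statistics for just that lookup.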

Boorhin commented 3 years ago

@josefilho77 I have used a reader I wrote that accesses .slf files directly, based on a Telemac installation. It can now be found in the master branch: https://github.com/OpenDrift/opendrift/blob/master/opendrift/readers/reader_telemac_selafin.py @gauteh Thanks a lot, I will look into that. Have you tried other readers with a lot of particles?

knutfrode commented 3 years ago

@Boorhin Using a million particles is not a problem with regular readers (though of course a bit slower than with 1000 particles). I would believe that interpolation from regular grids will always be simpler/faster than from unstructured grids, but it is probably still possible to improve the performance of the unstructured readers. It is probably the unstructured interpolation which is the bottleneck, as reprojection is normally very fast.

As @gauteh says, profiling would be the way to detect where the bottleneck is in your reader/use case. A more primitive way than using pytest would be to simply insert some print statements in the reader, and see where it hangs when using many particles.
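
For example, a crude timing probe in that spirit (a sketch; the wrapper name and the call pattern are illustrative):

    import time

    def timed_get_variables(reader, variables, when, x, y, z):
        # wrap a reader lookup with a wall-clock print, as a poor man's profiler
        t0 = time.perf_counter()
        result = reader.get_variables(variables, when, x, y, z)
        print(f'get_variables: {time.perf_counter() - t0:.2f} s '
              f'for {len(x)} particles')
        return result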

josefilho77 commented 3 years ago

@Boorhin Thank you very much. I am already using it with more than 30,000 particles and it is really slow indeed. What have you done about the timestamp? Here it falls back to the original Selafin default date (1972,00,00). Have you managed to convert it to a real datetime stamp? I believe this happens because the Telemac 3D file does not have a proper timestamp; if you take it from the 2D file, it should be OK.

Also, what would be the way to include salinity and temperature in the reader so that OpenDrift can assimilate them as well?

Thanks again.

Best

Boorhin commented 3 years ago

@josefilho77 in the example I have used the timestamp which is in the file. I think it is a bug in Telemac during the writing of the file: the timestamp is not written by default. You would have to overwrite it in your code; I have to ask the Telemac team what they think about it. I have inspected the binaries of my file and the date is certainly not in it (although specified). You can have a look in my branch: https://github.com/Boorhin/opendrift/blob/jul_FE_netcdf/examples/example_selafin.py You will find some ideas on how to manipulate the reader, especially if you know the one from the Telemac scripts, but basically you can change the start time with selafin.start_time = (a datetime object).
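
Concretely, something along these lines (the file name is a placeholder, and the reader keyword arguments may differ):

    from datetime import datetime
    from opendrift.readers import reader_telemac_selafin

    selafin = reader_telemac_selafin.Reader(filename='my_run.slf')
    # overwrite the erroneous timestamp written into the .slf file
    selafin.start_time = datetime(2019, 2, 1, 9, 0, 0)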

gauteh commented 3 years ago

@Boorhin can this test file be included in the opendrift repository?

Boorhin commented 3 years ago

Sure thing. I have made improvements in my branch; you can check the new reader, the Selafin example and the files, as indicated in Slack.

gauteh commented 3 years ago

Yep, looking at your latest branch now. Will try to set up a simple test file if I can get telemac-python bindings to work. Then it should be easier for you to extend that file.

Boorhin commented 3 years ago

OK, I have investigated the reader a bit more and had to make some modifications related to the datetime management. I have tested an optional parameter called start_time to overwrite the erroneous one from the file. I have also timed the big operations of the reader; they are all fast except the extraction of variables from the slf file:

17:22:58 INFO    opendrift.models.basemodel: 2019-02-01 09:00:00 - step 55 of 12800 - 34986 active elements (0 deactivated)
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: searching time frame
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: found time frame
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: kde tree query
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: query done
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: extract fibre depths
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: fibre depths extracted
17:22:58 INFO    opendrift.readers.reader_telemac_selafin: extract variables
17:23:14 INFO    opendrift.readers.reader_telemac_selafin: variables sent
17:23:15 INFO    opendrift.models.basemodel: 2019-02-01 09:10:00 - step 56 of 12800 - 34986 active elements (0 deactivated)
17:23:15 INFO    opendrift.readers.reader_telemac_selafin: searching time frame
17:23:15 INFO    opendrift.readers.reader_telemac_selafin: found time frame
17:23:15 INFO    opendrift.readers.reader_telemac_selafin: kde tree query
17:23:15 INFO    opendrift.readers.reader_telemac_selafin: query done
17:23:15 INFO    opendrift.readers.reader_telemac_selafin: extract fibre depths
17:23:16 INFO    opendrift.readers.reader_telemac_selafin: fibre depths extracted
17:23:16 INFO    opendrift.readers.reader_telemac_selafin: extract variables
17:23:31 INFO    opendrift.readers.reader_telemac_selafin: variables sent
17:23:32 INFO    opendrift.models.basemodel: 2019-02-01 09:20:00 - step 57 of 12800 - 34986 active elements (0 deactivated)
17:23:32 INFO    opendrift.readers.reader_telemac_selafin: searching time frame
17:23:32 INFO    opendrift.readers.reader_telemac_selafin: found time frame
17:23:32 INFO    opendrift.readers.reader_telemac_selafin: kde tree query
17:23:32 INFO    opendrift.readers.reader_telemac_selafin: query done
17:23:32 INFO    opendrift.readers.reader_telemac_selafin: extract fibre depths
17:23:33 INFO    opendrift.readers.reader_telemac_selafin: fibre depths extracted
17:23:33 INFO    opendrift.readers.reader_telemac_selafin: extract variables
17:23:49 INFO    opendrift.readers.reader_telemac_selafin: variables sent
17:23:50 INFO    opendrift.models.basemodel: 2019-02-01 09:30:00 - step 58 of 12800 - 34986 active elements (0 deactivated)
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: searching time frame
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: found time frame
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: kde tree query
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: query done
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: extract fibre depths
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: fibre depths extracted
17:23:50 INFO    opendrift.readers.reader_telemac_selafin: extract variables
17:24:06 INFO    opendrift.readers.reader_telemac_selafin: variables sent
17:24:07 INFO    opendrift.models.basemodel: 2019-02-01 09:40:00 - step 59 of 12800 - 34986 active elements (0 deactivated)
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: searching time frame
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: found time frame
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: kde tree query
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: query done
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: extract fibre depths
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: fibre depths extracted
17:24:07 INFO    opendrift.readers.reader_telemac_selafin: extract variables
17:24:22 INFO    opendrift.readers.reader_telemac_selafin: variables sent

So I will be investigating the variable-extraction part. I think the issue is RAM: this step uses 30 GB at each iteration and then releases the memory.

Boorhin commented 3 years ago

I have considerably optimised the reader, and in terms of processing it is now much faster. However, the memory leak is inside the Telemac scripts: I cannot process more than 90,000 particles because the reader takes all my RAM. The ramp in the memory profile below is the function Selafin.get_variables_at() from the Telemac scripts.

[image: memory usage profile showing a ramp during Selafin.get_variables_at()]

Here is the function:

    def get_variables_at(self, frame, vars_indexes):
        """
        Get values for a given time step and a list of variables

        @param frame (int) Time step to extract
        @param vars_indexes (list) List of variable indices

        @return (np.array) array containing the values for each variable
        """
        f = self.file['hook']
        endian = self.file['endian']
        ftype, fsize = self.file['float']
        if fsize == 4:
            z = np.zeros((len(vars_indexes), self.npoin3), dtype=np.float32)
        else:
            z = np.zeros((len(vars_indexes), self.npoin3), dtype=np.float64)
        # if tags has 31 frames, len(tags)=31 from 0 to 30,
        # then frame should be >= 0 and < len(tags)
        if frame < len(self.tags['cores']) and frame >= 0:
            f.seek(self.tags['cores'][frame])
            # skip the time record (4-byte marker, one float, 4-byte marker)
            f.seek(4+fsize+4, 1)
            for ivar in range(self.nvar):
                # skip the leading 4-byte Fortran record marker
                f.seek(4, 1)
                if ivar in vars_indexes:
                    # 'unpack' is struct.unpack; reads npoin3 floats in one go
                    z[vars_indexes.index(ivar)] = \
                              unpack(endian+str(self.npoin3)+ftype,
                                     f.read(fsize*self.npoin3))
                else:
                    # variable not requested: skip its data block
                    f.seek(fsize*self.npoin3, 1)
                # skip the trailing 4-byte Fortran record marker
                f.seek(4, 1)
        return z

I think the file should be memory mapped (it is only 11 GB). It doesn't behave as it should; in particular, I suspect that the conversion to float64 is totally useless. What do you think?

knutfrode commented 3 years ago

I have not looked at this in detail, but yes, float64 seems unnecessary. In fact, OpenDrift uses float32 for most internal variables, which improves performance and memory usage a lot, and also gives smaller output files. Sensitivity tests showed totally negligible differences compared to float64, which would otherwise be the default.

gauteh commented 3 years ago

I don't think it is converting; it depends on the type that is stored in the file, so the size would be different.

gauteh commented 3 years ago

It is difficult to benchmark memory; are you sure it is not just virtual, or unreleased? It doesn't seem like any stray references are created in that function. Maybe it would be useful to do some memory profiling.

Otherwise I agree that memory-mapping the file might be useful, especially since we will often access the same data multiple times. But it won't be transparent since it is passed through unpack.
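
One option using only the standard library is tracemalloc (a sketch; 'slf' and the arguments below are placeholders for whatever is being measured):

    import tracemalloc

    tracemalloc.start()
    # 'slf' would be an instantiated Selafin object
    z = slf.get_variables_at(frame=55, vars_indexes=[0, 1])
    current, peak = tracemalloc.get_traced_memory()
    print(f'current: {current / 1e9:.2f} GB, peak: {peak / 1e9:.2f} GB')
    tracemalloc.stop()

Note that tracemalloc only sees Python-level allocations; an OS-level view such as /usr/bin/time -v reports the peak resident set size instead.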

Boorhin commented 3 years ago

I have double-checked a few things, and now all the arrays come out as float32. I have also gotten rid of unpack in favour of np.frombuffer(), and I have mmapped the file. Still, the memory used is totally unreasonable: seeking data consumes 10 GB per 10,000 particles. I have a hard time understanding what is going on. @gauteh, what do you mean by transparent?
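
For reference, an unpack-free read might look roughly like this (a sketch under the same assumptions as the get_variables_at() excerpt above; the file name, offset and point count are placeholders):

    import mmap
    import numpy as np

    with open('my_run.slf', 'rb') as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    npoin3 = 250000  # number of mesh nodes (placeholder)
    pos = 1024       # byte offset of the variable record (placeholder)
    # '<f4' assumes a little-endian single-precision file; frombuffer creates
    # a read-only view on the mapped pages rather than copying the data
    values = np.frombuffer(buf, dtype='<f4', count=npoin3, offset=pos)

Because the array is just a view, repeated lookups hit the OS page cache instead of re-reading and re-copying the data.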

Boorhin commented 3 years ago

I found the memory leak: it is not in the selafin.py reader, it was inside my own function.

idx_nodes, the nodes where the data were read, was not calculated properly (I don't really know why), and it created a massive memory leak.

I can now read half a million particles in a few seconds. Happy days! I am pushing the new version after a bit of cleaning.

knutfrode commented 3 years ago

Excellent!

gauteh commented 3 years ago

Great!
