flagser_memory support - Githubissues

davidpitl commented 3 years ago

Would it be possible to support flagser_memory to be able to use it from Giotto-TDA?

ulupo commented 3 years ago

Hi @davidpitl! This looks like a duplicate of https://github.com/giotto-ai/pyflagser/issues/57, unless I misunderstood. Please let me know if you agree/disagree and feel free to follow/add to the discussion there. I wonder what kind of support you are hoping to see in giotto-tda; at the moment, giotto-tda does not expose data structures for filtered simplicial complexes to the user. (You can reply in the other issue if this one is a duplicate.)

davidpitl commented 3 years ago

I simply mean having the possibility for Giotto to use the Flagser in memory calculation implementation: https://github.com/luetge/flagser/blob/master/docs/documentation_flagser.pdf it talks about a faster calculation using flagser-memory

ulupo commented 3 years ago

Thanks for the clarification, and sorry for the confusion. You are right that we missed/forgot about this feature from flagser. We will look into it! @MonkeyBreaker this might be of interest to you too.

ulupo commented 3 years ago

Actually, I have now looked into the flagser code and it does seem that "in memory" calculation means "keeping the flag complex in memory", so indeed this is another side of the same issue as in #57, though not a strict duplicate as you are only asking to have a performance feature without access ti the data structure necessarily

MonkeyBreaker commented 3 years ago

Hi,

Sorry for only replying only now.

Thank you @davidpitl for suggesting adding this feature. One easy way to add it to pyflagser would be to generate a binary that compile with KEEP_FLAG_COMPLEX_IN_MEMORY definition enable. And from the python, adding a sort of boolean as input parameter to choose to run in memory or "normal".

This doesn't seem in my opinion to do a lot of changes, but I didn't work with the in memory version, some points needs to be verified before enabling this on pyflagser:

Does it output expected barcodes ?
Does it work on all OS (Windows, Mac, Linux) ?
Measure how much faster it is to run the memory version, how much more memory it consumes, etc.

I expect the first point to be a quick verification, but we need to be sure it does work. The second point about if it works on all architectures, this is important for us, because we want that all users can use our package, independent of the OS used.

For the third point, if the two previous are good, I can do it myself, it is just to have a raw estimation of performances.

Unfortunately, at the moment I have little time to work with pyflagser, Issue #57 is still pending. But in my opinion, this issue could be done before #57. Because if I understood correctly, we only want the in memory version for performance, right ?

Best, Julián

davidpitl commented 3 years ago

Does it output expected barcodes ? Looking at the Flagser tests it seems that the output is similar.

Does it work on all OS (Windows, Mac, Linux) ? See https://github.com/luetge/flagser/blob/master/src/flagser.cpp

Measure how much faster it is to run the memory version, how much more memory it consumes, etc. Running test on our server: time ./flagser-memory real 0m0,102s user 0m0,113s sys 0m0,080s time ./flagser real 2m16,709s user 2m42,685s sys 0m1,937s

MonkeyBreaker commented 3 years ago

Indeed, the runtime are an order of magnitude different ! Thank you for the information.

Could you run the same dataset but with a different command (if you're on Linux) in order to get an idea of the memory consumption ? Command: /usr/bin/time -v ... this should give more details on the run and also output the peak memory consumption.

e.g.

Command being timed: "..."
        User time (seconds): 1.16
        System time (seconds): 0.24
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.41
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 611576 <= this value interest me
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 283293
        Voluntary context switches: 4
        Involuntary context switches: 16
        Swaps: 0
        File system inputs: 8216
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

If the output are similar, that's already a good point.

Again, @davidpitl thank you for all the work you can help us :)

Best, Julián

davidpitl commented 3 years ago

The measurements have been made on a server that is currently very busy with other tasks that we cannot stop at this moment. I think you should repeat them. The results are:

_/usr/bin/time -v ./test_flagsermemory Command being timed: "./test_flagser_memory" User time (seconds): 0.12 System time (seconds): 0.07 Percent of CPU this job got: 187% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.10 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 4820 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 896 Voluntary context switches: 593 Involuntary context switches: 1 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0

_/usr/bin/time -v ./testflagser Command being timed: "./test_flagser" User time (seconds): 154.52 System time (seconds): 1.10 Percent of CPU this job got: 122% Elapsed (wall clock) time (h:mm:ss or m:ss): 2:07.48 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 557484 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 394732 Voluntary context switches: 27893 Involuntary context switches: 1751 Swaps: 0 File system inputs: 0 File system outputs: 8 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status:

MonkeyBreaker commented 3 years ago

Thank you for the measurements. I'm just a bit confuse, I would have expect the contrary in terms of memory consumption => the in memory version should consume more memory compared to the "normal" version.

After looking at test_flagser.cpp and test_flagser_memory.cpp, the later does not test all the dataset, that's why the runtime are different.

Let me run both test but on the same datasets. I'll post with updated results.

MonkeyBreaker commented 3 years ago

I made some measurements with the test:

About small and big in the following tables, it's related to the function called inside of test_flagser.cpp and test_flagser_memory.cpp files.

small => run_all(false);
big => run_all(true);

runtime [s]

normal (small)	normal (big)	in memory (small)	in memory (big)
0.05	75	~0.05~	~72~

Memory [GB]

normal (small)	normal (big)	in memory (small)	in memory (big)
0.007	0.57	~0.007~	~0.53~

~The difference in runtime is around 4% faster when in memory is enable. And memory consumption is similar.~

I'll create an issue to discuss this results directly in flagser repository. If the author confirms that bigger speed-ups are achievable with the in memory version. Then we can consider integrating it inside of pyflagser.

@ulupo what do you think about the results measured ?

Best, Julián

davidpitl commented 3 years ago

Seems like a very small size test , doesn't it? Only 0.6 Gb of memory usage.

MonkeyBreaker commented 3 years ago

I think you're right, it's nothing big. I don't have other dataset right now to test, but let me come back when I have one that requires much more memory.

MonkeyBreaker commented 3 years ago

Well, I found out that the test for in memory was not calling the expected data structure, it was using the same one as "normal". Meaning the measure where always using the same data structure.

The results after fixing this:

runtime [s]

normal (small)	normal (big)	in memory (small)	in memory (big)
0.05	75	0.05	65

Memory [GB]

normal (small)	normal (big)	in memory (small)	in memory (big)
0.007	0.57	0.007	1.4

Now we obtain around 13% speed up at run time when using the in memory version, but at the cost of consuming at peak three times more memory.

In my opinion it's interesting to integrate it in pyflagser. The problem is how to do this, because in memory is a compile definition, we should generate a binary for when the flag is enable and disable. And, currently we already do this in order to support coefficients > 2, meaning that we should then support 4 different binaries. I don't think it's the correct approach, let me think a bit on how to do this in a better way.

But, on the meantime, if you needs to test the in memory version with pyflagser, you can build from source.

git clone https://github.com/giotto-ai/pyflagser.git
In the CMakeLists.txt file, change the following lines as follow:
- from to target_compile_definitions(flagser_pybind PRIVATE RETRIEVE_PERSISTENCE=1 KEEP_FLAG_COMPLEX_IN_MEMORY=1)
- and from to target_compile_definitions(flagser_coeff_pybind PRIVATE RETRIEVE_PERSISTENCE=1 USE_COEFFICIENTS=1 KEEP_FLAG_COMPLEX_IN_MEMORY=1)
python -m pip install -e .

Let me know if you encounter any issues.

Best, julián

giotto-ai / pyflagser

flagser_memory support #61

runtime [s]

Memory [GB]

runtime [s]

Memory [GB]