Statistics

Sandr0x00 commented 7 years ago

Gather statistics about the stability of our algorithms in order to find bottlenecks in the whole pipeline.

[x] How many requests were processed
[x] How many errors occurred
[x] Which errors occurred (@ansjin, please commit the error outputs I saw in the version you last used)
[x] How much time everything took
[ ] Average time for each algorithm

additional:

[ ] improve design
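A minimal sketch of how the checklist items could be computed, assuming each processed request is logged as a record holding the algorithm name, its duration and an optional error code. `RequestRecord` and `summarize` are hypothetical names, not part of the project's code, and the records in the usage lines are made up:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    """Hypothetical record of one processed request."""
    algorithm: str        # e.g. "OpenIE" or "DateEventExtraction"
    duration_s: float     # time this single request took
    error: Optional[str]  # e.g. "ECONNREFUSED", or None on success


def summarize(records, total_elapsed_s):
    """Aggregate request count, errors, error types, total and average time."""
    durations = defaultdict(list)
    for r in records:
        durations[r.algorithm].append(r.duration_s)
    errors = [r.error for r in records if r.error is not None]
    return {
        "requests": len(records),
        "errors": len(errors),
        "error_types": Counter(errors),
        # Wall-clock time is passed in separately: requests run in parallel,
        # so summing per-request durations would overstate the total.
        "total_time_s": total_elapsed_s,
        # Average time per algorithm (includes failed requests).
        "avg_time_s": {alg: sum(d) / len(d) for alg, d in durations.items()},
    }


records = [
    RequestRecord("OpenIE", 30.0, None),
    RequestRecord("OpenIE", 50.0, "ECONNREFUSED"),
    RequestRecord("DateEventExtraction", 0.5, None),
]
print(summarize(records, total_elapsed_s=80.0)["avg_time_s"])
# {'OpenIE': 40.0, 'DateEventExtraction': 0.5}
```

Whether failed requests should count toward the average is a judgment call; excluding them would only need a filter on `r.error` in the loop.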

ansjin commented 7 years ago

Building on top of what @Sandr00 did.

Relationship statistics:

Currently running here, just for testing: http://104.198.227.113/

The first two IPs are the relationship algorithms (each scaled 2 times); the last one is the DateEvent extraction algorithm (also scaled 2 times).

The scaling is currently low. The more we scale, the more requests can be served in parallel!

It is running on 765 wiki pages scraped by team 2 (https://github.com/MusicConnectionMachine/UnstructuredData/issues/65#issuecomment-289858901).

The resulting data can also be checked at 35.187.17.177, using the default Postgres username and database.

A snapshot after some elapsed time:

[Screenshot: status page]

kordianbruck commented 7 years ago

@ansjin This looks good-ish? What is the status column? Is it the number of finished requests out of a total of 1707 after 1022 seconds?

simonzachau commented 7 years ago

@kordianbruck Yes, the status shows how many are done. The total counts all requests (including the remaining ones). The "design" was just to get started; it's not very intuitive yet.

kordianbruck commented 7 years ago

Great. Breaking it down, those take roughly 30 seconds to process one request. That's a lot.

simonzachau commented 7 years ago

@kordianbruck It's running on Google's free tier, so it's not that powerful. If we give it 100 machines instead of 2, we'll get it done faster.

simonzachau commented 7 years ago

One more thing: the size of a request has to be taken into account before drawing conclusions about the speed. As far as I know, one request is currently about half a website.

ansjin commented 7 years ago

@kordianbruck Actually, the timer doesn't stop when one of the algorithms has completed all its requests; it keeps running until all requests are finished. The last one is DateEventExtraction, which is very fast compared to the other NLP algorithms: it processes that many requests in under 200 seconds without much scaling.

And yes, the other relationship algorithms take around 30 seconds to 1 minute to process a request, but if we had many machines running, those requests could be processed in parallel. We could also use the Kubernetes feature that allows creating multiple pods on the same machine (each acting like a separate machine), which lets us fully utilize the compute power of a VM. For now, this is just a test to see how our complete application works and how much we will have to scale up once we run on Azure.
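A rough back-of-the-envelope sketch of the scaling argument, under idealized assumptions: every replica (machine or pod) handles one request at a time, every request takes the same time, and there is no network overhead. As noted above, real requests vary in size, so this is only an order-of-magnitude check. The 1707 total and ~30 s per request are the figures mentioned in this thread, used purely as illustrative inputs; `estimate_wall_time_s` is a hypothetical helper:

```python
import math


def estimate_wall_time_s(num_requests, seconds_per_request, replicas):
    """Idealized wall-clock time when `replicas` workers each process one
    request at a time and every request takes `seconds_per_request`."""
    rounds = math.ceil(num_requests / replicas)  # batches of parallel requests
    return rounds * seconds_per_request


for replicas in (2, 100):
    minutes = estimate_wall_time_s(1707, 30, replicas) / 60
    print(f"{replicas:>3} replicas: ~{minutes:.0f} min")
# prints "  2 replicas: ~427 min" and "100 replicas: ~9 min"
```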

ansjin commented 7 years ago

Status:

[Screenshot: status page]

Inferences

OpenIE, the second algorithm, was running on a VM in a different zone from the other algorithms. My account showed a few seconds of downtime for some machines in that zone. I think that is why we got so many ECONNREFUSED errors for that algorithm.

For the later deployment, we should use VMs from different zones in our cluster, so that if one zone has an issue, the service is still available from VMs in another zone.

Date Event Extraction completes too fast!
The errors are either socket hang-ups or ECONNREFUSED, which I think will go away once we have better VMs and more compute power.

kordianbruck commented 7 years ago

Thanks for the updates / explanation!

What does "completes too fast" mean? Is it not working? Why is too fast a problem?

A 1-2% error rate is fine. Anything above that, we should investigate.
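A small sketch of the proposed threshold, assuming per-algorithm counts of total and failed requests are available. The 2% cutoff comes from the comment above; `algorithms_to_investigate` and the counts in the example are hypothetical, not measured results:

```python
def algorithms_to_investigate(counts, threshold=0.02):
    """Return {algorithm: error_rate} for every algorithm above `threshold`.

    `counts` maps algorithm name -> (total_requests, failed_requests).
    """
    flagged = {}
    for algorithm, (total, failed) in counts.items():
        if total == 0:
            continue  # nothing processed yet, so there is no rate to judge
        rate = failed / total
        if rate > threshold:
            flagged[algorithm] = rate
    return flagged


# Made-up counts for illustration only:
example = {"algorithm_a": (1000, 40), "algorithm_b": (1000, 5)}
print(algorithms_to_investigate(example))  # {'algorithm_a': 0.04}
```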

simonzachau commented 7 years ago

@kordianbruck About the investigation:

In the case of the date event extraction, we once had a lot of errors and just tweaked some parameters on the Google side (number of pods per machine) and on our side (number of parallel requests). On the one hand, since these were the same errors, we hope to reduce them to virtually zero for the relationship algorithms with the same approach. On the other hand, there are differences. Compared to the date event extraction, the relationship algorithms

require a lot more processing power -> solved by scaling up and out
vary in the processing power they need: "difficult" input takes more time -> I think @MusicConnectionMachine/group-2 might have improved their output since the wiki pages we are using were generated; regardless, we are definitely optimising the requests

ansjin commented 7 years ago

@kordianbruck Completing so fast is a good thing for us. It simply doesn't need as much processing as the other algorithms, which is why it finishes quickly.

@simonzachau Those errors were mostly socket timeouts. They occurred because we were sending 5 parallel requests at a time while only 2 machines in the back end were serving them, so 1 or 2 of those 5 requests timed out and our error count went up.
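A minimal sketch of limiting the number of parallel requests (the parameter mentioned above) so it matches the number of back-end replicas, using `asyncio.Semaphore`. `call_algorithm` is a hypothetical stand-in for the real HTTP call and only simulates work with `asyncio.sleep`; the actual client in the project may differ:

```python
import asyncio


async def call_algorithm(chunk):
    """Hypothetical stand-in for the HTTP request to an algorithm service."""
    await asyncio.sleep(0.01)  # simulate processing time
    return f"processed {chunk}"


async def process_all(chunks, max_parallel):
    """Send all requests, but never more than `max_parallel` at once."""
    semaphore = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0  # highest number of simultaneous requests observed

    async def worker(chunk):
        nonlocal in_flight, peak
        async with semaphore:  # waits while max_parallel requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_algorithm(chunk)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(c) for c in chunks))
    return results, peak


# 5 requests, but at most 2 in flight (one per back-end machine):
results, peak = asyncio.run(process_all(range(5), max_parallel=2))
print(len(results), peak)  # 5 2
```

The remaining requests queue on the client side instead of waiting on an overloaded back end, which is where the timeouts described above came from.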

MusicConnectionMachine / Relationships

Statistics #59