MusicConnectionMachine / RelationshipsG4

In this repository we will try to build and determine relationships between composers
GNU Affero General Public License v3.0

Scalability #34

Closed simonzachau closed 7 years ago

simonzachau commented 7 years ago

Currently we don't know how our implementation performs in a larger environment. Therefore, we want to test it on Microsoft Azure / Docker.

This involves the following steps:

nbasargin commented 7 years ago

@simonzachau the file that you received is already in the .wet format. I just compressed it into a .zip file to reduce size for the upload. Just unpack it ;) The files on the cloud storage will probably be already unpacked but we should discuss it.

simonzachau commented 7 years ago

@nyxathid thanks, I've only read about WARC in your comment. But the unzipped file is actually a nice and ready-to-use .warc.wet :)

simonzachau commented 7 years ago

Our Language Processing is set up on http://relationshipsg4nlp.azurewebsites.net. Next we are going to deploy our own image so that we can raise the coreNLP memory cap from the 4 GB set in the currently used Dockerfile to 7 GB. This lets us scale up in addition to scaling out.
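
A minimal sketch of what raising the memory cap amounts to, assuming the image starts the stock CoreNLP server through `java` with a `-mx` heap flag (the usual way to launch it). The helper name `corenlp_server_command` and the port and timeout defaults are illustrative assumptions, not taken from the Dockerfile used here.

```python
import shlex
from typing import List


def corenlp_server_command(heap_gb: int, port: int = 9000,
                           timeout_ms: int = 15000) -> List[str]:
    """Build a java command line for a CoreNLP server with a given heap cap.

    Hypothetical helper: the real Dockerfile may start the server differently.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return [
        "java",
        f"-mx{heap_gb}g",  # JVM heap cap, e.g. -mx7g instead of -mx4g
        "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


if __name__ == "__main__":
    # Print the command that an image entry point could run.
    print(shlex.join(corenlp_server_command(7)))
```

The heap size is a parameter, so the same image definition could serve both the 4 GB and the 7 GB variant.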

sacdallago commented 7 years ago

@simonzachau I'm missing why such an open service is needed? Does it expose an API to perform the language processing, is that why you need it deployed this way? In that case: can't you use virtual networks inside Azure and avoid public-ip-ing the whole thing?

simonzachau commented 7 years ago

@sacdallago This was just our first successful try of getting anything on Azure to work. If you think a virtual network is a good solution, we are going to look into it!

Sandr0x00 commented 7 years ago

@sacdallago (or maybe @kordianbruck) We are trying to put our WebApp into a VN, but we don't have any option to integrate a WebApp into a VN. Do we have to create a Virtual Machine instead (which doesn't give us the option to use Docker containers, in which our NLP runs)? Maybe you can help us here?

Could be interesting for @MusicConnectionMachine/group-3 as well.

kordianbruck commented 7 years ago

Silly thing, but did you check the Docs? :grinning:

simonzachau commented 7 years ago

@kordianbruck we tried this with several installs (e.g. the app service corenlpSandro2) today. The problem is that, contrary to the docs, the routing tab is not showing up, and consequently the networking sub tab isn't showing up either, even though the installs are scaled as a standard plan.

kordianbruck commented 7 years ago

Even this website says there should be a Networking tab. @sacdallago ideas? can we contact that guy from azure? or directly as support?

sacdallago commented 7 years ago

@simonzachau @Sandr00 ( @kordianbruck ) I guess the best solution is to just open a ticket with support, they are usually really quick in answering! The guy that presented to us from MS might not have the answers to the problems you need.

simonzachau commented 7 years ago

I opened a ticket on Azure. They claim to respond within 4 hours.

simonzachau commented 7 years ago

@sacdallago and @kordianbruck I've forwarded the reply from the support to you. I am going to try to set up a dynamic routing gateway and keep in touch with the support.

sacdallago commented 7 years ago

Thanks @simonzachau . When you succeed, please also let us know! I might invite you for a 5 min chat where you tell me how you did it 💃

simonzachau commented 7 years ago

@sacdallago yes! I am going to work on it in the afternoon today!

simonzachau commented 7 years ago

status update:

After a dozen emails with the support and many tries to make the networking tab appear, it still doesn't want to show up. The basic idea of the support was that we have to "[...] create a dummy root cert [...] – then upload – and that should unblock the App Service VNET integration". So I managed to do this (within a virtual network gateway), but that didn't help anything.

Furthermore, the support asked whether a point-to-site virtual network integration ("This is how it works in Azure.") is what we need. In contrast to this configuration, which is based on unidirectional communication, an ASE (app service environment) could offer bidirectional communication. I don't know how to proceed now (advantages / disadvantages).

kordianbruck commented 7 years ago

@simonzachau they really don't seem too helpful in this matter - what a bummer. I mean, in the very extreme case, we can always just run it in a VM if all else fails. I'd rather not do that.

Did you try an ASE?

simonzachau commented 7 years ago

@kordianbruck I've read about ASE but don't know if we need it. We could play with it and try what works (takes some hours to deploy just that component) but I'm still not an expert at setting up Azure infrastructures

vviro commented 7 years ago

Guys, why not use the simplest tools (and avoid most of that vendor lock-in as a bonus) and just spin up the VMs using a simple shell script (see https://docs.microsoft.com/en-us/azure/virtual-machines/linux/classic/createportal) and run the worker processes in these VMs? The UnstructuredData group has probably already set up everything you need for this, ask e.g. @nyxathid. You don't need to reinvent this wheel if they've already set this up.

simonzachau commented 7 years ago

@vviro thanks for looking into this matter! As far as I can see, group2 has also set up a virtual network. However, I don't see their virtual machine linking to any Docker container, which ours does. Furthermore, the cloud service has the same "problem" of having a public IP (but that's better than the actual VMs being public, I suppose?), which was the starting point for changing our setup further up in this thread.

@nyxathid maybe you can introduce us to your setup and share your thoughts about integrating our logic into it?

sacdallago commented 7 years ago

@simonzachau thank you so much for trying so hard with support. This is a very big limitation indeed, and tomorrow, by chance, I'm going to meet some higher-ups at MS, to whom I will explain the problem.

I understand that the problem is anyway only related to shared networks between VM and App Services, right?

I think that we can settle for a DIY compromise, as @kordianbruck and @vviro have suggested. Unlike their suggestions, though, what I would go with is the following.

Our original idea:

Shared VNET
|
|- [VM] with postgres (inet: 1.2.3.4)
|
|- [VM(s)] to run scripts, populate Postgres DB via psql connection string to internal net (psql://1.2.3.4)
|
|- (APP SERVICE) API (Docker, connects to postgres VM and exports GET requests)--> Exposes to xyz.com

Independent
- (APP SERVICE) Frontend (uses entry point address of API (xyz.com))

What we can do now without changing anything in the workflow (only the deployment will be different):

Shared VNET
|
|- [VM] with postgres (inet: 1.2.3.4)
|
|- [VM(s)] to run scripts, populate Postgres DB via psql connection string to internal net (psql://1.2.3.4)
|
|- [VM] API (Docker + CRON, connects to postgres VM and exports GET requests)--> Exposes to xyz.com

Independent
- (APP SERVICE) Frontend (uses entry point address of API (xyz.com))

simonzachau commented 7 years ago

@sacdallago At least for our part we don't have "shared networks between VM and App Services" yet; that would only have been the case at a later point (when integrating with group 2). Our problem was that the "networking" menu is not showing up, contrary to all tutorials on the web.

Your original idea and your idea for the new setup differ regarding the API. We're experimenting with the relationships repository and how it connects to its child services (the different NLP algorithms), which should not be publicly accessible.

sacdallago commented 7 years ago

@simonzachau yes, sorry. I went about the wrong issue here!

Anyway the solution might as well be the same: one dedicated VM for each tool in a resource group called something like "NLPtoolsXYZ", VNET between a central VM where the scripts will be running and the VMs where the NLP tools are running and we boot up the VMs with the tools only when needed?

These tools anyway always expose an API, correct? And you programmatically (RESTfully) access the resource?

We anyway have to specify via CLI where the NLP tool is, best would be to pass something like IP and entry point, as I assume that's how this works? Then -l http://1.2.3.4/apiToTool.
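
A rough sketch of how such a `-l` option could be read and validated, assuming the scripts are started from a Python command line. Only the `-l` flag and the example URL come from the comment above; the long option name `--location` and the function `parse_tool_location` are made up for illustration.

```python
import argparse
from urllib.parse import urlparse


def parse_tool_location(argv):
    """Parse '-l <url>' and return (host, port, path) of the NLP tool."""
    parser = argparse.ArgumentParser(
        description="Run a script against a remote NLP tool")
    parser.add_argument(
        "-l", "--location", required=True,
        help="entry point of the NLP tool, e.g. http://1.2.3.4/apiToTool")
    args = parser.parse_args(argv)

    url = urlparse(args.location)
    if url.scheme not in ("http", "https") or not url.hostname:
        # parser.error prints a usage message and exits with status 2.
        parser.error("location must be an http(s) URL with a host")

    # Fall back to the scheme's default port when none is given.
    default_port = 443 if url.scheme == "https" else 80
    return url.hostname, url.port or default_port, url.path or "/"


if __name__ == "__main__":
    host, port, path = parse_tool_location(None)  # None: read sys.argv
    print(f"NLP tool at {host}:{port}{path}")
```

Passing the location in this way keeps the scripts independent of where the tool VMs are booted, since only the address changes.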

sacdallago commented 7 years ago

P.S.: @simonzachau could the problem be bound to the app services running on linux machines btw? Cause that was my first thought why it wouldn't work when I first had the problem about 2 weeks ago

simonzachau commented 7 years ago

@sacdallago yes! Until now we went with an App Service rather than a VM because it worked for us: it has Docker support, and scaling up and out is already implemented. I imagine the menu doesn't appear because of the nature of this specific kind of app service.

simonzachau commented 7 years ago

To confirm, we just received an email from another support team on Azure. They said that "some features, such as virtual network integration, Azure Active Directory/third-party authentication, or Kudu site extensions, are not complete [...] in web app on Linux. [...] If you would like to use Azure web apps (hosted on windows) with a VNET then you can deploy web apps inside a VNET using ASE (App service environment) which is a premium service offering."

sacdallago commented 7 years ago

As I expected :D

vviro commented 7 years ago

If I remember correctly, the original plan for the continued operation of the system is for it to be hosted outside Azure (since we don't have a long-term support commitment from them). This implies that the frontend and backend should be operational outside of Azure. Docker helps with that. Kubernetes would be awesome, but maybe there is no time for that. Is the current plan compatible with this?

simonzachau commented 7 years ago

@vviro Yes, we use dockers all the way ;-)

vviro commented 7 years ago

@simonzachau Oh yes, I was referring only to the Azure part (ASE, VNET etc.)

simonzachau commented 7 years ago

@vviro regarding Azure we will follow @MusicConnectionMachine/group-2 or use ASE.

vviro commented 7 years ago

@sacdallago @simonzachau so will it be possible to run the backend outside of Azure? I don't mean the bulk data processing of the CommonCrawl, only the database deployment.

simonzachau commented 7 years ago

@vviro In my view our code is and is going to stay pretty independent from Azure.

sacdallago commented 7 years ago

@vviro well, the backend is database + API, all you need is a postgresdb somewhere and a machine capable of running a docker container that will also be able to connect to the machine where postgres is deployed.

The scripts to populate the db can again be spawned anywhere, as long as the machine has access to the machine with the db.

The frontend will only need to know where the API is to communicate with it.

So: yes. The whole idea is that all this is modular and runnable anywhere
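
To illustrate the "runnable anywhere" point, here is a small sketch in which the API container or a population script could pick up the database location from its environment rather than from anything Azure-specific. It assumes the standard libpq variable names (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`); the function name `postgres_dsn` and the default values are hypothetical and not taken from the project code.

```python
import os
from urllib.parse import quote


def postgres_dsn(env=None):
    """Build a postgresql:// connection URI from environment-style settings."""
    env = os.environ if env is None else env
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    user = env.get("PGUSER", "postgres")
    password = env.get("PGPASSWORD", "")
    database = env.get("PGDATABASE", "postgres")

    # Percent-encode credentials so characters like '@' or ':' stay valid.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{database}"


if __name__ == "__main__":
    # The same code works on a VM, in a container, or on a laptop.
    print(postgres_dsn())
```

Moving the backend to another host would then only mean changing these variables, for example through `docker run -e PGHOST=...`.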

simonzachau commented 7 years ago

Since we've merged groups, I'm closing this issue now. Follow-up here