Open · deliahu opened this issue 4 years ago
Motivation
- Reduce latency when multiple requests are required
- Stream output from the predictor as it's generated
When will this feature become available?
@da-source we haven't scheduled this one yet; we usually plan about two weeks at a time.
Would it be possible to change your API implementation so that you can make a single HTTP request to the API (or multiple distinct requests if necessary), rather than relying on streaming the results?
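To illustrate what that could look like client-side, here is a minimal sketch of the multiple-requests approach; the endpoint URL and the request/response fields are placeholders, not Cortex's actual API:

```python
# A minimal client-side sketch of the "multiple distinct requests"
# workaround: each call returns a short continuation, and the
# accumulated text is fed back in as the next prompt.
import requests

ENDPOINT = "https://example.com/text-generator"  # placeholder, not a real API

def generate_in_chunks(prompt, n_chunks=5):
    text = prompt
    for _ in range(n_chunks):
        resp = requests.post(ENDPOINT, json={"text": text})
        resp.raise_for_status()
        chunk = resp.json()["generated"]  # hypothetical response field
        yield chunk
        text += chunk  # feed the accumulated text back for the next chunk

# The client sees partial output after every request instead of
# waiting for one long-running call to finish.
for chunk in generate_in_chunks("Once upon a time"):
    print(chunk, end="", flush=True)
```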
I would like to deploy a large fine-tuned GPT-2 model. Since it is so large, it takes a while to get the whole output, and I would like to stream partial outputs instead of waiting for the whole thing. Something like AI Dungeon 2.
I'm using a *compressed* large GPT-2 model: https://bellard.org/nncp/gpt2tc.html
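To make the goal concrete, here is a minimal sketch of the kind of token-by-token streaming I'm after, written against Hugging Face transformers rather than gpt2tc (so the loading calls are illustrative, not my actual setup):

```python
# A minimal sketch of streaming partial GPT-2 output token by token
# (greedy decoding with Hugging Face transformers v4+; illustrative
# only -- not gpt2tc and not Cortex's API).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def stream_tokens(prompt, max_new_tokens=50):
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits  # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        yield tokenizer.decode(next_id[0])  # emit each token as it's chosen

# Each yielded piece could be pushed over a websocket or a chunked
# HTTP response instead of printed.
for piece in stream_tokens("The dungeon door creaks open and"):
    print(piece, end="", flush=True)
```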
Hi! Are there any updates on when this will be coming out?
@mutal we haven't come up with a timeline for it yet. We'll keep this ticket updated as we go along. Is this urgent for you? And to reiterate what @deliahu has mentioned before, we usually plan about two weeks at a time.
I was hoping to implement a project with scalable infrastructure and websockets this month, so it would be nice if you could add this feature as soon as possible.
It would be very helpful for me if this feature became available. When you say two weeks at a time, does that mean you plan to add it the week after next?
@mutal @da-source It appears that you have some urgency with regards to this feature.
Unfortunately, this feature is not a priority for Cortex for the next few weeks.
If I were in your position and wanted to ship something in the next month or so, I would try the workaround suggested here to use Cortex for your project.
Feel free to watch for notifications on this ticket. When the team has decided to prioritize this ticket, it will be moved from the *to prioritize* column to the *current sprint* column. If it remains in the *to prioritize* column, it means that the team has decided that other features are a higher priority than this feature.
The workaround that you have suggested doesn't work for me, because it means restarting the process (something I'm trying to avoid) on each call. In the meantime, I'll try to find a way to create a websocket server on the Cortex instances myself. It shouldn't be too hard. Based on this, I'll have to replace `localhost` with the IP of Cortex's AWS instance. Any ideas on how to get the IP of the instances which Cortex spins up?
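Here is the direction I'm thinking of for looking up the IPs, assuming the cluster's EC2 instances carry an identifying tag (the tag key and value in this sketch are guesses on my part; check the actual tags in the EC2 console first):

```python
# A hedged sketch for listing a Cortex cluster's instance IPs with
# boto3. Cortex clusters run on EKS, so the nodes are likely tagged
# by eksctl -- but the exact tag key/value below are assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # your cluster's region

resp = ec2.describe_instances(
    Filters=[
        # hypothetical tag filter -- replace with your cluster's real tags
        {"Name": "tag:alpha.eksctl.io/cluster-name", "Values": ["cortex"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

# Prefer the public IP when the instance has one, else the private IP.
ips = [
    inst.get("PublicIpAddress") or inst["PrivateIpAddress"]
    for reservation in resp["Reservations"]
    for inst in reservation["Instances"]
]
print(ips)
```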
+1