FinalsClub / karmaworld

KarmaNotes.org v3.0
GNU Affero General Public License v3.0

Improve GDrive processing time #212

Closed AndrewMagliozzi closed 10 years ago

AndrewMagliozzi commented 10 years ago

Why does it take so long to get HTML back from GoogleDrive?

Look into the frequency we are polling GDrive and if we can do that more quickly. Right now it takes 2-3 minutes and I'd like to get it under 1 minute.

sethwoodworth commented 10 years ago

Google is slow to process documents. But I suspect that polling more frequently (with sane logarithmically increasing wait times) could cut down the average time to process.

btbonval commented 10 years ago

Here's one problem I found.

When uploading a file, if a particular sort of problem occurs, the code waits 30 seconds (and just assumes it's fine after waiting the 30 seconds). https://github.com/FinalsClub/karmaworld/blob/master/karmaworld/apps/notes/gdrive.py#L154-L158

This could be improved with a while loop that starts at 0.5 seconds and doubles the wait time each loop. Sort of like Seth was saying for polling, although I think that is a different polling than this bit of code during upload.

btbonval commented 10 years ago

This bit of code follows the one-time wait of 30 seconds, and if things are still not copacetic, then error. https://github.com/FinalsClub/karmaworld/blob/master/karmaworld/apps/notes/gdrive.py#L198-L201

This should be moved into the upload function inside the aforementioned wait loop. If the wait count gets to something like 32 inside the loop (after having run e.g. 0.5, 1, 2, 4, 8, 16 for a total of 31.5 seconds), then give up and error.
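A minimal sketch of the doubling wait loop described above, not the actual gdrive.py code. The names `wait_with_backoff` and `is_ready` are hypothetical; `is_ready` stands in for whatever check the upload function uses to decide the file is usable. The `sleep` parameter exists only so the loop can be exercised without really waiting.

```python
import time


def wait_with_backoff(is_ready, first_wait=0.5, give_up_at=32.0, sleep=time.sleep):
    """Poll is_ready(), doubling the wait each loop.

    Returns True as soon as is_ready() succeeds. Returns False once the
    next wait would reach give_up_at (with the defaults, after waiting
    0.5 + 1 + 2 + 4 + 8 + 16 = 31.5 seconds in total).
    """
    wait = first_wait
    while wait < give_up_at:
        if is_ready():
            return True
        sleep(wait)
        wait *= 2
    # One last check after the final wait before giving up.
    return is_ready()
```

The caller would raise the existing error when this returns False, so the happy path costs half a second instead of a flat 30 seconds.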

charlesconnell commented 10 years ago

It's interesting that this is slow on the server, because it takes just a few seconds when running on my laptop.

btbonval commented 10 years ago

If you manually upload a doc and then download-as, it's pretty quick. There's little reason to think your system or the server should be any slower. We're not really sure where the slowness is being added. It might be in the server code, or it might be in the celery processes (more of an IT problem), or who knows at this point.

The time.sleep(30) definitely needs work, and I have modified it in the working copy of the GDrive fixup branch: https://github.com/FinalsClub/karmaworld/blob/fix_gdrive_service/karmaworld/apps/notes/gdrive.py#L118-L134

btbonval commented 10 years ago

karmaworld/settings/dev.py doesn't mention AMQP (Advanced Message Queuing Protocol) at all. Does celery use it by default?

karmaworld/settings/prod.py does use AMQP and quite explicitly.

This might be a difference to investigate.

charlesconnell commented 10 years ago

beta is attempting to connect to various hosts at Google using IPv6. For each TCP connection in the authenticate/upload/download process, it tries IPv6 for 30 seconds before falling back to IPv4. Modifying /etc/hosts to direct certain hostnames straight to IPv4 addresses makes the process take just a couple of seconds. I don't know why IPv6 connections are failing. Bryan, I noticed that /etc/dhcp/dhclient.conf is modified from the default configuration. Could that have to do with it?
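One way to confirm this kind of IPv6 stall from the server is to time a TCP connect per address family. This is a rough diagnostic sketch using only the standard `socket` module; the helper names `split_by_family`, `time_connect`, and `report` are hypothetical and not part of karmaworld.

```python
import socket
import time


def split_by_family(addrinfos):
    """Group getaddrinfo() results into (IPv4 addresses, IPv6 addresses)."""
    v4 = [ai[4][0] for ai in addrinfos if ai[0] == socket.AF_INET]
    v6 = [ai[4][0] for ai in addrinfos if ai[0] == socket.AF_INET6]
    return v4, v6


def time_connect(family, address, port=443, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
    except OSError:  # covers timeouts and unreachable networks
        return None
    return time.monotonic() - start


def report(hostname, port=443):
    """Print the connect time for the first IPv6 and IPv4 address of a host."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    v4, v6 = split_by_family(infos)
    for family, addrs in ((socket.AF_INET6, v6), (socket.AF_INET, v4)):
        for addr in addrs[:1]:
            print(family.name, addr, time_connect(family, addr, port))
```

Calling `report()` with one of the Google hostnames the upload code talks to (the exact hostnames are not listed in this thread) should show whether the IPv6 attempt fails or times out while IPv4 connects quickly.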

btbonval commented 10 years ago

I'm not too familiar with IPv6 issues. I'd figure by this point in time there wouldn't be issues, but then again, by this time the whole world was supposed to be on IPv6.

How could you tell dhclient.conf has been modified from default? I haven't looked at that file. Beta is an EC2 instance, so its internal networking stuff does not reflect its external facing internet stuff. We might have to go mangle things in the AWS Console to get IPv6 working.

Related links: http://tech.3scale.net/2012/06/29/enabling-ipv6-on-amazon-ec2/ http://blog.iphoting.com/blog/2012/06/02/ipv6-on-amazon-aws-ec2/ http://binarymentalist.com/post/2984855918/try-ipv6-on-amazon-ec2-using-6to4

btbonval commented 10 years ago

Oh wait. Actually, I think we decided prod was the EC2 instance and beta was the Linode.

It looks like we'll need to check the Linode Manager for the correct IPv6 and make sure the OS is set up to match. https://www.linode.com/IPv6/

AndrewMagliozzi commented 10 years ago

Hey Bryan, did you see Bob's email to you re: prod?

btbonval commented 10 years ago

Yeah. It works now.

I have to dig around that system to see how consistent it is with beta and what sort of workspace changes are in place that diverge from the repo. -Bryan

AndrewMagliozzi commented 10 years ago

This works now. Thanks, @charlesconnell.