RomainVialard / Google-Plus-Community-Migrator


Use pagination for all endpoints #6

Open brainysmurf opened 5 years ago

brainysmurf commented 5 years ago

Currently pagination is not used for the interaction endpoints (comments, plusoners, and resharers).

Since the documentation indicates that large result sets are paginated, the consequence is that the migrator does not actually download all relevant comments, plusoners, and resharers.

brainysmurf commented 5 years ago

I have some code that does use pagination, and on the upside it also processes the interaction endpoints concurrently (calling Plus.People.listByActivity twice, once for plusoners and once for resharers), which I believe is much more efficient. The downside is that it depends on a bunch of libraries I've written. I'm timing it to determine how much more efficient it is, and also trying to include counts of comments, plusoners, and resharers.
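
Something along these lines, as a simplified sketch (the helper name and the maxResults value here are illustrative, not my actual library code):

```javascript
// Sketch: page through Plus.People.listByActivity until nextPageToken runs out.
// `collection` is either 'plusoners' or 'resharers'.
function listAllByActivity(activityId, collection) {
  var people = [];
  var pageToken;
  do {
    var response = Plus.People.listByActivity(activityId, collection, {
      maxResults: 100, // the documented maximum for this endpoint
      pageToken: pageToken
    });
    if (response.items) people = people.concat(response.items);
    pageToken = response.nextPageToken;
  } while (pageToken);
  return people;
}
```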

RomainVialard commented 5 years ago

Indeed updating the current code to get all comments, plusoners and resharers would be better.

As for the count of comments, plusoners and resharers, it is included in each post's data (see https://developers.google.com/+/web/api/rest/latest/activities#resource-representations): object.replies.totalItems, object.plusoners.totalItems, object.resharers.totalItems
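
For example (a sketch assuming the Plus advanced service; postId is a placeholder):

```javascript
// The counts come for free with each activity resource.
var activity = Plus.Activities.get(postId); // postId is hypothetical
var nbComments  = activity.object.replies.totalItems;
var nbPlusoners = activity.object.plusoners.totalItems;
var nbResharers = activity.object.resharers.totalItems;
```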

Also, not directly related, but the G+ API might stop working correctly as soon as January 28, 2019, so it's better to retrieve the content before this date :) https://developers.google.com/+/api-shutdown

brainysmurf commented 5 years ago

So far, so good. It turns out that concurrently processing the comments, plusoners, and resharers (with pagination enabled, to get all of them) is much more efficient, since for some posts a single one of those calls takes up to 2.5 seconds to complete. We can also safely increase the execution-time budget to 4.5 minutes.
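
The time budget is just a guard in the batch loop, roughly like this (a sketch; both helpers are placeholders):

```javascript
// Stop well before the Apps Script 6-minute hard limit on executions.
var MAX_RUNTIME_MS = 4.5 * 60 * 1000;
var start = Date.now();
while (hasMoreWork() && Date.now() - start < MAX_RUNTIME_MS) {
  processNextBatch(); // hasMoreWork() and processNextBatch() are hypothetical
}
```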

I'll add a pull request after this test run.

RomainVialard commented 5 years ago

Nice! Note that in the meantime I also updated the Apps Script code to make fewer calls to Firebase (it was quickly eating the UrlFetch quota of some users). Now there is only one call to Firebase per post (using updateData()): https://github.com/RomainVialard/Google-Plus-Community-Migrator/blob/3b7f0452c2f592fba9cb63605942e1bc8f17878a/Apps%20Script/Code.js#L108
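
The pattern is to build one multi-path update per post and send it with a single updateData() call (a sketch using the FirebaseApp library; the paths and variables here are illustrative, not the exact Code.js):

```javascript
// One UrlFetch call per post: batch the post and its interactions
// into a single multi-path update.
var base = FirebaseApp.getDatabaseByUrl(firebaseUrl, firebaseSecret); // illustrative credentials
var update = {};
update['posts/' + post.id]     = post;      // paths are assumptions about the layout
update['comments/' + post.id]  = comments;
update['plusoners/' + post.id] = plusoners;
update['resharers/' + post.id] = resharers;
base.updateData('/', update);
```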

brainysmurf commented 5 years ago

Since concurrent processing also uses UrlFetchApp.fetchAll, I was hitting that wall as well; in fact the test run failed because of it. I have incorporated that improvement and am re-running it to see if it can complete without hitting the daily limit.

RomainVialard commented 5 years ago

The UI is now ready to display all comments: https://github.com/RomainVialard/Google-Plus-Community-Migrator/commit/deedcb49342f3b959b0695ff2267a96aa1a36894

Here's a live example: if you click on the comment preview of a post, it will display all the comments linked to the post (currently the maximum of 10 that we are storing in Firebase): https://apps-script-community-archive.firebaseapp.com/

brainysmurf commented 5 years ago

Using pagination for all endpoints is also in the concurrent pull request. Is there something in the UI to work on?

RomainVialard commented 5 years ago

Yes, many things, though they aren't listed as feature requests among the issues. If you want to start with something simple: the number of reshares linked to a post isn't currently displayed (you could reuse the same code used to display the number of +1s). The number of comments is also not displayed (in G+, the text is "SHOW ALL XX COMMENTS").
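
Since the counts are already stored with each post, displaying them is something like this (a sketch; the element IDs and exact markup are assumptions about the current UI code):

```javascript
// Reuse the +1 counter pattern for reshares and comments.
function displayCounts(post) {
  document.getElementById('reshares-' + post.id).textContent = // IDs are hypothetical
      post.object.resharers.totalItems;
  document.getElementById('comments-' + post.id).textContent =
      'SHOW ALL ' + post.object.replies.totalItems + ' COMMENTS';
}
```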

Also: posts with text and images are correctly displayed, but if a post contains a link to an external resource (an article on the web, a link to documentation, a YouTube video, a Drive document, ...), not all of its information is displayed.

And: if you are motivated, the interesting thing would be to transform these exports into real, live communities. Once a user is authenticated, they should be able, for example, to +1 an existing post. We would then record the +1 in the Firebase database (and we should update the Firebase rules to make sure an authenticated user can only +1 a post in their own name).
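
The client-side write could look like this (a sketch with the Firebase web SDK of that era; the path layout is an assumption), paired with a database rule that only lets a user write under their own uid:

```javascript
// Record the +1 under the authenticated user's uid, so a rule such as
// ".write": "auth != null && auth.uid === $uid" can protect it.
var uid = firebase.auth().currentUser.uid;
firebase.database()
    .ref('plusoners/' + postId + '/' + uid) // path layout is hypothetical
    .set(true);
```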

brainysmurf commented 5 years ago

I dreamed up a different way to download content more efficiently without additional setup, but is the UI the higher priority?

RomainVialard commented 5 years ago

No, exporting everything should be the priority. The API will soon die, and after that it won't be possible to export the data, whereas the UI can be updated at any time, even after the shutdown of the API.

It would also be great to find ways to help people get the latest version of everything without having to do much work themselves (e.g. pushing updates to GitHub would trigger an update of all the Firebase web apps, ...).

brainysmurf commented 5 years ago

Okay, I'm working on it. The idea is to collect IDs and then use fetchAll in batches. No child scripts needed, and no additional setup either.

RomainVialard commented 5 years ago

I've pushed an update that lets users +1 posts. This means it's not just an archive anymore but a live copy (adding or removing a +1 will update both the UI and the data in the Firebase Realtime Database). https://github.com/RomainVialard/Google-Plus-Community-Migrator/commit/5a0ddde77abf24170b4033b79b8c0bb2755ba233

brainysmurf commented 5 years ago

Nice. I’m traveling but have some time now to work on it.

brainysmurf commented 5 years ago

Made good progress; will update soon (traveling).

brainysmurf commented 5 years ago

Okay, I have now gotten it working so that all endpoints use pagination. The method gets 100 activities, derives all of the requests needed to fetch the comments, plusoners, and resharers of those 100 activities, fetches them as a batch using UrlFetchApp.fetchAll, and updates the database as a batch as well. The amazing thing is how much faster this is: nearly 3000 posts with accompanying content in just one four-minute execution! (In the next execution, though, I hit the daily quota limit.)
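
In outline, the batching step looks like this (a simplified sketch of the approach, not the exact pull-request code; the URLs follow the public plus/v1 REST endpoints and auth is reduced to a headers parameter):

```javascript
// For a page of activities, build one request per interaction endpoint,
// then fire them all in a single UrlFetchApp.fetchAll call.
function fetchInteractions(activities, headers) {
  var base = 'https://www.googleapis.com/plus/v1/activities/';
  var requests = [];
  activities.forEach(function (activity) {
    ['comments', 'people/plusoners', 'people/resharers'].forEach(function (path) {
      requests.push({
        url: base + activity.id + '/' + path + '?maxResults=100',
        headers: headers, // e.g. an OAuth Authorization header (an assumption)
        muteHttpExceptions: true
      });
    });
  });
  return UrlFetchApp.fetchAll(requests).map(function (response) {
    return JSON.parse(response.getContentText());
  });
}
```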

RomainVialard commented 5 years ago

Nice!