omenking opened 4 years ago
SELECT * FROM tweets FILTER last week
Then sum per username, order by the sum of tweets -> weight this against the GitHub commits and arrive at a final leaderboard order. This is a very expensive, computationally heavy query and doesn't necessarily offer any benefit over updating it every X hours. Need to commit all this for now. Will add further info in a bit.
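A minimal Python sketch of that query, assuming tweets and commits are already available as records with a `username` and a timezone-aware `created_at` timestamp. The record shape and the `TWEET_WEIGHT`/`COMMIT_WEIGHT` values are hypothetical placeholders, since the weighting hasn't been decided. The idea is that a scheduled job runs this every X hours and stores the result instead of computing it per request.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical weights: how much one tweet and one commit count towards the final order
TWEET_WEIGHT = 1
COMMIT_WEIGHT = 2


def weekly_counts(events, now):
    """Count events per username over the last 7 days (the 'FILTER last week' step)."""
    cutoff = now - timedelta(days=7)
    return Counter(e["username"] for e in events if e["created_at"] >= cutoff)


def leaderboard(tweets, commits, now=None):
    """Weight weekly tweet and commit counts; return (username, score) pairs, best first."""
    now = now or datetime.now(timezone.utc)
    tweet_counts = weekly_counts(tweets, now)
    commit_counts = weekly_counts(commits, now)
    # Counter returns 0 for missing users, so users active on only one platform still score
    scores = {
        user: TWEET_WEIGHT * tweet_counts[user] + COMMIT_WEIGHT * commit_counts[user]
        for user in tweet_counts.keys() | commit_counts.keys()
    }
    # Highest score first; ties broken alphabetically so the order is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```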
Okay, I've been struggling so much to understand Amplify that I propose we go Serverless instead. To give a little context, I've put a lot of work into learning Amplify. I can totally see it being a great tool if you know its small (and very unintuitive) quirks, but since nobody on our team is at that point (and I honestly don't even want to be), it would add a lot of time and unnecessary headaches, IMO. Most of us, however, are pretty familiar with Serverless, and since this is a Gatsby project, it fits perfectly as well.
Architecture idea:
That's a bit of a braindump for now; I'm going to start working on the Serverless app. In the meantime, I'd suggest checking out the following video: OAuth and OIDC in plain English. It will take you from "what is happening?" to OAuth pro in an hour.
I didn't mention any of the frontend requirements for this solution, but I could look into those as well. Technically there are only a few: the input form (for now), the leaderboard window, and the OAuth authentication (which that video clears up).
Let me go through all of these in order.
Frontend: Gatsby static site hosted in S3 + CloudFront. Pretty straightforward.
Leaderboard updates are easy to do since Gatsby splits the website into separate "containers" (components). The leaderboard component can be updated by a very simple Lambda function that pulls the data from the database and regenerates that file statically, or the component can simply fetch the finished numbers from DynamoDB through Lambda and an API Gateway endpoint. It would be REST, yes, but it works and it's fast to build [for now]. Another Lambda can go through the database periodically to create up-to-date, ready-to-go leaderboard stats. Leaderboards are a good fit for services like Redis, but a table that is updated on a schedule would work as well. For the frontend, we can just call an API that returns the most up-to-date leaderboard.
We have easy and full control over the DynamoDB database. (After a week of messing around, I honestly still have no idea how to create a DynamoDB database with a proper setup and "connectors" in Amplify.) For data transforms/updates inside the database, we can use scheduled Lambda functions as mentioned above. See point above.
We don't need Cognito. All user data can be stored perfectly well in DynamoDB, and authentication will happen through GitHub OAuth anyway. In terms of fast turnaround, Cognito is still a feasible option for the MVP. We can figure out OAuth at a later stage, but to get the product out the door we can go with an email/password combination.
Input form for manual entries: this is a very simple API Gateway/Lambda integration. We can go with this setup in the beginning and move to GraphQL later if necessary (I don't know how GraphQL works yet either, but I do know REST). I will likely use React-Forms to handle the frontend, and we can just have a POST endpoint that triggers a Lambda to put the details where they need to be.
The OAuth login workflow with Serverless has been done many times. There's this post, for example (a bit off), which was the very first search result; there's information out there, unlike with Amplify, where we're still waiting on that PR. See points above.
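A minimal sketch of that POST Lambda, assuming the form sends JSON through API Gateway's Lambda proxy integration. The fields `username`, `day` and `description`, the 1 to 100 range for `day`, and the `save_entry`/`SAVED_ENTRIES` stand-in for a DynamoDB `put_item` call are all hypothetical, since the form hasn't been designed yet.

```python
import json
from datetime import datetime, timezone

# Hypothetical form fields; the real form hasn't been designed yet
REQUIRED_FIELDS = ("username", "day", "description")

# Hypothetical stand-in for the DynamoDB table (a real handler would call put_item)
SAVED_ENTRIES = []


def save_entry(item):
    SAVED_ENTRIES.append(item)


def _response(status, body):
    """Build an API Gateway Lambda proxy integration response."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def post_entry_handler(event, context):
    """POST /entries: validate a manual entry from the input form and store it."""
    try:
        data = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "Body must be valid JSON"})
    if not isinstance(data, dict):
        return _response(400, {"error": "Body must be a JSON object"})

    missing = [field for field in REQUIRED_FIELDS if not data.get(field)]
    if missing:
        return _response(400, {"error": "Missing fields", "fields": missing})

    try:
        day = int(data["day"])
    except (TypeError, ValueError):
        return _response(400, {"error": "day must be a number"})
    if not 1 <= day <= 100:
        return _response(400, {"error": "day must be between 1 and 100"})

    item = {
        "username": str(data["username"]),
        "day": day,
        "description": str(data["description"]),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    save_entry(item)
    return _response(201, item)
```

Validating on the Lambda side matters even if React-Forms validates in the browser, because the endpoint is publicly callable.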
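For reference, a sketch of the three steps of GitHub's OAuth web flow in Python, assuming a standard GitHub OAuth App. The client ID, secret, redirect URI and the `read:user` scope choice are placeholders; the function names are hypothetical. It only builds the requests (standard library `urllib`) and does not send them.

```python
import secrets
from urllib.parse import urlencode
from urllib.request import Request

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
TOKEN_URL = "https://github.com/login/oauth/access_token"


def build_authorize_url(client_id, redirect_uri, scope="read:user", state=None):
    """Step 1: URL the frontend sends the user to. Returns (url, state)."""
    state = state or secrets.token_urlsafe(16)  # random value to guard against CSRF
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def check_callback(params, expected_state):
    """Step 2: GitHub redirects back with ?code=...&state=...; reject a mismatched state."""
    received = params.get("state", "").encode()
    if not secrets.compare_digest(received, expected_state.encode()):
        raise ValueError("OAuth state mismatch")
    return params["code"]


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Step 3 (server side, e.g. in a Lambda): exchange the code for an access token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode()
    # Ask GitHub for a JSON response instead of the default form-encoded one
    return Request(TOKEN_URL, data=body, headers={"Accept": "application/json"}, method="POST")
```

Passing the step 3 request to `urllib.request.urlopen` and parsing the JSON body would yield the `access_token`. The client secret has to live in the Lambda, never in the Gatsby bundle.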
Teamwork and dev/staging environments are easy to do with SAM [and CloudFormation]. I could not find a single remotely viable way to do this with Amplify. CI/CD, including unit tests, can be done via GitHub Actions or CodePipeline. It is indeed much easier to do this via CloudFormation/SAM.
We have so much more wiggle room for scraping or automatically tracking participants' activity with Lambdas. I know this could simply have been added next to the Amplify deployment, but keeping everything together in a single stack just feels better and easier. This would also consolidate all our deployment and testing workflows. Nothing to add here.
I will definitely need help with the frontend coding eventually, just to speed everything up. I'm starting to work on it again right now after being MIA for a few days, but more hands and brains would help. Given that we are going fully serverless, can we have a chat about how to structure our endpoints and similar things? Also, where are we going to store things like avatars and articles?
As far as I know, articles on Gatsby should be easy to do on our end. For articles written by others, we could store their titles and other necessary info in DynamoDB and their thumbnails in S3. If we want them to appear on our site instead of simply redirecting readers to their original location (which is not a good idea from an SEO standpoint), we could store them in DynamoDB as well, or store a Markdown version in S3 and just pull it from there (weird, I know, lol). But this is all for the future; I just wanted to mention that it's feasible.
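One possible shape for that split, as a Python sketch: the metadata record goes into DynamoDB, while thumbnails and optional Markdown copies live in S3 under derived keys. The `ExternalArticle` class, its fields, and the `thumbnails/<author>/<id>.png` and `articles/<author>/<id>.md` key layout are hypothetical choices for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExternalArticle:
    """Hypothetical metadata record for an article a participant published elsewhere."""
    article_id: str
    title: str
    author: str
    original_url: str

    def thumbnail_key(self) -> str:
        # Hypothetical S3 key layout: one folder per author
        return f"thumbnails/{self.author}/{self.article_id}.png"

    def markdown_key(self) -> str:
        # Only used if we decide to host a Markdown copy in S3
        return f"articles/{self.author}/{self.article_id}.md"

    def to_item(self) -> dict:
        """Plain dict suitable for a DynamoDB put, with the S3 key stored alongside."""
        item = asdict(self)
        item["thumbnail_key"] = self.thumbnail_key()
        return item
```

Storing only the S3 key in DynamoDB keeps items small and lets the frontend build the CloudFront URL itself.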
In terms of fast turnaround, Cognito is still a feasible option for the MVP. We can figure out OAuth at a later stage, but to get the product out the door we can go with an email/password combination.
That depends on what we want to do first. If it's just a simple input form like @omenking suggested, we need neither at first. After seeing how that works and how it will be displayed, we can add authentication, in my opinion. I think a simple input form would be a great starting point. I can't really picture the authentication part from a user's perspective without knowing how the data it generates will be used, but that's just me.
import datetime


class Score:
    def __init__(self):
        self.twitter = 0  # Current score accrued through Twitter
        self.github = 0  # Current score accrued through GitHub
        self.old_twitter = datetime.datetime(2020, 1, 1)  # Second to last Twitter activity
        self.new_twitter = datetime.datetime(2020, 1, 1)  # Last Twitter activity
        self.old_github = datetime.datetime(2020, 1, 1)  # Second to last GitHub activity
        self.new_github = datetime.datetime(2020, 1, 1)  # Last GitHub activity
        self.twitter_streak = 0
        self.github_streak = 0

    def github_activity(self):
        # Rotate activity dates
        self.old_github = self.new_github
        self.new_github = datetime.datetime.now()
        # Gap in calendar days (timedelta.days counts full 24-hour periods, which would
        # miss a new day that starts less than 24 hours after the previous activity)
        gap = (self.new_github.date() - self.old_github.date()).days
        # Only the first activity of the day updates the streak and the score
        if gap != 0:
            # If at most 3 days passed since the previous activity, add 1 to the streak, else reset it to 1
            if gap <= 3:
                self.github_streak += 1
            else:
                self.github_streak = 1
            # Add 2 points multiplied by the streak length
            self.github += 2 * self.github_streak
            print("GitHub points for today:", 2 * self.github_streak)
        print("Total GitHub points:", self.github)
        print("GitHub streak:", self.github_streak)

    def twitter_activity(self):
        # Rotate activity dates
        self.old_twitter = self.new_twitter
        self.new_twitter = datetime.datetime.now()
        # Gap in calendar days, as in github_activity
        gap = (self.new_twitter.date() - self.old_twitter.date()).days
        # Only the first activity of the day updates the streak and the score
        if gap != 0:
            # If at most 3 days passed since the previous activity, add 1 to the streak, else reset it to 1
            if gap <= 3:
                self.twitter_streak += 1
            else:
                self.twitter_streak = 1
            # Add 1 point multiplied by the streak length
            self.twitter += 1 * self.twitter_streak
            print("Twitter points for today:", 1 * self.twitter_streak)
        print("Total Twitter points:", self.twitter)
        print("Twitter streak:", self.twitter_streak)

    def get_score(self):
        # Get full score
        return self.twitter + self.github
Draft for the leaderboard logic.
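To see what this draft actually awards without waiting on real timestamps, the same rule can be replayed as a pure function over fixed dates. This assumes the intended rule is: only the first activity per calendar day counts, the streak continues if the gap is at most 3 days, and each active day earns the platform's points times the current streak (2 for GitHub, 1 for Twitter). `streak_score` is a hypothetical helper, not part of the draft.

```python
from datetime import date


def streak_score(activity_dates, points_per_day):
    """Replay activity dates (datetime.date objects) and return (total points, final streak)."""
    total = 0
    streak = 0
    previous = None
    for day in sorted(set(activity_dates)):  # duplicates collapse: one scoring event per day
        if previous is not None and (day - previous).days <= 3:
            streak += 1
        else:
            streak = 1
        total += points_per_day * streak
        previous = day
    return total, streak
```

For GitHub activity on Jan 1 (twice), Jan 2, Jan 4 and Jan 10, the streaks are 1, 2, 3, then a reset to 1, giving 2 + 4 + 6 + 2 = 14 points. The multiplier also grows quickly: a participant active every day for 100 days would earn 2 × (1 + 2 + ... + 100) = 10,100 GitHub points.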
TL;DR we need more diversity in knowledge, experience and ideas on what the leaderboard should be based on. @omenking @cmgorton @rishabkumar7 @madebygps
This feature is a bit down the road, but it needs some planning in advance so we know the flow of the system and what to work on.
I personally recommend the first option, where only the total number of days is tracked. My opinion is based on a case study from Jocko Willink's book Extreme Ownership, in which a company was underperforming because its bonus system was too confusing. Employees didn't understand it, received seemingly random bonuses with each paycheck, and therefore didn't perform well. After the company revised its bonus system to be dead simple, performance took off, because everyone understood the system behind it and could work with it.
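Assuming "the first option" means ranking participants by the number of distinct days they logged progress, the whole leaderboard reduces to counting unique days per user. A sketch, with `days_leaderboard` and the `(username, date)` entry shape as hypothetical choices:

```python
from collections import defaultdict
from datetime import date


def days_leaderboard(entries):
    """Rank users by distinct active days; entries are (username, date) pairs."""
    days_by_user = defaultdict(set)
    for username, day in entries:
        days_by_user[username].add(day)  # a set ignores repeat entries on the same day
    totals = ((user, len(days)) for user, days in days_by_user.items())
    # Most active days first; ties broken alphabetically
    return sorted(totals, key=lambda item: (-item[1], item[0]))
```

Anyone can check their own position by counting their days, which is the property the bonus-system anecdote argues for.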
Misc
Possible problems for the future with the leaderboard:
This is our Cloud Journey Template. It needs to be forked by each participant. https://github.com/100DaysOfCloud/100DaysOfCloud
We need a way to programmatically track people's progress when they update their fork with progress.
Can we use GitHub Actions? I have seen that GitHub Learning Lab has a GitHub bot; investigate it and get some ideas: https://lab.github.com/
Would the repo trigger an endpoint in our system with a payload (i.e. a webhook)?
Would we have to check periodically, e.g. CloudWatch Event > Lambda?
Can we store it in DynamoDB? What should the DynamoDB database structure look like?
We don't have to worry about displaying it in our app, just about storing it in a DB that will be accessed by AWS Amplify.
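If we go the webhook route, the receiving Lambda mainly has to verify GitHub's `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw body keyed with the webhook secret) and reduce the push event to something storable. A sketch under these assumptions: the secret value, the `push_to_item` helper and the item layout keyed by username are hypothetical, and the API Gateway body is assumed not to be base64-encoded.

```python
import hashlib
import hmac
import json

# Hypothetical placeholder; the real secret is whatever is configured on the webhook
WEBHOOK_SECRET = b"replace-with-the-webhook-secret"


def verify_signature(secret, body, signature_header):
    """GitHub signs the raw body with HMAC-SHA256 and sends it as X-Hub-Signature-256."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def push_to_item(payload):
    """Reduce a push event to one hypothetical DynamoDB item per participant's fork."""
    return {
        "username": payload["sender"]["login"],  # hypothetical partition key
        "repo": payload["repository"]["full_name"],
        "commit_count": len(payload.get("commits", [])),
        "last_push": (payload.get("head_commit") or {}).get("timestamp"),
    }


def webhook_handler(event, context):
    """Lambda behind API Gateway receiving GitHub webhook deliveries."""
    # Header casing differs between API Gateway versions, so normalise it
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    body = (event.get("body") or "").encode()  # assumes the body is not base64-encoded
    if not verify_signature(WEBHOOK_SECRET, body, headers.get("x-hub-signature-256")):
        return {"statusCode": 401, "body": "invalid signature"}
    if headers.get("x-github-event") != "push":
        return {"statusCode": 204, "body": ""}
    item = push_to_item(json.loads(body))
    # Hypothetical: write `item` to DynamoDB here (e.g. one item per username)
    return {"statusCode": 200, "body": json.dumps(item)}
```

The periodic alternative (CloudWatch Event > Lambda) could reuse `push_to_item`-style logic on data pulled from the GitHub API, at the cost of polling every fork.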